2015
Justin, Tadej; Štruc, Vitomir; Dobrišek, Simon; Vesnicer, Boštjan; Ipšić, Ivo; Mihelič, France: Speaker de-identification using diphone recognition and speech synthesis. In: 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG): DeID 2015, pp. 1–7, IEEE, 2015.

Abstract: The paper addresses the problem of speaker (or voice) de-identification by presenting a novel approach for concealing the identity of speakers in their speech. The proposed technique first recognizes the input speech with a diphone recognition system and then transforms the obtained phonetic transcription into the speech of another speaker with a speech synthesis system. Because a Diphone RecOgnition step and a sPeech SYnthesis step are used during the de-identification, we refer to the developed technique as DROPSY. With this approach the acoustic models of the recognition and synthesis modules are completely independent from each other, which ensures the highest level of input-speaker de-identification. The proposed DROPSY-based de-identification approach is language-dependent, text-independent and capable of running in real time due to the relatively simple computing methods used. When designing speaker de-identification technology, two requirements are typically imposed on the de-identification techniques: i) it should not be possible to establish the identity of the speakers based on the de-identified speech, and ii) the processed speech should still sound natural and be intelligible. This paper, therefore, implements the proposed DROPSY-based approach with two different speech synthesis techniques (i.e., the HMM-based and the diphone TD-PSOLA-based technique). The obtained de-identified speech is evaluated for intelligibility and in speaker verification experiments with a state-of-the-art (i-vector/PLDA) speaker recognition system. The comparison of the two speech synthesis modules integrated in the proposed method reveals that both can efficiently de-identify the input speakers while still producing intelligible speech.
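The recognize-then-resynthesize pipeline described in the abstract can be sketched as follows. This is a toy illustration only: the recognizer and synthesizer are stand-ins for the paper's diphone recognition and HMM/TD-PSOLA synthesis systems, and all function names and data structures here are invented for the example.

```python
# Toy sketch of a DROPSY-style two-stage pipeline: recognize the input as a
# diphone sequence, then re-synthesize it from a *different* speaker's unit
# inventory, so the output carries no acoustics of the input speaker.

def recognize_diphones(audio_frames):
    """Stage 1 (stand-in): map input audio to a diphone transcription.
    Here each 'frame' is assumed to already carry its phone label."""
    phones = [frame["phone"] for frame in audio_frames]
    # A diphone spans each pair of adjacent phones.
    return list(zip(phones, phones[1:]))

def synthesize(diphones, target_inventory):
    """Stage 2 (stand-in): concatenate units from the target speaker's
    diphone inventory; only the transcription crosses the interface."""
    return [target_inventory[d] for d in diphones]

def deidentify(audio_frames, target_inventory):
    return synthesize(recognize_diphones(audio_frames), target_inventory)

# Usage: "speech" from speaker A re-voiced with speaker B's units.
frames = [{"phone": p} for p in ["h", "e", "l", "o"]]
inventory_b = {("h", "e"): "unit_he_B",
               ("e", "l"): "unit_el_B",
               ("l", "o"): "unit_lo_B"}
print(deidentify(frames, inventory_b))  # ['unit_he_B', 'unit_el_B', 'unit_lo_B']
```

Note how the two stages share only the symbolic transcription, which is the source of the module independence the abstract emphasizes.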
Justin, Tadej; Štruc, Vitomir; Žibert, Janez; Mihelič, France: Development and Evaluation of the Emotional Slovenian Speech Database - EmoLUKS. In: Proceedings of the International Conference on Text, Speech, and Dialogue (TSD), pp. 351–359, Springer, 2015.

Abstract: This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio and television station (RTV Slovenia) and were made available to the university under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based application for crowdsourcing. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented. The reported results are given as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.
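The abstract reports both unweighted and weighted average recall, a distinction that matters for an imbalanced corpus like this one (975 vs. 410 recordings per speaker). A minimal sketch of the difference, with invented labels and predictions:

```python
# Unweighted average recall (UAR) treats every class equally; weighted
# average recall (WAR) weights each class recall by its share of the data.
# Labels and predictions below are invented for illustration.
from collections import Counter

def average_recalls(y_true, y_pred):
    per_class = {}
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        per_class[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    counts, n = Counter(y_true), len(y_true)
    uar = sum(per_class.values()) / len(per_class)              # unweighted
    war = sum(per_class[c] * counts[c] / n for c in per_class)  # weighted
    return uar, war

# Imbalanced toy set: the majority class is recognized perfectly,
# the minority class only half the time.
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "anger"]
uar, war = average_recalls(y_true, y_pred)
print(uar, war)  # 0.75 0.9
```

On imbalanced data the weighted value (which here equals overall accuracy) can look considerably better than the unweighted one, which is why papers on emotional speech commonly report both.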