Effects of the subthalamic stimulation on the characteristic of speech by parkinson patients.
The generation of speaker models is based on acoustic features obtained from speech corpora. From a closed set of speakers, the target speaker has to be identified in an unsupervised identification task.
Training and comparison recordings exist for every speaker. The training set is used to generate parametric speaker models (Gaussian Mixture Models – GMMs), while the test set is needed for the comparisons of all test recordings to all models. The model with the highest similarity (maximum likelihood) is chosen as the target speaker. The efficiency of the identification task is measured as the identification rate (i.e. the number of correctly chosen target speakers).
Aside from biometric commercial applications, the forensic domain is another important field where speaker identification is used. Because speaker identification is a closed-set classification task, it is useful in cases where a target speaker has to be selected from a set of known speakers (e.g. in the case of hidden observations).
The aim of this project is to conduct basic research on the audio-visual speech synthesis of Austrian dialects. The project extends our previous work on
10 speakers (5 male and 5 female) will be recorded for each dialect. The recordings comprise spontaneous speech, read speech and naming tasks, eliciting substantial phonemic distinctions and phonotactics. Consequently, a detailed acoustic-phonetic and phonological analysis will be performed for each dialect. Based on the acoustic-phonetic and phonological data analysis, 600 phonetically balanced sentences will be created and recorded with 4 speakers (2 male, 2 female) for each dialect. In these recordings the acoustic and the visual signal, resulting from the same speech production process, will be recorded jointly to account for the multimodal nature of human speech. The recorded material will serve as a basis for the development, training, and testing of speech synthesizers at the Telecommunications Research Center.
FWF (Wissenschaftsfonds): 2011-2013
Project Manager: Michael Pucher, Telecommunications Research Center, Vienna
Project Partner: Sylvia Moosmüller, Acoustics Research Institute, Austria Academy of Sciences, Vienna
The offender recording (a speaker recorded at the scene of a crime) is verified by determining the similarity of the offender recording's typicality to a recording of a suspect.
A universal background model (UBM) is generated via the training of a parametric Gaussian Mixture Model (GMM) that reflects the distribution of feature vectors in a reference population. Every comparison recording is used to derive a GMM from the UBM by adaptation of the model parameters. Similarity is measured through computation of the likelihoods of the offender recordings in the GMM while typicality is measured by computation of the likelihoods of the offender recordings in the UBM. The verification is expressed as the likelihood ratio of these likelihood values.
While fully unsupervised automatic verification is performed with a binary decision using a likelihood ratio threshold and is used in biometric commercial applications, the usage of the likelihood ratio as an expression of the strength of the evidence in forensic speaker verification has become an important issue.