The identification of the parameters of the vocal tract system can be used for speaker identification.
A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which models the vocal tract as a linear time-variant system (the synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 ms. The signal is then represented by the synthesis filter coefficients corresponding to each time slot.
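The source-filter idea described above can be illustrated with a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration only: the sampling rate `FS`, the 20 ms slot length, the pitch period, the filter coefficients `a` and `b`, and the helper names `synthesize_frame`, `impulse_train`, and `white_noise`. It simply passes one time slot of excitation through a fixed rational filter H(z) = B(z)/A(z).

```python
import random

def synthesize_frame(b, a, excitation):
    """Filter `excitation` through H(z) = B(z)/A(z), assuming a[0] == 1.

    The filter starts from a zero state, which is a simplification: a real
    coder would carry the filter state over from one time slot to the next.
    """
    out = []
    for n in range(len(excitation)):
        # Feed-forward part (zeros of the filter).
        acc = sum(b[k] * excitation[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part (poles of the filter).
        acc -= sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(acc)
    return out

def impulse_train(length, period):
    """Periodic excitation, a crude stand-in for voiced sounds."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Gaussian white noise excitation, a crude stand-in for unvoiced sounds."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

FS = 8000                # assumed sampling rate in Hz
FRAME = int(0.020 * FS)  # one 20 ms time slot = 160 samples

# Hypothetical, stable synthesis filter held fixed during the time slot.
a = [1.0, -1.3, 0.8]     # denominator coefficients (poles)
b = [1.0]                # numerator coefficients (no zeros in this example)

voiced = synthesize_frame(b, a, impulse_train(FRAME, period=80))
unvoiced = synthesize_frame(b, a, white_noise(FRAME))
```

In a coder following this scheme, only the coefficients (and a description of the excitation) would need to be stored for each time slot, not the samples themselves.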
A successful MBSC method is the so-called Linear Prediction Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system whose coefficients are obtained by adapting a predictor of the output signal based on its own previous samples. An all-pole model provides a good representation for the majority of speech sounds. However, representing nasal sounds, fricative sounds, and stop consonants requires a zero-pole model. In addition, the LPC technique is not adequate when the voice signal is corrupted by noise.
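As an illustration of the all-pole case only, the following sketch estimates predictor coefficients with the classical autocorrelation method and the Levinson-Durbin recursion. This is the textbook LPC procedure, not the zero-pole method discussed here. The function names `autocorr` and `levinson_durbin`, the model order, and the synthetic test signal are assumptions chosen for the example.

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the prediction-error filter and err is the final prediction-error
    power. The all-pole synthesis filter is then 1/A(z).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Synthetic check: generate a signal from a known (hypothetical) all-pole
# filter driven by white noise, then try to recover its coefficients.
true_a = [1.0, -1.3, 0.8]
rng = random.Random(0)
x = []
for n in range(4000):
    sample = rng.gauss(0.0, 1.0)
    for k in range(1, len(true_a)):
        if n - k >= 0:
            sample -= true_a[k] * x[n - k]
    x.append(sample)

est_a, est_err = levinson_durbin(autocorr(x, 2), order=2)
```

The signal here is deliberately much longer than a 10-20 ms time slot so that the estimate has low variance. On real speech frames the estimate is noisier, and a window is usually applied before computing the autocorrelation.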
We propose a numerically efficient method for estimating a zero-pole model. It provides synthesis filter coefficients that are optimal in the sense of minimizing a logarithmic criterion.
In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it:
- D. Marelli, P. Balazs, "On Pole-Zero Model Estimation Methods Minimizing a Logarithmic Criterion for Speech Analysis", IEEE Transactions on Audio, Speech and Language Processing, Vol. 18 (2), pp. 237-248 (2010)