KLATTSYN

KLATTSYN - Klatt parameter synthesis

Usage:

KLATTSYN SR N SYNMODE OUTMODE TCONFIG TFRAME

Inputs:

The KLATTSYN atom expects the following six input parameters:

SR
The sampling rate
N
The frame length (i.e. the number of samples per evaluation step).
SYNMODE
The synthesis mode specifying whether the parameter set given in the TFRAME input should be used in a loop to produce a stationary synthesis ("loop") or if the parameter set is predefined for each evaluation step ("list").
OUTMODE
The output mode. The following mode constants are supported:
"all" for full synthesis output
"voice" for output of only the voice source
"aspiration" for output of only the aspiration source
"frics" for output of only the frication
"glotout" for output of voicing and aspiration
"par_glotout" for output of voicing and aspiration in the parallel tract
"outbypas" for output of only the bypass path
"sourc" for source output
TCONFIG
Global configuration parameter table. See the Notes section for details.
TFRAME
Frame configuration parameter table. See the Notes section for details.
Outputs:

The atom has the following outputs:

Y
The generated output signal.
I
The number of synthesis frames.
T
The length of the synthesized signal.
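
The relationship between SYNMODE, the frame length N and the outputs can be sketched in Python. This is an illustration only, not STx code: the helper names `select_frame_row` and `signal_length` are hypothetical, and it is an assumption that the signal length equals the number of frames times N. The page does not state whether T is reported in samples or in seconds, so the sketch returns both.

```python
def select_frame_row(tframe, synmode, step):
    """Return the parameter row used at evaluation step `step` (0-based)."""
    if synmode == "loop":
        # Stationary synthesis: the same (first) row is reused at every step.
        return tframe[0]
    if synmode == "list":
        # One predefined row per evaluation step.
        return tframe[step]
    raise ValueError(f"unknown SYNMODE: {synmode!r}")


def signal_length(sr, n, frames):
    """Return (samples, seconds) for `frames` frames of `n` samples at rate `sr`.

    Assumes every frame contributes exactly `n` samples.
    """
    samples = frames * n
    return samples, samples / sr


rows = [{"F0": 100}, {"F0": 110}, {"F0": 120}]
print(select_frame_row(rows, "loop", 2)["F0"])  # 100
print(select_frame_row(rows, "list", 2)["F0"])  # 120
print(signal_length(16000, 80, 200))            # (16000, 1.0)
```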
Function:

The KLATTSYN SPAtom encapsulates the synthesizer functionality provided by the Klatt C++ class and can be used within SPUs in the STx macro language.

Klatt parameter synthesis generates synthesized speech from a set of parameters. The synthesizer was developed by Dennis H. Klatt and is described in Klatt, D.H. (1980), "Software for a cascade/parallel formant synthesizer", Journal of the Acoustical Society of America 67 (3), 971-995. The parameter set used by this implementation is based on the refined set described in Klatt, D.H. and Klatt, L.C. (1990), "Analysis, synthesis, and perception of voice quality variations among female and male talkers", Journal of the Acoustical Society of America 87 (2), 820-857.
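
For orientation, each formant frequency/bandwidth pair in Klatt's 1980 design drives a second-order digital resonator of the form y[n] = A x[n] + B y[n-1] + C y[n-2]. The Python sketch below computes the coefficients as given in that paper and applies the difference equation. It is not taken from the STx C++ class, whose internals are not documented here; the function names `resonator_coefficients` and `resonate` are hypothetical.

```python
import math


def resonator_coefficients(f, bw, sr):
    """Coefficients (A, B, C) for centre frequency f and bandwidth bw in Hz."""
    t = 1.0 / sr  # sampling period in seconds
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * f * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c


def resonate(x, f, bw, sr):
    """Filter the sample sequence x through one resonator."""
    a, b, c = resonator_coefficients(f, bw, sr)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# A constant input settles at the same constant, because the DC gain is 1.
settled = resonate([1.0] * 5000, f=500.0, bw=60.0, sr=10000.0)[-1]
print(abs(settled - 1.0) < 1e-6)  # True
```

In the cascade configuration the output of one resonator feeds the next; in the parallel configuration each resonator has its own amplitude control and the outputs are summed.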

The implementation consists of two parts. The first is the native C++ part, which includes a class that implements the actual synthesizer and an SPAtom class that encapsulates the synthesizer functionality and provides an interface for the STx macro language. The second part consists of two STx classes, one for interfacing with the C++ part and one providing a set of toolbox functions for several contexts within STx.

Notes:

The frame parameter table (TFRAME) contains a row for each frame (or only one row for synthesis in loop mode). The following column names are evaluated:

Field Description
F0 Voicing fundamental frequency in Hz
AV Amplitude of voicing in dB (0 to 70)
F1 First formant frequency in Hz (200 to 1300)
B1 First formant bandwidth in Hz (40 to 1000)
F2 Second formant frequency in Hz (550 to 3000)
B2 Second formant bandwidth in Hz (40 to 1000)
F3 Third formant frequency in Hz (1200 to 4999)
B3 Third formant bandwidth in Hz (40 to 1000)
F4 Fourth formant frequency in Hz (1200 to 4999)
B4 Fourth formant bandwidth in Hz (40 to 1000)
F5 Fifth formant frequency in Hz (1200 to 4999)
B5 Fifth formant bandwidth in Hz (40 to 1000)
F6 Sixth formant frequency in Hz (1200 to 4999)
B6 Sixth formant bandwidth in Hz (40 to 2000)
FNZ Nasal zero frequency in Hz (248 to 528)
BNZ Nasal zero bandwidth in Hz (40 to 1000)
FNP Nasal pole frequency in Hz (248 to 528)
BNP Nasal pole bandwidth in Hz (40 to 1000)
ASP Amplitude of aspiration in dB (0 to 70)
Kopen Number of samples in open period (10 to 65)
Aturb Breathiness in voicing (0 to 80)
TLT Voicing spectral tilt in dB (0 to 24)
AF Amplitude of frication in dB (0 to 80)
Kskew Skewness of alternate periods (0 to 40 in sample#/2)
A1 Amplitude of par 1st formant in dB (0 to 80)
B1p Par. 1st formant bandwidth in Hz (40 to 1000)
A2 Amplitude of F2 frication in dB (0 to 80)
B2p Par. 2nd formant bandwidth in Hz (40 to 1000)
A3 Amplitude of F3 frication in dB (0 to 80)
B3p Par. 3rd formant bandwidth in Hz (40 to 1000)
A4 Amplitude of F4 frication in dB (0 to 80)
B4p Par. 4th formant bandwidth in Hz (40 to 1000)
A5 Amplitude of F5 frication in dB (0 to 80)
B5p Par. 5th formant bandwidth in Hz (40 to 1000)
A6 Amplitude of F6 (same as r6pa) (0 to 80)
B6p Par. 6th formant bandwidth in Hz (40 to 2000)
ANP Amplitude of par nasal pole in dB (0 to 80)
AB Amplitude of bypass fric. in dB (0 to 80)
AVp Amplitude of voicing (parallel) in dB (0 to 70)
Gain0 Overall gain (60 dB is unity) (0 to 60)
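
The ranges listed above lend themselves to a simple pre-flight check before a table is handed to the atom. The following Python sketch is one way to do that under stated assumptions: the `RANGES` table and the `out_of_range` helper are hypothetical names, F0 is skipped because no range is listed for it, and the page does not say whether the atom clamps or rejects out-of-range values, so the sketch only reports them.

```python
# Documented (min, max) per column; F0 has no documented range.
RANGES = {
    "AV": (0, 70), "F1": (200, 1300), "F2": (550, 3000),
    "FNZ": (248, 528), "BNZ": (40, 1000),
    "FNP": (248, 528), "BNP": (40, 1000),
    "ASP": (0, 70), "Kopen": (10, 65), "Aturb": (0, 80),
    "TLT": (0, 24), "AF": (0, 80), "Kskew": (0, 40),
    "ANP": (0, 80), "AB": (0, 80), "AVp": (0, 70), "Gain0": (0, 60),
}
for i in range(3, 7):
    RANGES[f"F{i}"] = (1200, 4999)
for i in range(1, 7):
    bw_max = 2000 if i == 6 else 1000  # the sixth formant allows wider bandwidths
    RANGES[f"B{i}"] = (40, bw_max)     # cascade bandwidth
    RANGES[f"B{i}p"] = (40, bw_max)    # parallel bandwidth
    RANGES[f"A{i}"] = (0, 80)          # parallel amplitude


def out_of_range(row):
    """Return (name, value, min, max) for every value outside its range."""
    problems = []
    for name, value in row.items():
        if name in RANGES:
            lo, hi = RANGES[name]
            if not lo <= value <= hi:
                problems.append((name, value, lo, hi))
    return problems


row = {"F0": 120, "AV": 60, "F1": 1500, "B1": 60}
print(out_of_range(row))  # [('F1', 1500, 200, 1300)]
```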

The global configuration table (TCONFIG) contains three fields, named "ID", "NumVal" and "StrVal". Depending on the type of parameter, the value must be specified in either the "NumVal" field or the "StrVal" field.

| ID | NumVal | StrVal | Description |
|---|---|---|---|
| synthesismodel | | cascadeparallel, allparallel | Specifies if the synthesizer should use the cascade tract for formant synthesis or only the parallel tract |
| nfcascade | 0-6 | | If the cascade tract is used, this parameter specifies the number of formants to be used |
| glsource | | impulsive, natural, sampled | Type of glottal source (impulsive, natural or sampled) |
| samplefactor | 0.00001 | | Multiplication factor for glottal samples (default = 0.00001) |
| f0flutter | 0-100 | | Percentage of f0 flutter (0-100, default = 0) |
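
To illustrate the NumVal/StrVal split, the Python sketch below builds one configuration row per parameter and puts the value in the field that matches the parameter type. It is a sketch under assumptions, not STx code: the names `STRING_PARAMS`, `NUMERIC_PARAMS` and `config_row` are hypothetical, and representing the unused field as `None` is a choice made here, since the page does not say how STx expects an empty cell.

```python
# Parameters whose value goes in StrVal, with their allowed strings.
STRING_PARAMS = {
    "synthesismodel": {"cascadeparallel", "allparallel"},
    "glsource": {"impulsive", "natural", "sampled"},
}
# Parameters whose value goes in NumVal, with documented bounds (None = not stated).
NUMERIC_PARAMS = {
    "nfcascade": (0, 6),
    "samplefactor": None,
    "f0flutter": (0, 100),
}


def config_row(param_id, value):
    """Build one TCONFIG-style row as a dict with ID, NumVal and StrVal."""
    if param_id in STRING_PARAMS:
        if value not in STRING_PARAMS[param_id]:
            raise ValueError(f"{param_id}: unsupported value {value!r}")
        return {"ID": param_id, "NumVal": None, "StrVal": value}
    if param_id in NUMERIC_PARAMS:
        bounds = NUMERIC_PARAMS[param_id]
        if bounds is not None and not bounds[0] <= value <= bounds[1]:
            raise ValueError(f"{param_id}: {value} outside {bounds}")
        return {"ID": param_id, "NumVal": value, "StrVal": None}
    raise KeyError(f"unknown parameter: {param_id}")


tconfig = [
    config_row("synthesismodel", "cascadeparallel"),
    config_row("nfcascade", 5),
    config_row("glsource", "natural"),
    config_row("f0flutter", 0),
]
print(tconfig[1])  # {'ID': 'nfcascade', 'NumVal': 5, 'StrVal': None}
```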

The KLATTSYN atom was added to STx in version 3.9.
