Label Assisted Copy Synthesis
The automatic generation of control signals to drive a formant synthesizer offers an excellent method of validating phonological models by observing their phonetic output. The task is made all the more challenging by the high quality of the speech which a formant synthesizer such as Klatt's (1980) model can produce when provided with appropriate control signals.
Copy synthesis of natural utterances is undoubtedly one of the most interesting and enlightening methods of arriving at appropriate values for these control signals. However, two serious problems arise when mapping the results of an acoustic analysis onto the control parameters of the Klatt formant synthesizer:
Label Assisted Copy Synthesis (LACS) is a knowledge-based solution to the problems outlined above. The mapping of the acoustic analysis onto synthesizer control parameters is carried out using information from annotations of the utterances being synthesized. At any point in the mapping process a decision can be made using the linguistic information provided by time-aligned labels. Using a large labelled corpus such as the Kiel Corpus allows copy synthesis of a number of different female and male voices carrying out different linguistic tasks.
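The label lookup that such a decision relies on might be sketched as follows. This is only a minimal illustration, not the LACS implementation: the Label type, the label_at function, and the example symbols and times are all hypothetical, and the labels are assumed to be sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    """One time-aligned annotation (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds (exclusive)
    symbol: str   # segment label


def label_at(labels: List[Label], t: float) -> Optional[Label]:
    """Return the label whose interval contains time t, or None.

    Assumes labels are sorted by start time and do not overlap.
    """
    starts = [lab.start for lab in labels]
    i = bisect_right(starts, t) - 1  # last label starting at or before t
    if i >= 0 and t < labels[i].end:
        return labels[i]
    return None


# Invented example annotation, for illustration only.
labels = [
    Label(0.00, 0.08, "h"),
    Label(0.08, 0.21, "a"),
    Label(0.21, 0.30, "s"),
]
```

With a lookup of this kind, every analysis frame can be paired with the label that covers its time point before any synthesizer parameter is set.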
Modelling glottal activity is one of the ways in which label information can be used to exploit fully the parameters which the Klatt synthesizer provides. The diagrams below illustrate how the different correlates of h can be modelled. In either case it is only through the combination of label and analytical information that we can control the source parameters for voicing and aspiration and decide whether to use the formant information to excite the cascade or the parallel branch of the synthesizer.
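A decision of this kind, combining the label with the voicing result of the F0 analysis, could look roughly like the sketch below. The rules, the label symbols (including "creak") and the dB values are hypothetical placeholders and not the rules LACS actually uses. Only the parameter names AV (amplitude of voicing), AH (amplitude of aspiration) and AF (amplitude of frication), and the choice between cascade and parallel branch, follow Klatt's (1980) synthesizer.

```python
from dataclasses import dataclass


@dataclass
class SourceSettings:
    av: int        # amplitude of voicing, dB (placeholder values)
    ah: int        # amplitude of aspiration, dB
    af: int        # amplitude of frication, dB
    cascade: bool  # True: cascade branch, False: parallel branch


# Hypothetical label inventories, for illustration only.
VOICED_FRICATIVES = {"z", "v"}
VOICELESS_FRICATIVES = {"s", "f"}


def source_for_frame(symbol: str, f0_voiced: bool) -> SourceSettings:
    """Choose source settings from a label and the F0 voicing decision."""
    if symbol == "h":
        if f0_voiced:
            # Breathy-voiced h: voicing and aspiration together.
            return SourceSettings(av=50, ah=50, af=0, cascade=True)
        # Voiceless h: aspiration noise only.
        return SourceSettings(av=0, ah=60, af=0, cascade=True)
    if symbol == "creak":
        # The label asserts glottal activity, so voicing is switched on
        # even if the F0 analysis returned voicelessness.
        return SourceSettings(av=55, ah=0, af=0, cascade=True)
    if symbol in VOICED_FRICATIVES:
        # Friction through the parallel branch, with voicing if detected.
        return SourceSettings(av=45 if f0_voiced else 0, ah=0, af=55,
                              cascade=False)
    if symbol in VOICELESS_FRICATIVES:
        return SourceSettings(av=0, ah=0, af=60, cascade=False)
    # Default (e.g. vowels and sonorants): follow the F0 analysis.
    return SourceSettings(av=60 if f0_voiced else 0, ah=0, af=0,
                          cascade=True)
```

The point of the sketch is that neither input suffices alone: the F0 analysis cannot distinguish a voiced fricative from a vowel, and the label cannot say whether a particular h was produced with or without voicing.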
Here are some examples for the ear, comparing the original utterances with their copy-synthetic counterparts. The first illustrates the 'reconstruction' of creak at the onset of ein when the F0 analysis has returned voicelessness. In the second example, note in particular the voiced alveolar friction in the word Konserven. This portion of the signal, which the F0 analysis reports as voiced, would otherwise have been synthesized as something akin to [ð].