MORPHOLOGICAL AND SYNTACTIC FACTORS IN PREDICTING SEGMENTAL DURATIONS FOR ESTONIAN TEXT-TO-SPEECH SYNTHESIS

Meelis Mihkla
Institute of Estonian Language

ID 1414

Traditionally, durational models of speech units have been developed with little attention to morphology and part-of-speech information when predicting the temporal structure of speech. The aim of the present study was to find out whether the rich morphology of Estonian could provide additional information, beyond syntactic and part-of-speech information, that could be used in predicting durations. The study continues earlier work on prosody for Estonian text-to-speech (TTS) synthesis. Sound durations in the speech of radio newsreaders were modelled with different statistical methods (linear regression and neural networks). Model input consisted not only of descriptors of sound context and position, but also of information on part of speech, part of sentence and morphological features. With this additional information, the error in predicting segmental durations decreased. These results were consistent with our expectations for a morphologically rich language.
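
To make the modelling setup concrete, the following is a minimal sketch of the linear-regression variant only, under stated assumptions. The abstract does not give the actual feature inventory, data or model configuration, so everything here is hypothetical: the feature names (`phone`, `position`, `pos`, `morph`), their values and the durations in milliseconds are invented toy data, and the helper functions are illustrative. The sketch fits ordinary least squares (with a tiny ridge term added purely for numerical stability) to a baseline feature set of sound context and position, then to an extended set that adds part-of-speech and morphological features, and compares the error on the toy data.

```python
# Toy illustration only: feature names, values and durations are invented,
# not taken from the study.

def one_hot(rows, keys):
    """Encode the chosen categorical keys as 0/1 columns, with a leading bias column."""
    vocab = sorted({(k, r[k]) for r in rows for k in keys})
    return [[1.0] + [1.0 if r[k] == v else 0.0 for k, v in vocab] for r in rows]

def fit_least_squares(X, y, ridge=1e-6):
    """Solve (X^T X + ridge*I) w = X^T y by Gauss-Jordan elimination with pivoting."""
    n = len(X[0])
    # Augmented matrix [X^T X + ridge*I | X^T y]
    A = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def rmse(X, y, w):
    """Root-mean-square error of the linear prediction X·w against y."""
    errs = [sum(a * b for a, b in zip(x, w)) - t for x, t in zip(X, y)]
    return (sum(e * e for e in errs) / len(errs)) ** 0.5

# Hypothetical segments: sound identity, position in word, part of speech,
# a morphological feature (case), and an invented duration in ms.
segments = [
    {"phone": "a", "position": "initial", "pos": "noun", "morph": "nom",  "dur": 80.0},
    {"phone": "a", "position": "initial", "pos": "noun", "morph": "gen",  "dur": 95.0},
    {"phone": "a", "position": "final",   "pos": "verb", "morph": "none", "dur": 110.0},
    {"phone": "s", "position": "initial", "pos": "verb", "morph": "none", "dur": 70.0},
    {"phone": "s", "position": "final",   "pos": "noun", "morph": "nom",  "dur": 90.0},
    {"phone": "s", "position": "final",   "pos": "noun", "morph": "gen",  "dur": 105.0},
    {"phone": "a", "position": "final",   "pos": "noun", "morph": "nom",  "dur": 100.0},
    {"phone": "s", "position": "initial", "pos": "noun", "morph": "gen",  "dur": 85.0},
]
y = [s["dur"] for s in segments]

# Baseline: sound context and position only.
X_base = one_hot(segments, ["phone", "position"])
rmse_base = rmse(X_base, y, fit_least_squares(X_base, y))

# Extended: baseline plus part-of-speech and morphological features.
X_ext = one_hot(segments, ["phone", "position", "pos", "morph"])
rmse_ext = rmse(X_ext, y, fit_least_squares(X_ext, y))

print(f"baseline RMSE: {rmse_base:.2f} ms, extended RMSE: {rmse_ext:.2f} ms")
```

Because the toy data were constructed so that duration depends on the invented `morph` feature, the extended model fits them more closely than the baseline. This shows only the mechanics of the comparison. It measures error on the same data used for fitting and says nothing about the size of the effect reported in the study, which would require the real corpus and held-out evaluation.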