Springer Series in Information Sciences - 3: Pitch Determination of Speech Signals
Algorithms and Devices
- 700 pages
- 25 hours of reading
Pitch, expressed as fundamental frequency (F0) or fundamental period (T0), is a key parameter of the acoustic speech signal and the main carrier of prosodic information. The ear is significantly more sensitive to changes in fundamental frequency than to changes in other speech parameters. Consequently, the quality of vocoded speech depends heavily on accurate pitch measurement, which is why reliable methods are needed. Although detecting the fundamental frequency or period of a quasi-periodic signal may seem straightforward, pitch determination is one of the most challenging problems in speech analysis, for several reasons. First, speech is inherently nonstationary: the position of the vocal tract can change abruptly, altering the temporal structure of the signal so much that the assumption of quasi-periodicity often fails. Second, the flexibility of the human vocal tract and the diversity of voices produce a wide range of possible temporal structures. Third, narrow-band formants at low harmonics, particularly the second or third harmonic, complicate the analysis further. Finally, for a speech signal from an unknown speaker, the fundamental frequency can vary over a range of nearly four octaves (50 to 800 Hz).

