Howard, David, Tyrell, Andy, Murphy, Damian and Cooper, Crispin ORCID: https://orcid.org/0000-0002-6371-3388 2009. Bio-inspired evolutionary oral tract shape modelling for physical modelling vocal synthesis. Journal of Voice 23 (1) , pp. 11-20. 10.1016/j.jvoice.2007.03.003
Abstract
Physical modeling using digital waveguide mesh (DWM) models is an audio synthesis method that has been shown to produce acoustic output in music synthesis applications that is often described as "organic," "warm," or "intimate." This paper describes work that takes its inspiration from physical modeling music synthesis and applies it to speech synthesis through a physical modeling mesh model of the human oral tract. Oral tract shapes are found using a computational technique based on the principles of biological evolution. Successful speech synthesis using this method depends on accurate measurements of the cross-sectional area of the human oral tract, which are usually derived from magnetic resonance imaging (MRI). However, MRI data are not ideal because of the long exposure time required (relative to the duration of speech articulations), the acoustic noise generated by the MRI machine itself, and the supine position the subject must adopt. An alternative method is described in which a bio-inspired computing technique that simulates the process of evolution is used to evolve oral tract shapes. Using acoustic and excitation data from two adult males and two adult females, this technique produces appropriate oral tract shapes for open vowels but less appropriate shapes for close vowels. The technique has none of the drawbacks associated with MRI, because all it requires from the subject is an acoustic and electrolaryngograph (or electroglottograph) recording. With appropriate oral tract shapes, the model produces excellent-quality synthetic vowel sounds, and sounds that involve dynamic changes in oral tract shape, such as diphthongs, can also be synthesized using an impedance-mapped technique. Efforts to improve performance for close vowels by reducing mesh quantization had little effect, and further work is required.
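The evolutionary search over oral tract shapes can be illustrated with a minimal sketch. This is not the authors' implementation. For brevity it swaps the paper's digital waveguide mesh for a 1D Kelly-Lochbaum tube with one sample of delay per section. The reflection coefficients `R_GLOTTIS` and `R_LIPS`, the section count, the population settings, and the target area function are all hypothetical values chosen only for illustration. Each genome holds log cross-sectional areas. Fitness is the mean squared difference between the candidate's log-magnitude spectrum and a target spectrum. New shapes are produced by truncation selection, uniform crossover, Gaussian mutation, and elitism.

```python
import numpy as np

# Hypothetical model constants chosen for illustration (not from the paper).
R_GLOTTIS = 0.75  # reflection coefficient at the glottal end
R_LIPS = -0.85    # inverting, lossy reflection at the open lip end
N_SAMPLES = 256   # impulse-response length in samples
N_FFT = 512       # FFT size for the log-magnitude spectrum


def tube_impulse_response(areas, n_samples=N_SAMPLES):
    """Impulse response of a 1D Kelly-Lochbaum tube, one sample of delay per section."""
    a = [float(v) for v in areas]
    n = len(a)
    # Pressure-wave reflection coefficient at each junction between sections k and k+1.
    r = [(a[k] - a[k + 1]) / (a[k] + a[k + 1]) for k in range(n - 1)]
    fwd = [0.0] * n  # right-going wave arriving at the right end of section k
    bwd = [0.0] * n  # left-going wave arriving at the left end of section k
    out = np.zeros(n_samples)
    for t in range(n_samples):
        out[t] = (1.0 + R_LIPS) * fwd[-1]  # pressure transmitted out at the lips
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # Unit impulse injected at the glottis plus the wave reflected there.
        new_fwd[0] = (1.0 if t == 0 else 0.0) + R_GLOTTIS * bwd[0]
        for k in range(n - 1):  # scattering at each internal junction
            new_fwd[k + 1] = (1.0 + r[k]) * fwd[k] - r[k] * bwd[k + 1]
            new_bwd[k] = r[k] * fwd[k] + (1.0 - r[k]) * bwd[k + 1]
        new_bwd[-1] = R_LIPS * fwd[-1]
        fwd, bwd = new_fwd, new_bwd  # propagate one sample across every section
    return out


def log_spectrum(h):
    """Log-magnitude spectrum of an impulse response."""
    return np.log(np.abs(np.fft.rfft(h, N_FFT)) + 1e-9)


def spectral_error(areas, target_log_spec):
    """Fitness to minimise: mean squared log-spectral distance to the target."""
    diff = log_spectrum(tube_impulse_response(areas)) - target_log_spec
    return float(np.mean(diff ** 2))


def evolve_areas(target_log_spec, n_sections=10, pop_size=20, generations=20,
                 n_parents=6, sigma=0.2, seed=0):
    """Evolve a tract area function; returns (best_areas, best_error_per_generation)."""
    rng = np.random.default_rng(seed)
    # Genomes hold log-areas, so every decoded area exp(g) is positive.
    pop = rng.normal(0.0, 0.5, size=(pop_size, n_sections))
    history = []
    for _ in range(generations):
        errors = np.array([spectral_error(np.exp(g), target_log_spec) for g in pop])
        order = np.argsort(errors)
        history.append(float(errors[order[0]]))
        parents = pop[order[:n_parents]]  # truncation selection
        children = []
        for _ in range(pop_size - 1):
            a, b = parents[rng.integers(n_parents, size=2)]
            mask = rng.random(n_sections) < 0.5  # uniform crossover
            children.append(np.where(mask, a, b) + rng.normal(0.0, sigma, n_sections))
        pop = np.vstack([parents[:1], np.array(children)])  # elitism: best survives unchanged
    errors = np.array([spectral_error(np.exp(g), target_log_spec) for g in pop])
    history.append(float(errors.min()))
    return np.exp(pop[np.argmin(errors)]), history


if __name__ == "__main__":
    # Made-up target shape (narrow back, wide front), used only to generate a test spectrum.
    target = [1.0, 0.8, 0.6, 0.5, 0.6, 1.2, 2.0, 3.0, 3.5, 3.0]
    target_spec = log_spectrum(tube_impulse_response(target))
    areas, history = evolve_areas(target_spec)
    print("best error per generation:", [round(e, 3) for e in history])
    print("evolved areas:", np.round(areas, 2))
```

Elitism keeps the best error non-increasing from one generation to the next. Because Kelly-Lochbaum reflection coefficients depend only on the ratios of adjacent areas, this fitness cannot recover an absolute area scale. In the setting the abstract describes, the target would presumably come from a subject's acoustic recording, with the excitation taken from the electrolaryngograph signal, rather than from a known tube as in this toy check.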
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Sustainable Places Research Institute (PLACES); Geography and Planning (GEOPL) |
Subjects: | Q Science > Q Science (General); Q Science > QH Natural history > QH426 Genetics |
Uncontrolled Keywords: | Speech synthesis; Physical modeling; Bio-inspired computing; Evolution; Naturalness; Electrolaryngography; Electroglottography |
Publisher: | Elsevier |
ISSN: | 0892-1997 |
Last Modified: | 24 Oct 2022 11:21 |
URI: | https://orca.cardiff.ac.uk/id/eprint/47840 |
Citation Data
Cited 3 times in Scopus.