Bio-inspired evolutionary oral tract shape modelling for physical modelling vocal synthesis

Howard, David, Tyrell, Andy, Murphy, Damian and Cooper, Crispin

2009. Bio-inspired evolutionary oral tract shape modelling for physical modelling vocal synthesis. Journal of Voice 23 (1), pp. 11-20. doi:10.1016/j.jvoice.2007.03.003

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1016/j.jvoice.2007.03.003

Abstract

Physical modeling using digital waveguide mesh (DWM) models is an audio synthesis method that has been shown to produce an acoustic output in music synthesis applications that is often described as being “organic,” “warm,” or “intimate.” This paper describes work that takes its inspiration from physical modeling music synthesis and applies it to speech synthesis through a physical modeling mesh model of the human oral tract. Oral tract shapes are found using a computational technique based on the principles of biological evolution. Essential to successful speech synthesis using this method is accurate measurements of the cross-sectional area of the human oral tract, and these are usually derived from magnetic resonance imaging (MRI). However, such images are nonideal, because of the lengthy exposure time (relative to the time of articulation of speech sounds) required, the local ambient acoustic noise associated with the MRI machine itself and the required supine position for the subject. An alternative method is described where a bio-inspired computing technique that simulates the process of evolution is used to evolve oral tract shapes. This technique is able to produce appropriate oral tract shapes for open vowels using acoustic and excitation data from two adult males and two adult females, but shapes for close vowels that are less appropriate. This technique has none of the drawbacks associated with MRI, because all it requires from the subject is an acoustic and electrolaryngograph (or electroglottograph) recording. Appropriate oral tract shapes do enable the model to produce excellent quality synthetic speech for vowel sounds, and sounds that involve dynamic oral tract shape changes, such as diphthongs, can also be synthesized using an impedance mapped technique. Efforts to improve performance by reducing mesh quantization for close vowels had little effect, and further work is required.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Research Institutes & Centres > Sustainable Places Research Institute (PLACES); Schools > Geography and Planning (GEOPL)
Subjects:	Q Science > Q Science (General); Q Science > QH Natural history > QH426 Genetics
Uncontrolled Keywords:	Speech synthesis; Physical modeling; Bio-inspired computing; Evolution; Naturalness; Electrolaryngography; Electroglottography
Publisher:	Elsevier
ISSN:	0892-1997
Last Modified:	24 Oct 2022 11:21
URI:	https://orca.cardiff.ac.uk/id/eprint/47840

Citation Data

Cited 3 times in Scopus.
