Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Modelling talking human faces

Albasri, Samia 2019. Modelling talking human faces. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version

This thesis investigates a number of new approaches to visual speech synthesis that use data-driven methods to implement a talking face. The main contributions of this thesis are the following. The accuracy of the shared Gaussian process latent variable model (SGPLVM), built using active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features, is improved by employing a more accurate AAM. This is the first study to report that using a more accurate AAM improves the accuracy of SGPLVM. Objective evaluation via reconstruction error is performed to compare the proposed approach against previously existing methods. In addition, it is shown experimentally that the accuracy of the AAM can be improved by using a larger number of landmarks and/or a larger number of samples in the training data.
The second research contribution is a new method for visual speech synthesis utilising a fully Bayesian method, namely manifold relevance determination (MRD), for modelling dynamical systems through probabilistic non-linear dimensionality reduction. This is the first time MRD has been used in the context of generating talking faces from an input speech signal. The expressive power of this model lies in its ability to consider non-linear mappings between audio and visual features within a Bayesian approach. An efficient latent space has been learnt using a fully Bayesian latent representation relying on a conditional nonlinear independence framework. In the SGPLVM, the structure of the latent space cannot be estimated automatically because the model uses a maximum likelihood formulation. In contrast to SGPLVM, the Bayesian approaches allow the dimensionality of the latent spaces to be determined automatically. The proposed method compares favourably against several other state-of-the-art methods for visual speech generation, as shown in quantitative and qualitative evaluations on two different datasets.
Finally, the possibility of incremental learning of AAM for inclusion in the proposed MRD approach for visual speech generation is investigated. The quantitative results demonstrate that using MRD in conjunction with incremental AAMs produces only slightly less accurate results than using batch methods. These results support a way of training this kind of model on computers with limited resources, for example in mobile computing. Overall, this thesis proposes several improvements to the current state of the art in generating talking faces from speech signals, leading to perceptually more convincing results.
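The abstract notes that the Bayesian approach allows the dimensionality of the latent spaces to be determined automatically. As a rough illustration only, and not the thesis's implementation, the sketch below assumes that a trained MRD-style model exposes one automatic relevance determination (ARD) weight per latent dimension for each view (audio and visual). The function name `partition_latent_dims`, the threshold value, and the example weights are all hypothetical.

```python
import numpy as np


def partition_latent_dims(w_audio, w_visual, threshold=1e-2):
    """Split latent dimensions into shared, private and unused sets.

    w_audio, w_visual: hypothetical ARD weights, one per latent dimension,
    as might be learnt for the audio and visual views.
    threshold: assumed relative cut-off below which a dimension is treated
    as switched off for that view.
    """
    w_audio = np.asarray(w_audio, dtype=float)
    w_visual = np.asarray(w_visual, dtype=float)

    # Normalise so the threshold is relative to each view's largest weight.
    rel_audio = w_audio / w_audio.max()
    rel_visual = w_visual / w_visual.max()

    on_audio = rel_audio > threshold
    on_visual = rel_visual > threshold

    return {
        "shared": np.flatnonzero(on_audio & on_visual).tolist(),
        "audio_only": np.flatnonzero(on_audio & ~on_visual).tolist(),
        "visual_only": np.flatnonzero(~on_audio & on_visual).tolist(),
        "unused": np.flatnonzero(~on_audio & ~on_visual).tolist(),
    }


# Made-up weights for a four-dimensional latent space.
parts = partition_latent_dims(
    [0.9, 0.5, 0.001, 0.0002],
    [0.8, 0.0001, 0.6, 0.0003],
)
print(parts)
```

In this toy setting, dimensions that stay active for both views would carry the information linking audio to visual features, while dimensions switched off in both views are effectively pruned, which is one way the effective dimensionality can emerge from training rather than being fixed by hand.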

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Engineering
Uncontrolled Keywords: Talking faces; Visual speech; Facial animation; Manifold relevance determination (MRD); Audio visual mapping; Shared Gaussian process latent variable model (SGPLVM).
Funders: Ministry of Higher Education and Scientific Research, Iraq
Date of First Compliant Deposit: 8 August 2019
Last Modified: 04 Apr 2020 01:28
