Dawood, Samia, Hicks, Yulia ORCID: https://orcid.org/0000-0002-7179-4587 and Marshall, David 2016. Speech-driven facial animation using manifold relevance determination. Hua, Gang and Jégou, Hervé, eds. Computer Vision – ECCV 2016 Workshops, Vol. 9914. Lecture Notes in Computer Science, Cham: Springer, pp. 869-882. (10.1007/978-3-319-48881-3_57)
Abstract
In this paper, a new approach to visual speech synthesis using a joint probabilistic model is introduced, namely the Gaussian process latent variable model with manifold relevance determination, which explicitly models coarticulation. A talking-head dataset (the LIPS dataset) is processed by extracting visual and audio features from its sequences. The model can capture the structure of data with extremely high dimensionality. Distinguishable visual features can be inferred directly from the trained model by sampling from the discovered latent points. Inferred visual features are evaluated statistically against ground-truth data and compared with the current state-of-the-art visual speech synthesis approach. The quantitative results demonstrate that the proposed approach outperforms the state-of-the-art technique.
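The inference direction the abstract describes (audio observation → shared latent point → visual features) can be illustrated with a toy GP-regression sketch in plain NumPy. This is a simplified analogue under invented synthetic data, not the paper's manifold relevance determination model: the feature dimensions, kernel hyperparameters, and the two-stage mean-prediction pipeline below are all assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X_train, Y_train, X_test, noise=1e-4):
    """Posterior mean of a zero-mean GP regression with an RBF kernel."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, Y_train)

rng = np.random.default_rng(0)
# Toy stand-ins for the two observation spaces mentioned in the abstract:
# "audio" and "visual" features, both generated from one shared
# low-dimensional latent trajectory Z (all values here are synthetic).
Z = np.linspace(0, 1, 20)[:, None]                       # shared latent points
audio = np.hstack([np.sin(4 * Z), np.cos(4 * Z)]) \
        + 0.01 * rng.standard_normal((20, 2))
visual = np.hstack([Z**2, 1 - Z]) \
         + 0.01 * rng.standard_normal((20, 2))

# Step 1: map a novel audio observation to the shared latent space.
Z_hat = gp_posterior_mean(audio, Z, audio[5:6])
# Step 2: read out the corresponding visual features from the latent point.
visual_hat = gp_posterior_mean(Z, visual, Z_hat)
print(visual_hat, visual[5])
```

Because the query audio frame coincides with a training frame, the two-stage prediction should land close to the corresponding training visual frame; in the real model the latent space and its per-view relevance weights are learned jointly rather than fixed as here.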
| Item Type | Book Section |
| --- | --- |
| Date Type | Publication |
| Status | Published |
| Schools | Engineering |
| Publisher | Springer |
| ISBN | 978-3-319-48881-3 |
| ISSN | 1611-3349 |
| Last Modified | 25 Oct 2022 13:40 |
| URI | https://orca.cardiff.ac.uk/id/eprint/120474 |