Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Audio-driven emotion-aware 3D talking face generation from single image

Qiu, Chun-Shuo, Liu, Feng-Lin, Fu, Hongbo, Zhang, Fan, Cao, Yan-Pei, Lai, Yukun ORCID: https://orcid.org/0000-0002-2094-5680 and Gao, Lin 2025. Audio-driven emotion-aware 3D talking face generation from single image. Presented at: IEEE International Conference on Multimedia and Expo, Nantes, France, 30 June - 4 July 2025. 2025 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp. 1-6. doi: 10.1109/ICME59968.2025.11209335

Full text: Audio-Driven Emotion-Aware 3D Talking Face Generation from Single Image-ICME2025.pdf (PDF - Presentation, 20MB)

Abstract

Audio-driven talking face generation from a single source image is a popular research topic, but many challenges remain for practical applications, e.g., diverse motion generation, effective emotional control, and large view-angle changes. In this work, we propose a novel one-shot, emotion-controllable, audio-driven 3D talking face generation framework that creates free-view talking videos from one reference image. First, to synchronize the motion with the input audio, we use a transformer-based motion generator to capture the context of the input audio and predict motion coefficient sequences, from which a motion encoder extracts motion codes. Meanwhile, to reconstruct a 3D portrait from one reference image, an identity encoder extracts an identity code and generates an emotion-dependent appearance conditioned on a specific emotion label. Finally, we introduce an emotion-controllable 3D portrait video generator that synthesizes free-view talking videos from the disentangled motion and identity codes. Thanks to the audio-synchronized motion codes and the emotion-aware identity code, we can render a talking face with realistic emotional expressions in novel views. Extensive experiments show that our method maintains superior visual quality and motion accuracy in both the front view and novel views.
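The abstract describes the pipeline only at a high level. As a purely illustrative, minimal PyTorch-style sketch (not the authors' implementation), the code below shows how components of this kind could be wired together: a transformer-based motion generator mapping audio features to motion coefficient sequences, a motion encoder producing motion codes, an identity encoder plus an emotion-label embedding producing identity/emotion conditioning, and a generator combining the disentangled codes into frames. All class names, dimensions, and module choices here are assumptions made for illustration; the paper does not specify an implementation in this record.

import torch
import torch.nn as nn

class MotionGenerator(nn.Module):
    # Assumed: transformer encoder mapping an audio feature sequence to motion coefficients.
    def __init__(self, audio_dim=80, coeff_dim=64, d_model=256, layers=4, heads=4):
        super().__init__()
        self.proj_in = nn.Linear(audio_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.proj_out = nn.Linear(d_model, coeff_dim)

    def forward(self, audio_feats):                      # (B, T, audio_dim)
        h = self.encoder(self.proj_in(audio_feats))
        return self.proj_out(h)                          # (B, T, coeff_dim) motion coefficients

class TalkingFacePipeline(nn.Module):
    # Assumed wiring: audio + one reference image + emotion label -> frame sequence.
    def __init__(self, coeff_dim=64, code_dim=128, num_emotions=8):
        super().__init__()
        self.motion_generator = MotionGenerator(coeff_dim=coeff_dim)
        self.motion_encoder = nn.GRU(coeff_dim, code_dim, batch_first=True)   # coefficients -> motion codes
        self.identity_encoder = nn.Sequential(                                # reference image -> identity code
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, code_dim))
        self.emotion_embed = nn.Embedding(num_emotions, code_dim)             # emotion label conditioning
        # Stand-in for the free-view 3D portrait video generator (a neural renderer in practice).
        self.generator = nn.Linear(code_dim * 3, 3 * 64 * 64)

    def forward(self, audio_feats, ref_image, emotion_label):
        coeffs = self.motion_generator(audio_feats)          # audio-synchronized motion coefficients
        motion_codes, _ = self.motion_encoder(coeffs)        # (B, T, code_dim)
        identity_code = self.identity_encoder(ref_image)     # (B, code_dim)
        emotion_code = self.emotion_embed(emotion_label)     # (B, code_dim)
        B, T, D = motion_codes.shape
        cond = torch.cat([motion_codes,
                          identity_code.unsqueeze(1).expand(B, T, D),
                          emotion_code.unsqueeze(1).expand(B, T, D)], dim=-1)
        frames = self.generator(cond).view(B, T, 3, 64, 64)  # placeholder for rendered free-view frames
        return frames

In this sketch the view angle is left implicit; a real free-view renderer would additionally take a camera pose as input when rendering novel views.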

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: IEEE
ISBN: 9798331594961
ISSN: 1945-7871
Date of First Compliant Deposit: 5 July 2025
Date of Acceptance: 20 March 2025
Last Modified: 14 Nov 2025 10:10
URI: https://orca.cardiff.ac.uk/id/eprint/179565
