ORCA
Online Research @ Cardiff

Audio-driven talking face video generation with dynamic convolution kernels

Ye, Zipeng, Xia, Mengfei, Yi, Ran, Zhang, Juyong, Lai, Yu-Kun, Huang, Xuwei, Zhang, Guoxin and Liu, Yong-jin 2023. Audio-driven talking face video generation with dynamic convolution kernels. IEEE Transactions on Multimedia 25, pp. 2033-2046. 10.1109/TMM.2022.3142387

PDF - Accepted Post-Print Version

Official URL: http://dx.doi.org/10.1109/TMM.2022.3142387

Abstract

In this paper, we present a dynamic convolution kernel (DCK) strategy for convolutional neural networks. Using a fully convolutional network with the proposed DCKs, high-quality talking-face video can be generated from multi-modal sources (i.e., unmatched audio and video) in real time, and our trained model is robust to different identities, head postures, and input audio. Our proposed DCKs are specially designed for audio-driven talking face video generation, leading to a simple yet effective end-to-end system. We also provide a theoretical analysis to interpret why DCKs work. Experimental results show that our method can generate high-quality talking-face video with background at 60 fps. Comparison and evaluation between our method and the state-of-the-art methods demonstrate the superiority of our method.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Computer Science & Informatics
Publisher:	Institute of Electrical and Electronics Engineers
ISSN:	1520-9210
Date of First Compliant Deposit:	23 January 2022
Date of Acceptance:	8 January 2022
Last Modified:	24 Nov 2024 08:30
URI:	https://orca.cardiff.ac.uk/id/eprint/146847
