Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

LLM-driven multimodal and multi-identity listening head generation

Lai, Peiwen, Zhong, Weizhi, Qin, Yipeng (ORCID: https://orcid.org/0000-0002-1551-9126), Ren, Xiaohang, Wang, Baoyuan and Li, Guanbin 2025. LLM-driven multimodal and multi-identity listening head generation. Presented at: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025, Nashville, USA, 11-15 June 2025. Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, pp. 10656-10666. doi:10.1109/CVPR52734.2025.00996

PDF - Accepted Post-Print Version

Abstract

Generating natural listener responses in conversational scenarios is crucial for creating engaging digital humans and avatars. Recent work has shown that large language models (LLMs) can be effectively leveraged for this task, demonstrating remarkable capabilities in generating contextually appropriate listener behaviors. However, current LLM-based methods face two critical limitations: they rely solely on speech content, overlooking other crucial communication signals, and they entangle listener identity with response generation, compromising output fidelity and generalization. In this work, we present a novel framework that addresses these limitations while maintaining the advantages of LLMs. Our approach introduces a Multimodal-LM architecture that jointly processes speech content, acoustics, and speaker emotion, capturing the full spectrum of communication cues. Additionally, we propose an identity disentanglement strategy using instance normalization and adaptive instance normalization in a VQ-VAE framework, enabling high-fidelity listening head synthesis with flexible identity control. Extensive experiments demonstrate that our method significantly outperforms existing approaches in terms of response naturalness and fidelity, while enabling effective identity control without retraining.
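The abstract names instance normalization and adaptive instance normalization (AdaIN) as the building blocks of the identity disentanglement strategy but does not describe how they are wired into the VQ-VAE. As a rough illustration only, the following sketch shows the two operations in their commonly used form, applied to a feature sequence of shape (channels, time). The function names, array shapes, and the pairing of "motion" with "identity" features are assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (shape: channels x time) over the time axis.

    Removing per-channel mean and scale is the usual way instance
    normalization strips instance-specific statistics from a feature map.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std


def adain(content, style, eps=1e-5):
    """Adaptive instance normalization in its standard form.

    The content features are normalized per channel, then rescaled and
    shifted with the per-channel statistics of the style features.
    """
    style_mean = style.mean(axis=1, keepdims=True)
    style_std = np.sqrt(style.var(axis=1, keepdims=True) + eps)
    return instance_norm(content, eps) * style_std + style_mean


# Hypothetical toy data: 4 feature channels over 50 frames.
rng = np.random.default_rng(0)
motion = rng.normal(loc=2.0, scale=3.0, size=(4, 50))     # assumed "response" features
identity = rng.normal(loc=-1.0, scale=0.5, size=(4, 50))  # assumed "identity" features

neutral = instance_norm(motion)    # per-channel statistics removed
stylized = adain(motion, identity)  # statistics of `identity` re-injected
```

In this toy setup, `neutral` has approximately zero mean and unit variance per channel, while `stylized` keeps the temporal shape of `motion` but carries the per-channel mean and standard deviation of `identity`. This is the general sense in which normalization statistics can be swapped to control one factor without altering another.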

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: IEEE
ISBN: 9798331543655
Date of First Compliant Deposit: 28 March 2025
Date of Acceptance: 26 February 2025
Last Modified: 28 Aug 2025 11:45
URI: https://orca.cardiff.ac.uk/id/eprint/177053
