Yang, Jie, Zhang, Bo-Tao, Liu, Feng-Lin, Fu, Hongbo, Lai, YuKun ORCID: https://orcid.org/0000-0002-2094-5680 and Gao, Lin
2025.
Single-Image 3D Human Reconstruction with 3D-Aware Diffusion Priors and Facial Enhancement.
Presented at: SIGGRAPH Asia 2025 Conference,
Hong Kong,
15-18 December 2025.
Published in: Komura, Taku, Wimmer and Fu, Hongbo eds.
Proceedings of the SIGGRAPH Asia 2025 Conference.
New York, NY: ACM. DOI: 10.1145/3757377.3763839
PDF - Published Version. Available under License Creative Commons Attribution Non-Commercial. Download (19MB).
Abstract
Creating high-quality, photorealistic 3D digital humans from a single image remains challenging. While existing methods can generate visually appealing multi-view outputs, they often suffer from inconsistencies in viewpoints and camera poses, resulting in suboptimal 3D reconstructions with reduced realism. Furthermore, most approaches focus on body generation while overlooking facial consistency – a perceptually critical issue caused by the fact that the face occupies only a small area in a full-body image (e.g., ∼80 × 80 pixels out of a 512 × 512 image). This limited resolution, and the correspondingly low weight of the facial regions during optimization, leads to insufficient facial detail and inconsistent facial identity features across multiple views.

To address these challenges, we leverage the powerful capabilities of 2D video diffusion models for consistent multi-view RGB and normal human image generation, combined with the 3D SMPL-X representation to enable spatial consistency and geometric detail. By fine-tuning the DiT models (HumanWan-DiTs) on realistic 3D human datasets using the LoRA technique, our method ensures both generalizability and 3D visual consistency in realistic multi-view human image generation. The proposed facial enhancement is integrated into 3D Gaussian optimization to enhance facial details. To further refine results, we apply super-resolution and generative priors to reduce facial blurring, alongside SMPL-X parameter tuning and the assistance of generated multi-view normal images, achieving photorealistic and consistent rendering from a single image. Extensive experiments demonstrate that our approach outperforms existing methods, producing photorealistic, consistent, and finely detailed human renderings.
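The LoRA fine-tuning mentioned in the abstract adapts a large pretrained model by training only low-rank factors added to frozen weights. The following is a minimal NumPy sketch of that general idea, not the authors' HumanWan-DiT code; all dimensions and names here are illustrative assumptions.

```python
import numpy as np

# Minimal LoRA sketch (illustrative, not the paper's implementation):
# a frozen weight W is adapted as W_eff = W + (alpha / r) * B @ A,
# where only the low-rank factors A (r x d_in) and B (d_out x r) are trained.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8.0

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer initially matches the frozen layer.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters vs. full fine-tuning of this layer:
full = W.size               # 64 * 64 = 4096
lora = A.size + B.size      # 4 * 64 + 64 * 4 = 512
print(f"trainable params: {lora} of {full}")
```

Because only A and B are updated, fine-tuning touches a small fraction of the weights, which is what lets the abstract's method preserve the generalization of the pretrained video diffusion model while specializing it to realistic 3D human data.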
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Date Type: | Published Online |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | ACM |
| ISBN: | 9798400721373 |
| Funders: | EPSRC |
| Date of First Compliant Deposit: | 14 December 2025 |
| Date of Acceptance: | 24 September 2025 |
| Last Modified: | 15 Dec 2025 16:45 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/183216 |