Huang, Kun, Zhang, Fang-Lue, Zhang, Fangfang, Lai, Yukun ORCID: https://orcid.org/0000-0002-2094-5680, Rosin, Paul ORCID: https://orcid.org/0000-0002-4965-3884 and Dodgson, Neil A. 2024. Multi-task geometric estimation of depth and surface normal from monocular 360° images. Computational Visual Media
PDF: Accepted Post-Print Version (9MB)
Abstract
Geometric estimation is required for scene understanding and analysis in panoramic 360° images. Current methods usually predict a single feature, such as depth or surface normal. These methods can lack robustness, especially when dealing with intricate textures or complex object surfaces. We introduce a novel multi-task learning (MTL) network that simultaneously estimates depth and surface normals from 360° images. Our first innovation is our MTL architecture, which enhances predictions for both tasks by integrating geometric information from depth and surface normal estimation, enabling a deeper understanding of 3D scene structure. Another innovation is our fusion module, which bridges the two tasks, allowing the network to learn shared representations that improve accuracy and robustness. Experimental results demonstrate that our MTL architecture significantly outperforms state-of-the-art methods in both depth and surface normal estimation, showing superior performance in complex and diverse scenes. Our model’s effectiveness and generalizability, particularly in handling intricate surface textures, establish it as a new benchmark in 360° image geometric estimation. The code and model will be released.
Item Type: | Article |
---|---|
Status: | In Press |
Schools: | Computer Science & Informatics |
Publisher: | SpringerOpen |
ISSN: | 2096-0433 |
Funders: | The Royal Society |
Date of First Compliant Deposit: | 7 December 2024 |
Date of Acceptance: | 25 October 2024 |
Last Modified: | 10 Dec 2024 13:00 |
URI: | https://orca.cardiff.ac.uk/id/eprint/174573 |