Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Learning virtual view selection for 3D scene semantic segmentation

Mu, Tai-Jiang, Shen, Ming-Yuan, Lai, Yu-Kun ORCID: https://orcid.org/0000-0002-2094-5680 and Hu, Shi-Min 2024. Learning virtual view selection for 3D scene semantic segmentation. IEEE Transactions on Image Processing 33, 4159-4172. 10.1109/TIP.2024.3421952

PDF - Accepted Post-Print Version (6MB)

Abstract

2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images “as they are”, i.e., only the real captured 2D images are used. However, such captured 2D images may be redundant, heavily occluded and/or limited in field of view (FoV), leading to poor performance of current methods that involve 2D inputs. In this paper, we propose a general learning framework for joint 2D-3D scene understanding that selects informative virtual 2D views of the underlying 3D scene. Both the 3D geometry and the generated virtual 2D views are then fed into any joint 2D-3D-input or pure 3D-input deep neural model to improve 3D scene understanding. Specifically, we generate virtual 2D views based on an information score map learned from the current 3D scene semantic segmentation results. To achieve this, we formalize the learning of the information score map as a deep reinforcement learning process, in which a deep neural network is rewarded for good predictions. To obtain a compact set of virtual 2D views that jointly cover as much of the informative surface of the 3D scene as possible, we further propose an efficient greedy virtual view coverage strategy in a normal-sensitive 6D space comprising 3D point coordinates and 3D normals. We validate the proposed framework with various joint 2D-3D-input or pure 3D-input deep neural models on two real-world 3D scene datasets, ScanNet v2 and S3DIS; the results demonstrate that our method yields a consistent gain over baseline models and achieves new state-of-the-art accuracy for joint 2D and 3D scene semantic segmentation. Code is available at https://github.com/smy-THU/VirtualViewSelection.
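
The greedy virtual view coverage strategy mentioned in the abstract can be illustrated with a small sketch. The snippet below is a minimal illustration only, not the authors' released implementation: the function name greedy_view_selection, the per-view coverage sets, and the per-point information scores are assumptions introduced here. In the normal-sensitive setting, the coverage set of a view would include only points whose 6D position-and-normal representation indicates they are well observed (e.g., roughly facing the virtual camera).

    # Hypothetical sketch of greedy virtual-view selection (not the authors' code).
    # Assumes each candidate view has a precomputed set of covered scene points,
    # where coverage is decided in the 6D (position + normal) space, and each
    # scene point carries a learned information score.
    import numpy as np

    def greedy_view_selection(view_coverage, info_scores, max_views):
        """Pick a compact set of views that jointly cover high-information points.

        view_coverage : list of 1D int arrays, indices of scene points covered by each view
        info_scores   : 1D float array, learned information score per scene point
        max_views     : budget on the number of virtual views
        """
        covered = np.zeros(len(info_scores), dtype=bool)
        selected = []
        for _ in range(max_views):
            # Marginal gain of each view: information mass of its not-yet-covered points.
            gains = [0.0 if v in selected else info_scores[idx[~covered[idx]]].sum()
                     for v, idx in enumerate(view_coverage)]
            best = int(np.argmax(gains))
            if gains[best] <= 0.0:
                break  # nothing informative left to cover
            selected.append(best)
            covered[view_coverage[best]] = True
        return selected

Each iteration picks the view with the largest marginal gain, so the selected set quickly covers the informative surfaces while staying compact, which is the behaviour the abstract describes.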

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 1057-7149
Date of First Compliant Deposit: 18 July 2024
Date of Acceptance: 5 May 2024
Last Modified: 20 Jul 2024 03:24
URI: https://orca.cardiff.ac.uk/id/eprint/170664
