Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Scale-aware network with modality-awareness for RGB-D indoor semantic segmentation

Zhou, Feng, Lai, Yu-Kun ORCID: https://orcid.org/0000-0002-2094-5680, Rosin, Paul L. ORCID: https://orcid.org/0000-0002-4965-3884, Zhang, Fengquan and Hu, Yong 2022. Scale-aware network with modality-awareness for RGB-D indoor semantic segmentation. Neurocomputing 492 , pp. 464-473. 10.1016/j.neucom.2022.04.025

PDF - Accepted Post-Print Version (8MB)
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This paper focuses on indoor semantic segmentation based on RGB-D images. Semantic segmentation is a pixel-level classification task that has made steady progress based on fully convolutional networks (FCNs). However, we find there is still room for improvement in three aspects. The first concerns multi-scale feature extraction: recent state-of-the-art works forcibly concatenate multi-scale feature representations extracted by spatial pyramid pooling, dilated convolution or other architectures, regardless of the spatial extent of each pixel. The second concerns RGB-D modal fusion: most successful methods treat RGB and depth as two separate modalities and force them to be joined together regardless of their different contributions to the final prediction. The third concerns the modeling ability of extracted features: due to the “local grid” defined by the receptive field, the learned feature representation lacks the ability to model spatial dependencies. To address these challenges, we propose four modules: a scale-aware module, a modality-aware module, an attention module and a depth estimation module, the last of which encourages the RGB network to extract more effective features. Extensive experiments on the NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method is effective for RGB-D indoor semantic segmentation.
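The abstract's second point, weighting the RGB and depth streams by their per-pixel contributions rather than concatenating them blindly, can be illustrated with a generic gated-fusion sketch. This is NOT the paper's actual modality-aware module (its details are not given here); all function names and the mean-activation gating score are hypothetical illustrations, written in NumPy for self-containment.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(rgb_feat, depth_feat):
    """Fuse (C, H, W) RGB and depth feature maps with per-pixel gates.

    Toy proxy for modality contribution: the mean activation of each
    modality at every spatial location. The softmax over the two scores
    yields gates that sum to 1 per pixel, so the output is a per-pixel
    convex combination of the two modalities instead of a fixed merge.
    """
    score_rgb = rgb_feat.mean(axis=0, keepdims=True)    # (1, H, W)
    score_depth = depth_feat.mean(axis=0, keepdims=True)  # (1, H, W)
    gates = softmax(np.concatenate([score_rgb, score_depth], axis=0), axis=0)
    return gates[0] * rgb_feat + gates[1] * depth_feat   # (C, H, W)
```

In a real network the gating scores would themselves be learned (e.g. by a small convolutional branch), but the sketch shows the key property: each pixel can lean on whichever modality is more informative there.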

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Elsevier
ISSN: 0925-2312
Date of First Compliant Deposit: 12 April 2022
Date of Acceptance: 3 April 2022
Last Modified: 21 Nov 2024 22:30
URI: https://orca.cardiff.ac.uk/id/eprint/149167

Citation Data

Cited 14 times in Scopus.

