Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Single image 3D shape retrieval via Cross-Modal instance and category contrastive learning

Lin, Ming-Xian, Yang, Jie, Wang, He, Lai, Yukun, Jia, Rongfei, Zhao, Binqiang and Gao, Lin 2022. Single image 3D shape retrieval via Cross-Modal instance and category contrastive learning. Presented at: CVF/IEEE International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 11-17 October 2021. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, pp. 11385-11395. 10.1109/ICCV48922.2021.01121

PDF (ImageShapeRetrieval_ICCV2021.pdf) - Accepted Post-Print Version, 15MB


In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek the best-matching shape for a given single 2D image from a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the recent success of contrastive learning in self-supervised representation learning, we propose a novel IBSR pipeline that leverages contrastive learning. Adopting cross-modal contrastive learning between 2D images and 3D shapes for IBSR is non-trivial: contrastive learning requires very strong data augmentation when constructing positive pairs in order to learn feature invariance, whereas traditional metric learning does not have this requirement. Moreover, object shape and appearance are entangled in 2D query images, which makes the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale images rendered from the 3D shapes as the shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly but naturally change the appearance of the query image, effectively satisfying the augmentation needs of contrastive learning. Finally, we propose to incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories, in addition to the classic instance-level contrastive loss. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks, Pix3D, Stanford Cars, and Comp Cars, outperforming the previous state of the art by 4% to 15% in retrieval accuracy.
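
The abstract names an instance-level and a category-level contrastive loss but does not give their formulas. As a rough illustration only, the sketch below uses a generic symmetric InfoNCE loss for the instance level and a supervised-contrastive-style loss for the category level; these are stand-in formulations, not necessarily the authors' losses. The function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def instance_contrastive_loss(img_emb, shape_emb, temperature=0.1):
    # Generic symmetric InfoNCE (assumed form): row i of img_emb and row i of
    # shape_emb form the positive pair; all other rows act as negatives.
    img, shp = l2_normalize(img_emb), l2_normalize(shape_emb)
    logits = img @ shp.T / temperature
    idx = np.arange(len(img))
    loss_i2s = -log_softmax(logits)[idx, idx].mean()
    loss_s2i = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_i2s + loss_s2i)

def category_contrastive_loss(img_emb, shape_emb, img_labels, shape_labels,
                              temperature=0.1):
    # Supervised-contrastive-style loss (assumed form): every shape that shares
    # the image's category label counts as a positive.
    img, shp = l2_normalize(img_emb), l2_normalize(shape_emb)
    logp = log_softmax(img @ shp.T / temperature)
    pos = (img_labels[:, None] == shape_labels[None, :]).astype(float)
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0  # no image has a same-category shape in this batch
    per_image = -(pos * logp).sum(axis=1)[valid] / n_pos[valid]
    return per_image.mean()

# Toy batch: 4 shapes, each with one slightly perturbed "image" embedding.
rng = np.random.default_rng(0)
shape_emb = rng.normal(size=(4, 8))
img_emb = shape_emb + 0.01 * rng.normal(size=(4, 8))
labels = np.array([0, 0, 1, 1])  # hypothetical category ids

total_loss = (instance_contrastive_loss(img_emb, shape_emb)
              + category_contrastive_loss(img_emb, shape_emb, labels, labels))
print(total_loss)
```

In this sketch the two terms are simply summed with equal weight; how the terms are actually weighted and how the multi-view shape embeddings are produced are not stated in the abstract.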

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: IEEE
ISBN: 9781665428132
Funders: Royal Society
Date of First Compliant Deposit: 2 September 2021
Date of Acceptance: 22 July 2021
Last Modified: 09 Nov 2022 11:35

Citation Data

Cited 4 times in Scopus.
