Zeng, Yirui; Fu, Jun; Yue, Guanghui; Liu, Hantao
Abstract
Contrastive Language-Image Pretraining (CLIP) models have demonstrated strong performance in blind dehazed image quality assessment (DQA), yet their efficiency remains a concern. In this letter, we introduce CLIP-DQA V2, which explores CLIP for efficient blind DQA from a fragment-level perspective. To effectively map fragments sampled from dehazed images to quality scores, CLIP-DQA V2 integrates two key components: (1) multi-modal prompt learning, which jointly optimizes CLIP’s image and text encoders for better alignment between fragments and quality-related text descriptions, and (2) a semantic consistency loss that alleviates the semantic degradation caused by fragment sampling. Experiments on two widely used benchmark datasets show that CLIP-DQA V2 reduces computational cost by nearly 45% compared to previous methods, while delivering more accurate quality predictions.
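To make the fragment-based pipeline described above concrete, the following is a minimal sketch of CLIP-based quality scoring over sampled fragments. It is not the authors' implementation: it assumes the OpenAI `clip` package, a fixed antonym prompt pair in place of the learned multi-modal prompts, and an illustrative grid/fragment size and consistency-loss form. The input image is assumed to be a 3×H×W tensor already normalized with CLIP's mean and standard deviation.

```python
# Hedged sketch of fragment-level CLIP quality scoring; grid size, prompts,
# and the consistency loss are illustrative assumptions, not the letter's design.
import torch
import clip  # pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def sample_fragments(img, grid=7, frag=32):
    """Crop a grid x grid set of frag x frag patches from a 3xHxW tensor and
    tile them into one small image (7 * 32 = 224 matches ViT-B/32 input),
    so the encoder sees local detail at a fraction of the full-image cost."""
    _, H, W = img.shape
    rows = []
    for i in range(grid):
        cols = []
        for j in range(grid):
            y = int(i * (H - frag) / max(grid - 1, 1))
            x = int(j * (W - frag) / max(grid - 1, 1))
            cols.append(img[:, y:y + frag, x:x + frag])
        rows.append(torch.cat(cols, dim=2))
    return torch.cat(rows, dim=1)  # (3, grid*frag, grid*frag)

# Fixed antonym prompt pair; CLIP-DQA V2 instead learns prompt tokens jointly
# with the image encoder (multi-modal prompt learning).
text = clip.tokenize(["a high quality dehazed photo",
                      "a low quality dehazed photo"]).to(device)

def quality_score(dehazed_img):
    """Map one dehazed image to a scalar quality score in [0, 1]."""
    frag = sample_fragments(dehazed_img).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(frag)
        txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
    return probs[0, 0].item()  # probability assigned to the "high quality" prompt

def semantic_consistency_loss(frag_feat, full_feat):
    """One plausible form of the semantic consistency term: cosine distance
    pulling the fragment embedding toward the full-image embedding, to offset
    the semantic information lost by fragment sampling."""
    frag_feat = frag_feat / frag_feat.norm(dim=-1, keepdim=True)
    full_feat = full_feat / full_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (frag_feat * full_feat).sum(dim=-1).mean()
```

At inference only the tiled fragments pass through the image encoder, which is where the reported efficiency gain over full-resolution scoring would come from; the consistency loss is used only during training.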
| Item Type: | Article |
|---|---|
| Date Type: | Published Online |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| Additional Information: | License information from Publisher: LICENSE 1: URL: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html, Start Date: 2025-01-01 |
| Publisher: | Institute of Electrical and Electronics Engineers |
| ISSN: | 1070-9908 |
| Last Modified: | 14 Oct 2025 09:39 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/181645 |