Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A new semi-supervised video anomaly detection baseline in lack of anomalous samples

Zhao, Mengyang, Yu, Haiyang, Fu, Teng, Liu, Yang, Zhou, Wei, Li, Bin and Xue, Xiangyang 2026. A new semi-supervised video anomaly detection baseline in lack of anomalous samples. ACM Transactions on Multimedia Computing, Communications and Applications. doi: 10.1145/3797034

Full text not available from this repository.

Abstract

Video anomaly detection (VAD) has been widely studied because of its important applications in the multimedia community. Recently, many Weakly-Supervised VAD (WS-VAD) methods have been proposed. These methods tend to treat VAD as a classification task through multiple instance learning, and therefore require collecting sufficient anomaly classes and samples to train a classifier. However, anomalous events tend to be open-set and rare in real-world applications, so it is often difficult to collect all anomaly classes and enough anomalous samples, a situation that WS-VAD struggles to cope with. To this end, we treat VAD as an out-of-distribution detection task rather than a classification task and propose a simple but effective semi-supervised baseline method. First, we leverage the powerful zero-shot capability of large vision-language models to generate summary text descriptions of videos and to extract visual features as intermediates for subsequent use. Next, we use a text encoder to extract language features and combine them with the visual features to obtain robust multimodal features. Finally, to cope with the scarcity of abnormal samples, we introduce an out-of-distribution detection method that learns the center of normality in the multimodal space from normal and unlabeled samples while pushing abnormal samples away from that center. To implement our baseline method, we also provide a new semi-supervised dataset by reorganizing an existing benchmark; it is the first available dataset in the VAD community that provides trimmed videos consisting of complete abnormal events. Experiments demonstrate that our method performs more robustly when fewer anomaly classes and anomalous samples are collected.
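The full text is not available from this repository, so the following is only a hypothetical sketch of the final step described in the abstract, not the authors' implementation. It assumes per-video visual and text features have already been extracted (random arrays stand in for them), assumes a simple concatenation of L2-normalized features as the fusion step, and swaps in a Deep SAD-style objective (Ruff et al.) as one known way to pull normal and unlabeled samples toward a center while pushing labeled anomalies away. The names `fuse`, `anomaly_score` and `deep_sad_loss` are illustrative, and training the encoder that produces the embeddings is omitted.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse(visual: np.ndarray, text: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: concatenate per-video visual and text features
    after L2-normalizing each modality (the paper's fusion may differ)."""
    return np.concatenate([l2_normalize(visual), l2_normalize(text)], axis=1)


def anomaly_score(z: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance of each embedding to the normality center c."""
    return np.sum((z - c) ** 2, axis=1)


def deep_sad_loss(z: np.ndarray, labels: np.ndarray, c: np.ndarray,
                  eta: float = 1.0, eps: float = 1e-6) -> float:
    """Deep SAD-style objective, used here as a stand-in.

    labels: 0 = unlabeled, +1 = labeled normal, -1 = labeled anomaly.
    Unlabeled samples contribute their distance d to c; labeled samples
    contribute eta * (d + eps) ** y, i.e. the distance for normals (pull in)
    and the inverse distance for anomalies (push out).
    """
    d = anomaly_score(z, c)
    y = labels.astype(float)
    per_sample = np.where(labels == 0, d, eta * (d + eps) ** y)
    return float(per_sample.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(6, 16))  # stand-in for VLM visual features
    text = rng.normal(size=(6, 12))    # stand-in for text-encoder features
    z = fuse(visual, text)
    labels = np.array([1, 1, 0, 0, 0, -1])
    # Center estimated from normal and unlabeled samples only.
    c = z[labels >= 0].mean(axis=0)
    print("loss:", deep_sad_loss(z, labels, c))
    print("scores:", anomaly_score(z, c))
```

At test time a video whose fused embedding lies farther from the center receives a higher anomaly score. In a full system the mapping into this space would be learned by minimizing the loss, rather than fixed as in this sketch.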

Item Type: Article
Date Type: Published Online
Status: In Press
Schools: Schools > Computer Science & Informatics
Publisher: Association for Computing Machinery (ACM)
ISSN: 1551-6857
Date of Acceptance: 19 January 2026
Last Modified: 09 Mar 2026 12:30
URI: https://orca.cardiff.ac.uk/id/eprint/185606
