ORCA
Online Research @ Cardiff

Adaptive spatiotemporal graph transformer network for action quality assessment

Liu, Jiang, Wang, Huasheng, Zhou, Wei, Stawarz, Katarzyna, Corcoran, Padraig, Chen, Ying and Liu, Hantao 2025. Adaptive spatiotemporal graph transformer network for action quality assessment. IEEE Transactions on Circuits and Systems for Video Technology 10.1109/TCSVT.2025.3541456

PDF - Accepted Post-Print Version

Official URL: http://dx.doi.org/10.1109/TCSVT.2025.3541456

Abstract

Long video action quality assessment (AQA) aims to evaluate the performance of long-term actions depicted in a video and produce an overall assessment for action quality. A video of long-term actions often contains more complicated temporal and spatial information than that of short-term actions. However, existing approaches that segment a video into individual clips for independent analysis potentially disrupt the narrative flow and diminish contextual details within and across clips, impeding comprehensive video understanding. To address this challenge, we propose an adaptive spatiotemporal graph transformer network (ASGTN) that combines multiple graph structures and transformer attention mechanisms to capture both local and global contextual information within and across clips in a long video. Specifically, the adaptive spatiotemporal graph (ASG) combines a spatial graph branch, designed to enrich the local nuanced spatiotemporal relations within an individual clip, and a temporal graph branch, tailored to dynamically learn the semantic context across different clips. Furthermore, a transformer encoder is integrated to amplify the global dependencies across clips in the entire video. This structure is designed to preserve narrative coherence and maintain essential contextual details in video-level features. Finally, we employ a level-focused decoder to predict the action quality score distribution. Experiments demonstrate that our model achieves state-of-the-art results on popular AQA datasets. Our code is available at https://github.com/jiangliu5/ASGTN AQA.
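The data flow described in the abstract can be illustrated with a toy, parameter-free sketch. It is not the authors' ASGTN implementation, which is available only through the repository linked above. Everything in it is an assumption made for illustration: the function names (`attend`, `assess`), the use of one dot-product similarity step as a stand-in for the spatial graph branch, the temporal graph branch, and the transformer encoder alike, and the reading of the level-focused decoder as a softmax over discrete quality levels whose expected value gives the score.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(features):
    """Weight every node by scaled dot-product similarity, then sum.

    Assumption: this single operation stands in for both an adaptive
    (data-dependent) graph adjacency and transformer self-attention.
    """
    scale = math.sqrt(len(features[0]))
    out = []
    for q in features:
        weights = softmax([dot(q, k) / scale for k in features])
        out.append([sum(w * k[d] for w, k in zip(weights, features))
                    for d in range(len(q))])
    return out

def residual(x, y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

def mean_pool(rows):
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]

def assess(clips, level_weights, level_scores):
    # Within-clip step: relate the frames of each clip, then pool to one
    # feature vector per clip (hypothetical stand-in for the spatial branch).
    clip_feats = [mean_pool(residual(frames, attend(frames))) for frames in clips]
    # Cross-clip step: similarity-based adjacency over clips
    # (hypothetical stand-in for the temporal graph branch).
    clip_feats = residual(clip_feats, attend(clip_feats))
    # Global step: attention across all clips
    # (hypothetical stand-in for the transformer encoder).
    clip_feats = residual(clip_feats, attend(clip_feats))
    video = mean_pool(clip_feats)
    # Decoder: distribution over quality levels, then its expected score.
    probs = softmax([dot(w, video) for w in level_weights])
    score = sum(p * s for p, s in zip(probs, level_scores))
    return probs, score

# Illustrative random inputs: 5 clips, 4 frames each, 8-dimensional features.
random.seed(0)
clips = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
         for _ in range(5)]
level_weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
probs, score = assess(clips, level_weights, level_scores=[0.0, 50.0, 100.0])
```

Because the score is an expectation over the level scores, it always lies between the lowest and highest level. A trained model would replace the random `level_weights` and the parameter-free similarity steps with learned projections.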

Item Type:	Article
Date Type:	Published Online
Status:	In Press
Schools:	Schools > Computer Science & Informatics
Publisher:	Institute of Electrical and Electronics Engineers
ISSN:	1051-8215
Date of First Compliant Deposit:	13 February 2025
Date of Acceptance:	7 February 2025
Last Modified:	20 Feb 2025 15:00
URI:	https://orca.cardiff.ac.uk/id/eprint/176178
