Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

PhysLab: A benchmark dataset for multi-granularity visual parsing of physics experiments

Zou, Minghao, Zeng, Qingtian, Miao, Yongping, Liu, Shangkun, Wang, Zilong, Liu, Hantao ORCID: https://orcid.org/0000-0003-4544-3481 and Zhou, Wei 2025. PhysLab: A benchmark dataset for multi-granularity visual parsing of physics experiments. Presented at: MM '25: The 33rd ACM International Conference on Multimedia, Dublin, Ireland, 31 October 2025. IXR '25: Proceedings of the 3rd International Workshop on Interactive eXtended Reality. Dublin: ACM, pp. 12799-12806. 10.1145/3746027.3758221

Full text not available from this repository.

Abstract

Visual parsing of images and videos is critical for a wide range of real-world applications. However, progress in this field is constrained by three limitations of existing datasets: (1) limited annotation diversity, which restricts support for diverse vision tasks within a unified dataset; (2) insufficient coverage of domains, particularly a lack of datasets tailored for educational scenarios; and (3) a lack of explicit procedural guidance, with weak logical rules and insufficient representation of a structured task process. To address these gaps, we introduce PhysLab, the first dataset that captures students conducting complex physics experiments. The dataset includes four representative experiments that feature diverse scientific instruments and rich human-object interaction (HOI) patterns. PhysLab comprises 620 long-form videos and provides multi-granularity annotations that support a variety of vision tasks, including action recognition, object detection, and HOI analysis. We establish baselines and perform extensive evaluations to highlight key challenges in the parsing of procedural educational videos. We expect PhysLab to serve as a valuable resource for advancing comprehensive visual parsing, facilitating intelligent classroom systems, and fostering closer integration among computer vision, multimedia, and educational technologies. The dataset and the evaluation toolkit are publicly available at https://github.com/ZMH-SDUST/PhysLab.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: ACM
ISBN: 979-8-4007-2051-2
Last Modified: 06 Nov 2025 10:45
URI: https://orca.cardiff.ac.uk/id/eprint/182178
