Wu, Jing ORCID: https://orcid.org/0000-0001-5123-9861, Lai, Yukun ORCID: https://orcid.org/0000-0002-2094-5680, Xu, Mingchen and Ji, Ze ORCID: https://orcid.org/0000-0002-8968-9902 2024. Fusion of short-term and long-term attention for video mirror detection. Presented at: IEEE International Conference on Multimedia and Expo (ICME) 2024, Niagara Falls, Canada, 15-19 July 2024. 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp. 1-9. 10.1109/icme57554.2024.10688367
PDF - Accepted Post-Print Version (1MB)
Abstract
Techniques for detecting mirrors in static images have advanced rapidly in recent years. However, these methods operate on single input images; detecting mirrors in video additionally requires temporal consistency between frames. We observe that humans can recognize mirror candidates from just one or two frames based on their appearance (e.g., shape, color). However, to confirm that a candidate is indeed a mirror (not a picture or a window), we often need to observe more frames for a global view. This observation motivates us to detect mirrors by fusing appearance features extracted by a short-term attention module with context information extracted by a long-term attention module. To evaluate performance, we build a challenging benchmark dataset of 19,255 frames from 281 videos. Experimental results demonstrate that our method achieves state-of-the-art performance on this benchmark.
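The paper's actual architecture is not reproduced here, but the fusion idea the abstract describes can be sketched in PyTorch. Everything below (module names, tensor dimensions, the two-frame window, and the mean-pool-and-concatenate fusion) is an illustrative assumption, not the authors' implementation; a real detector would further decode the fused features into per-frame mirror masks.

```python
import torch
import torch.nn as nn


class ShortLongFusion(nn.Module):
    """Toy fusion of a short-term (appearance) and a long-term (context)
    attention stream; names and dimensions are illustrative assumptions."""

    def __init__(self, dim: int = 256, heads: int = 8, window: int = 2):
        super().__init__()
        self.window = window
        # Short-term stream: attention within the first `window` frames,
        # standing in for per-frame appearance cues (shape, color).
        self.short_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Long-term stream: attention across the whole clip, standing in
        # for the global view needed to rule out pictures and windows.
        self.long_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Learned fusion of the two pooled summaries.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, N, C) = batch, frames, tokens per frame, channels
        B, T, N, C = feats.shape
        # Short-term: self-attention over only a couple of frames.
        s = feats[:, : self.window].reshape(B, self.window * N, C)
        s, _ = self.short_attn(s, s, s)
        # Long-term: self-attention over every frame in the clip.
        g = feats.reshape(B, T * N, C)
        g, _ = self.long_attn(g, g, g)
        # Mean-pool each stream and fuse by concatenation + projection.
        return self.fuse(torch.cat([s.mean(dim=1), g.mean(dim=1)], dim=-1))


# Usage: 8-frame clip, 7x7 feature tokens per frame, 256 channels.
clip = torch.randn(2, 8, 49, 256)
fused = ShortLongFusion()(clip)  # -> (2, 256) fused representation
```

Concatenation followed by a linear projection is one simple fusion choice; the paper may well use a different mechanism, but the split into a narrow-window stream and a full-clip stream mirrors the short-term/long-term distinction the abstract draws.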
| Item Type | Conference or Workshop Item (Paper) |
| --- | --- |
| Date Type | Published Online |
| Status | Published |
| Schools | Advanced Research Computing @ Cardiff (ARCCA); Computer Science & Informatics |
| Publisher | IEEE |
| ISBN | 979-8-3503-9015-5 |
| Date of First Compliant Deposit | 15 May 2024 |
| Date of Acceptance | 13 March 2024 |
| Last Modified | 18 Oct 2024 09:47 |
| URI | https://orca.cardiff.ac.uk/id/eprint/168925 |