Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

FVIFormer: flow-guided global-local aggregation transformer network for video inpainting

Yan, Weiqing, Sun, Yiqiu, Yue, Guanghui, Zhou, Wei and Liu, Hantao ORCID: https://orcid.org/0000-0003-4544-3481 2024. FVIFormer: flow-guided global-local aggregation transformer network for video inpainting. IEEE Journal of Emerging and Selected Topics in Circuits and Systems 14 (2) , pp. 235-244. 10.1109/JETCAS.2024.3392972

PDF (Accepted Post-Print Version, 22MB): 24_JETCAS_FVIFormer.pdf

Abstract

Video inpainting has been used extensively in recent years. Established works usually exploit the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, due to the complexity of video content, this may destroy the structural information of objects within the video. Moreover, moving objects in the damaged regions further increase the difficulty of the task. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates global content across video frames using efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module to enhance the coherence of content around the missing regions by combining the extracted local and global features. In addition, considering the efficiency of the overall framework, we optimise the self-attention mechanism via depth-wise separable encoding to improve the speed of training and testing. We validate the effectiveness of our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in completing the edges of video content that has undergone stabilisation algorithms.
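The depth-wise separable encoding mentioned in the abstract reduces the cost of the projections that feed self-attention. As a minimal sketch of why this helps (the function names and channel sizes below are illustrative assumptions, not the authors' implementation), the parameter savings of a depth-wise separable convolution over a standard one can be counted directly:

```python
# Illustrative parameter counts (biases ignored) for encoding feature maps
# before self-attention: a standard 2-D convolution versus a depth-wise
# separable one, the kind of substitution the abstract describes.

def standard_conv_params(channels_in: int, channels_out: int, kernel: int) -> int:
    """Parameters of a standard 2-D convolution: one full kernel per output channel."""
    return channels_in * channels_out * kernel * kernel

def depthwise_separable_params(channels_in: int, channels_out: int, kernel: int) -> int:
    """Depth-wise conv (one spatial kernel per input channel) + 1x1 point-wise conv."""
    return channels_in * kernel * kernel + channels_in * channels_out

# Example with hypothetical sizes: 256-channel features, 3x3 kernels.
standard = standard_conv_params(256, 256, 3)         # 589,824 parameters
separable = depthwise_separable_params(256, 256, 3)  # 67,840 parameters
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```

For typical channel widths the separable form is roughly an order of magnitude cheaper, which is consistent with the abstract's stated goal of faster training and testing.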

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 2156-3357
Date of First Compliant Deposit: 27 April 2024
Date of Acceptance: 20 April 2024
Last Modified: 08 Nov 2024 17:15
URI: https://orca.cardiff.ac.uk/id/eprint/168461

