Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

ToF-IP: time-of-flight enhanced sparse inertial poser for real-time human motion capture

Yao, Yuan, Jiang, Shifan, Hou, Yangqing, Zuo, Chengxu, Chen, Xinrui, Guo, Shihui and Qin, Yipeng ORCID: https://orcid.org/0000-0002-1551-9126 2025. ToF-IP: time-of-flight enhanced sparse inertial poser for real-time human motion capture. Presented at: The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, California, USA, 2-7 December 2025.

PDF - Accepted Post-Print Version

Abstract

Sparse inertial measurement units (IMUs) provide a portable, low-cost solution for human motion tracking but struggle with error accumulation from drift and sensor noise when estimating joint position through time-based linear acceleration integration (i.e., indirect measurement). To address this, we propose ToF-IP, a novel 3D full-body pose estimation system that integrates Time-of-Flight (ToF) sensors with sparse IMUs. The distinct advantage of our approach is that ToF sensors provide direct distance measurements, effectively mitigating error accumulation without relying on indirect time-based integration. From a hardware perspective, we maintain the portability of existing solutions by attaching ToF sensors to selected IMUs with a negligible volume increase of just 3%. On the software side, we introduce two novel techniques to enhance multi-sensor integration: (i) a Node-Centric Data Integration strategy that leverages a Transformer encoder to explicitly model both intra-node and inter-node data integration by treating each sensing node as a token; and (ii) a Dynamic Spatial Positional Encoding scheme that encodes the continuously changing spatial positions of wearable nodes as motion-conditioned functions, enabling the model to better capture human body dynamics in the embedding space. Additionally, we contribute a 208-minute human motion dataset from 10 participants, including synchronized IMU-ToF measurements and ground-truth from optical tracking. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches such as PNP, achieving superior accuracy in tracking complex and slow motions like Tai Chi, which remains challenging for inertial-only methods.

Item Type: Conference or Workshop Item (Paper)
Status: Unpublished
Schools: Schools > Computer Science & Informatics
Date of First Compliant Deposit: 28 October 2025
Date of Acceptance: 18 September 2025
Last Modified: 28 Oct 2025 14:30
URI: https://orca.cardiff.ac.uk/id/eprint/181822
