Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Learning structured spatiotemporal tasks with xLSTM under uncertainty: A multi-task approach

Llanos-Neuta, Nicolás, Obando-Ceron, Johan, Perafan-Villota, Juan C. and Romero Cano, Victor (ORCID: https://orcid.org/0000-0003-2910-5116). 2025. Learning structured spatiotemporal tasks with xLSTM under uncertainty: A multi-task approach. Presented at: 12th Workshop on Engineering Applications, Cali, Colombia, 29–31 October 2025. Published in: Figueroa-García, Juan Carlos, López-Sotelo, Jesús Alfonso, Moreno-Trujillo, John Freddy and Gaona-García, Elvis Eduardo, eds. Applied Computer Sciences in Engineering. Communications in Computer and Information Science. Springer, pp. 234–246. DOI: 10.1007/978-3-032-08203-9_20
Item availability restricted.

PDF - Accepted Post-Print Version
Restricted to Repository staff only until 27 November 2025 due to copyright restrictions.

Abstract

The demand for real-time perception in complex robotic systems operating in dynamic environments has motivated the development of architectures capable of solving multiple learning tasks simultaneously. However, current Multi-Task Learning approaches often suffer from poor task balancing, lack of inter-task regularization, and static loss weighting. This work introduces a multi-task framework based on Extended Long Short-Term Memory (xLSTM) to jointly predict the motion of 3D bounding boxes, semantic classes, 3D velocity, and categorical dynamic behavior of objects. The framework adopts an encoder–multi-head architecture with shared temporal representations and task-specific heads. Two auxiliary tasks, velocity regression and dynamic state classification, are derived from physical approximations and incorporated to guide training. A homoscedastic uncertainty-based loss weighting strategy dynamically adjusts task influence during optimization. Quantitative results on the KITTI benchmark show that the proposed framework achieves lower RMSE for motion estimation and higher F1-scores for classification tasks compared to baselines. Auxiliary tasks improve convergence and coherence, while uncertainty weighting enhances training stability. This architecture offers a scalable and interpretable solution for spatiotemporal modeling and has the potential to benefit downstream applications in robotics and intelligent monitoring systems.
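
The abstract does not give the exact form of the uncertainty-based loss weighting, so the following is only a minimal sketch of one widely used homoscedastic formulation: each task i gets a learnable log-variance s_i, and the combined loss is the sum of exp(-s_i) * L_i + s_i. The function names, the four example loss values, and the choice of this particular formula are assumptions made for illustration, not the authors' implementation.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with homoscedastic uncertainty weights.

    Assumed formulation (not taken from the paper):
        total = sum_i( exp(-s_i) * L_i + s_i ),  with s_i = log(sigma_i^2)
    The exp(-s_i) factor down-weights tasks with high learned uncertainty,
    and the + s_i term penalizes inflating the uncertainty without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def effective_weights(log_vars):
    """Return the multiplicative weight exp(-s_i) applied to each task loss."""
    return [math.exp(-s) for s in log_vars]


if __name__ == "__main__":
    # Hypothetical losses for the four outputs named in the abstract:
    # box motion, semantic class, velocity, dynamic state.
    losses = [0.8, 0.3, 1.2, 0.5]
    # With every log-variance at 0, each weight is 1 and the total
    # reduces to the plain sum of the task losses.
    print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0, 0.0]))
    # Raising one task's log-variance lowers that task's weight.
    print(effective_weights([0.0, 0.0, 1.0, 0.0]))
```

In a training loop the log-variances would be trainable parameters optimized together with the network weights, which is what lets the task balance change during optimization instead of being fixed in advance.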

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: Springer
ISBN: 9783032082022
Date of First Compliant Deposit: 25 October 2025
Date of Acceptance: 1 August 2025
Last Modified: 27 Oct 2025 12:46
URI: https://orca.cardiff.ac.uk/id/eprint/181891
