Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Preference-based deep reinforcement learning with automatic curriculum learning for map-free UGV navigation in factory-like environments

Tian, Shunyu, Wei, Changyun, Jian, Shaojie and Ji, Ze ORCID: https://orcid.org/0000-0002-8968-9902 2025. Preference-based deep reinforcement learning with automatic curriculum learning for map-free UGV navigation in factory-like environments. Engineering Science and Technology, an International Journal 70, 102147. 10.1016/j.jestch.2025.102147

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

Autonomous navigation for unmanned ground vehicles in smart factory environments involves satisfying multiple, often conflicting objectives such as safety, efficiency, and motion smoothness. Traditional reinforcement learning approaches typically rely on fixed, manually weighted reward functions to encode these objectives. However, such static formulations struggle to generalize across varying user preferences and dynamic operational contexts common in real-world factory scenarios. Consequently, they require retraining for every new preference configuration, leading to inefficiency and limited practical deployment. To address this challenge, we propose a novel preference-based reinforcement learning framework that enables a single policy to dynamically adapt its behavior according to a user-defined preference vector that encodes trade-offs among multiple objectives. This allows the agent to modify its navigation strategy on the fly without additional retraining. To further improve training efficiency and learning stability, we incorporate automatic curriculum learning, which gradually increases the complexity of training tasks based on the agent’s performance, accelerating convergence and improving robustness. We validate our method in a simulated smart factory environment that reflects realistic navigation constraints. Experimental results demonstrate that, compared with recent advances, our proposed approach converges faster during training and achieves up to a 93% navigation success rate in challenging factory-like environments.
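
The abstract does not say how the preference vector enters the reward or how the curriculum is scheduled, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes linear scalarization (a weighted sum of per-objective rewards) and a curriculum that advances one difficulty level when the recent success rate reaches a threshold. All names (sample_preference, scalarize, Curriculum) and all default values are hypothetical.

```python
import random
from collections import deque

# Objectives named in the abstract; the ordering here is an assumption.
OBJECTIVES = ("safety", "efficiency", "smoothness")


def sample_preference(rng):
    """Draw a random preference vector: non-negative weights summing to 1."""
    raw = [rng.random() + 1e-9 for _ in OBJECTIVES]
    total = sum(raw)
    return [r / total for r in raw]


def scalarize(rewards, preference):
    """Assumed linear scalarization: weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(preference, rewards))


class Curriculum:
    """Hypothetical schedule: raise difficulty when recent success is high."""

    def __init__(self, max_level=5, window=20, threshold=0.8):
        self.level = 0
        self.max_level = max_level
        self.window = window
        self.threshold = threshold
        self.results = deque(maxlen=window)

    def record(self, success):
        """Store one episode outcome and advance the level if warranted."""
        self.results.append(bool(success))
        full = len(self.results) == self.window
        if full and sum(self.results) / self.window >= self.threshold:
            if self.level < self.max_level:
                self.level += 1
            # Start a fresh window so each level is judged on its own episodes.
            self.results.clear()
```

In a training loop of this kind, a preference vector would typically be resampled per episode and supplied to the policy alongside the observation, so that one network can be steered at deployment time by changing the weights alone.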

Item Type: Article
Date Type: Publication
Status: Published
Schools: Schools > Engineering
Publisher: Elsevier
ISSN: 2215-0986
Date of First Compliant Deposit: 18 August 2025
Date of Acceptance: 18 July 2025
Last Modified: 18 Aug 2025 13:11
URI: https://orca.cardiff.ac.uk/id/eprint/180497
