Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty

Gao, Yan (ORCID: https://orcid.org/0000-0001-5890-9717), Lin, Feiqiang, Cai, Boliang, Wu, Jing (ORCID: https://orcid.org/0000-0001-5123-9861), Wei, Changyun, Grech, Raphael and Ji, Ze (ORCID: https://orcid.org/0000-0002-8968-9902) 2024. Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty. Robotics and Autonomous Systems 182, 104815. doi:10.1016/j.robot.2024.104815

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
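
The abstract describes the high-level reward only in words, so the following is a minimal illustrative sketch of how an extrinsic goal-progress term could be combined with a novelty bonus drawn from a decaying episode memory. It is not the paper's formulation: the grid discretisation, the count-based novelty 1/sqrt(count), and the names `DecayingNoveltyMemory`, `high_level_reward`, `cell_size`, `decay` and `beta` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class DecayingNoveltyMemory:
    """Hypothetical episode memory: visit counts per grid cell that fade over time."""

    def __init__(self, cell_size=0.5, decay=0.99):
        self.cell_size = cell_size   # assumed side length of a grid cell (metres)
        self.decay = decay           # assumed per-step decay factor in (0, 1]
        self.counts = defaultdict(float)

    def reset(self):
        # Clear the memory at the start of each episode.
        self.counts.clear()

    def _cell(self, x, y):
        # Map a continuous position to a discrete grid cell.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def novelty(self, x, y):
        # Fade every stored count, so places visited long ago become novel again.
        for key in list(self.counts):
            self.counts[key] *= self.decay
        cell = self._cell(x, y)
        self.counts[cell] += 1.0
        # Count-based bonus: 1.0 on a first visit, smaller on repeated visits.
        return 1.0 / math.sqrt(self.counts[cell])


def high_level_reward(prev_dist, curr_dist, novelty, beta=0.1):
    """Extrinsic progress towards the target plus a weighted intrinsic novelty bonus."""
    extrinsic = prev_dist - curr_dist   # positive when the robot gets closer
    return extrinsic + beta * novelty


if __name__ == "__main__":
    memory = DecayingNoveltyMemory(cell_size=1.0, decay=0.5)
    first = memory.novelty(0.2, 0.2)    # first visit to a cell
    second = memory.novelty(0.3, 0.3)   # immediate revisit of the same cell
    print(first, second)
    print(high_level_reward(prev_dist=2.0, curr_dist=1.5, novelty=first))
```

In this sketch an immediate revisit earns a smaller bonus than the first visit, while the decay lets the bonus for a cell recover once the robot has spent time elsewhere. That is one plausible way to discourage lingering in a dead corner without permanently penalising a return path.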

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering; Computer Science & Informatics
Publisher: Elsevier
ISSN: 0921-8890
Date of First Compliant Deposit: 16 September 2024
Date of Acceptance: 12 September 2024
Last Modified: 30 Sep 2024 09:30
URI: https://orca.cardiff.ac.uk/id/eprint/172147
