Cai, Boliang, Wei, Changyun and Ji, Ze (ORCID: https://orcid.org/0000-0002-8968-9902) 2024. Deep reinforcement learning with multiple unrelated rewards for AGV mapless navigation. IEEE Transactions on Automation Science and Engineering. 10.1109/TASE.2024.3410162
Abstract
Mapless navigation for Automated Guided Vehicles (AGVs) via Deep Reinforcement Learning (DRL) has attracted rapidly growing attention in recent years. Avoiding collisions with dynamic obstacles in unstructured environments, such as pedestrians and other vehicles, is one of the key challenges for mapless navigation. Autonomous navigation requires a policy that makes decisions to shorten the path towards the goal while also reducing the probability of collisions with obstacles. Typically, the reward for AGV navigation is calculated by combining multiple reward functions with different purposes, such as encouraging the robot to move towards the goal or avoiding collisions, into a single state-conditioned function. The combined reward, however, may lead to biased behaviours, because the weights are chosen empirically when multiple rewards are combined and dangerous situations can be misjudged. Therefore, this paper proposes a learning-based method with multiple unrelated rewards, each representing the evaluation of a different behaviour. The policy network, named Multi-Feature Policy Gradients (MFPG), is trained with two separate Q networks, each constructed from an individual reward corresponding to goal-distance shortening and collision avoidance, respectively. In addition, we propose an auto-tuning method, named Ada-MFPG, which allows the MFPG algorithm to automatically adjust the weights of the two separate policy gradients. For collision avoidance, we present a new social-norm-oriented continuous biased reward that enforces a specific social norm so as to reduce the probability of AGV collisions. By adding an offset gain to one of the reward functions, vehicles controlled by the proposed algorithm exhibit the predetermined behaviour. The work was tested in different simulation environments under multiple scenarios with a single robot or multiple robots. The proposed MFPG method is compared with the standard Deep Deterministic Policy Gradient (DDPG), a modified DDPG, and SAC and TD3 with a social norm mechanism. MFPG significantly increases the success rate in robot navigation tasks compared with DDPG. Moreover, among all the benchmarked algorithms, the MFPG-based variants achieve the shortest task completion duration and lower variance than the baselines. The work has also been tested on real robots, and these experiments demonstrate the viability of the trained model in real-world scenarios. The learned model can be used for multi-robot mapless navigation in complex environments, such as warehouses, that require multi-robot cooperation. Our source code and supplementary material are available at https://github.com/dornenkrone/MFPG
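As a rough illustration of the idea described in the abstract, the sketch below shows how two critics, each trained on its own reward (goal-distance shortening and collision avoidance), can drive a single actor update with weighted, unrelated policy gradients. This is a minimal sketch under assumed details, not the authors' implementation: the network sizes, state and action dimensions, and the fixed weights are illustrative (Ada-MFPG would adapt the weights automatically).

```python
# Minimal sketch of the two-critic, weighted policy-gradient idea (not the authors' code).
# Network sizes, dimensions and weights are assumptions for illustration only.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 2   # assumed: range-sensor features + goal info; (v, w) action

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())
critic_goal = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
critic_avoid = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

w_goal, w_avoid = 1.0, 1.0      # fixed here; an Ada-MFPG-style scheme would tune these online

def actor_update(states):
    """One deterministic-policy-gradient step driven by two critics with unrelated rewards."""
    actions = actor(states)
    sa = torch.cat([states, actions], dim=-1)
    # Each critic scores the same action under its own reward signal.
    q_goal = critic_goal(sa).mean()
    q_avoid = critic_avoid(sa).mean()
    loss = -(w_goal * q_goal + w_avoid * q_avoid)   # ascend both Q estimates jointly
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()

actor_update(torch.randn(32, STATE_DIM))            # dummy batch for illustration
```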
Note to Practitioners — Autonomous navigation for AGVs in complex and large-scale environments, such as factories and warehouses, is challenging. AGVs are usually centrally controlled and depend on reliable communications. However, centralized control is not always reliable, owing to poor signal strength or server crashes, and it struggles to meet the requirements of accurate information about dynamic environments and fast decision making. It is therefore necessary for vehicles to make reliable decisions using only onboard sensors and processors to achieve efficient and safe autonomous navigation. Existing methods, such as simultaneous localization and mapping (SLAM) and motion planning algorithms, have been widely used, but they are neither flexible nor generalizable enough.

This paper proposes a method for autonomous navigation based on reinforcement learning (RL), which allows vehicles to gain experience through cumulative rewards by continuously interacting with the environment. The RL-based controller is designed to optimise its performance in two independent aspects, namely collision avoidance and navigation, which are quantified as separate rewards. Instead of carefully hand-crafting a combined reward, our approach trains the agent with the two rewards separately to obtain one optimal policy. It is clearly easier and more practical to design the individual rewards than to combine them manually. In addition, the algorithm includes a mechanism for incorporating social norms, encouraging the vehicles to follow the right-hand rule so that they avoid pedestrians and other vehicles in a socially acceptable manner. This is achieved by adding a continuous bias to the collision-avoidance reward. Experiments in simulation environments and on real robots suggest that the method generalizes to multi-robot systems while guaranteeing safety. In future research, we will focus on incorporating uncertainties in sensor readings for safe and reliable autonomous navigation.
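To illustrate the kind of continuous, socially biased collision-avoidance reward described above, the sketch below adds an offset gain that softens the penalty when the nearest obstacle is kept on the vehicle's left, i.e. the vehicle passes on the right. The functional form, parameter names and constants are assumptions for illustration only, not the paper's exact reward.

```python
# Illustrative sketch (assumed form, not the paper's exact reward) of a continuous
# collision-avoidance reward with a social-norm bias towards right-hand passing.
import math

def collision_avoidance_reward(d_min, bearing, d_safe=1.0, k_bias=0.2):
    """d_min: distance to the closest obstacle (m); bearing: obstacle angle in the
    robot frame (rad, positive = left). d_safe and k_bias are illustrative constants."""
    proximity = max(0.0, (d_safe - d_min) / d_safe)   # 0 when far, 1 when touching
    base = -proximity                                  # penalty grows as the obstacle nears
    bias = k_bias * math.sin(bearing) * proximity      # offset favours keeping obstacles on the left
    return base + bias

print(collision_avoidance_reward(0.5, +0.8))   # obstacle kept on the left: milder penalty
print(collision_avoidance_reward(0.5, -0.8))   # obstacle on the right: stronger penalty
```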
Item Type: | Article |
---|---|
Date Type: | Published Online |
Status: | In Press |
Schools: | Engineering |
Publisher: | Institute of Electrical and Electronics Engineers |
ISSN: | 1545-5955 |
Date of First Compliant Deposit: | 24 May 2024 |
Date of Acceptance: | 24 May 2024 |
Last Modified: | 09 Nov 2024 07:15 |
URI: | https://orca.cardiff.ac.uk/id/eprint/169172 |