ORCA
Online Research @ Cardiff

Hierarchical reinforcement learning with universal policies for multi-step robotic manipulation

Yang, Xintong, Ji, Ze, Wu, Jing, Lai, Yu-kun, Wei, Changyun, Liu, Guoliang and Setchi, Rossitza 2022. Hierarchical reinforcement learning with universal policies for multi-step robotic manipulation. IEEE Transactions on Neural Networks and Learning Systems 33 (9), pp. 4727-4741. 10.1109/TNNLS.2021.3059912

PDF - Accepted Post-Print Version (4MB)
PDF - Supplemental Material (4MB)

Official URL: http://dx.doi.org/10.1109/TNNLS.2021.3059912

Abstract

Multistep tasks, such as block stacking or parts (dis)assembly, are complex for autonomous robotic manipulation. A robotic system for such tasks would need to hierarchically combine motion control at a lower level and symbolic planning at a higher level. Recently, reinforcement learning (RL)-based methods have been shown to handle robotic motion control with better flexibility and generalizability. However, these methods have limited capability to handle such complex tasks involving planning and control with many intermediate steps over a long time horizon. First, current RL systems cannot achieve varied outcomes by planning over intermediate steps (e.g., stacking blocks in different orders). Second, the exploration efficiency of learning multistep tasks is low, especially when rewards are sparse. To address these limitations, we develop a unified hierarchical reinforcement learning framework, named Universal Option Framework (UOF), to enable the agent to learn varied outcomes in multistep tasks. To improve learning efficiency, we train both symbolic planning and kinematic control policies in parallel, aided by two proposed techniques: 1) an auto-adjusting exploration strategy (AAES) at the low level to stabilize the parallel training, and 2) abstract demonstrations at the high level to accelerate convergence. To evaluate its performance, we performed experiments on various multistep block-stacking tasks with blocks of different shapes and combinations and with different degrees of freedom for robot control. The results demonstrate that our method can accomplish multistep manipulation tasks more efficiently and stably, and with significantly less memory consumption.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Engineering
Publisher:	IEEE
ISSN:	2162-237X
Date of First Compliant Deposit:	18 February 2021
Date of Acceptance:	11 February 2021
Last Modified:	20 Mar 2025 22:30
URI:	https://orca.cardiff.ac.uk/id/eprint/138649

Citation Data

Cited 11 times in Scopus.