Parać, Roko, Nodari, Lorenzo, Ardon, Leo, Furelos-Blanco, Daniel, Cerutti, Federico
Abstract
This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
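The probabilistic reward shaping mentioned in the abstract can be pictured with a minimal sketch: maintain a belief distribution over RM states, propagate it using the noisy label probabilities (the posterior degrees of belief from the traces), and compute the potential-based shaping term in expectation under that belief. Everything below is illustrative only; the toy RM, the function names (`belief_update`, `expected_potential`, `shaped_reward`), and the potential values are hypothetical stand-ins, not the paper's actual formulation or API.

```python
# Illustrative sketch of belief-weighted reward shaping, assuming a toy
# reward machine (RM). All names and values here are hypothetical; the
# paper's actual formulation may differ.

GAMMA = 0.99

# Toy RM: states 0..2, transitions keyed by (state, label); state 2 accepts.
RM_DELTA = {(0, "a"): 1, (1, "b"): 2}
RM_POTENTIAL = {0: 0.0, 1: 0.5, 2: 1.0}  # e.g., value-based potentials


def belief_update(belief, label_probs):
    """Propagate the belief over RM states given noisy label probabilities.

    label_probs maps each candidate label to the posterior probability that
    it actually held at this step (the Bayesian degree of belief derived
    from the noisy trace).
    """
    new_belief = {u: 0.0 for u in RM_POTENTIAL}
    for u, p_u in belief.items():
        for label, p_label in label_probs.items():
            v = RM_DELTA.get((u, label), u)  # self-loop if no transition
            new_belief[v] += p_u * p_label
    return new_belief


def expected_potential(belief):
    """Potential of a belief state: expectation over RM-state potentials."""
    return sum(p * RM_POTENTIAL[u] for u, p in belief.items())


def shaped_reward(reward, belief, next_belief):
    """Potential-based shaping computed in expectation over RM states."""
    return reward + GAMMA * expected_potential(next_belief) - expected_potential(belief)


# Example: the labelling function is only 80% sure that event "a" occurred.
belief = {0: 1.0, 1: 0.0, 2: 0.0}
next_belief = belief_update(belief, {"a": 0.8, "none": 0.2})
print(shaped_reward(0.0, belief, next_belief))  # credit for likely RM progress
```

In this sketch the agent receives partial shaping credit in proportion to how likely the trace is to have progressed the RM, which mirrors the abstract's point that posterior beliefs, rather than hard label assignments, drive the shaping signal.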
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Computer Science & Informatics |
| ISBN: | 9781956792058 |
| ISSN: | 2334-1033 |
| Last Modified: | 26 Nov 2024 16:32 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/173713 |