Author:

Xu, P. [1] | Chen, H. [2] | Yang, W. [3] | Huang, K. [4]

Indexed by:

Scopus

Abstract:

A key challenge in reinforcement learning is how to guide agents to explore sparse-reward environments efficiently. To overcome this challenge, state-of-the-art methods introduce additional intrinsic rewards based on state-related information, such as the novelty of states. Unfortunately, these methods frequently fail in procedurally-generated tasks, where a different environment is generated in each episode, so the agent is unlikely to visit the same state more than once. Recently, some exploration methods designed specifically for procedurally-generated tasks have been proposed. However, they still consider only state-related information, which leads to relatively inefficient exploration. In this work, we propose a novel exploration method that uses cross-episode policy-related information and intra-episode state-related information to jointly encourage exploration in procedurally-generated tasks. In terms of policy-related information, we first use an imitator-based unbalanced policy diversity to measure the difference between the agent’s current policy and the agent’s previous policies, and then encourage the agent to maximize this difference. In terms of state-related information, we encourage the agent to maximize the state diversity within an episode, thereby visiting as many different states as possible in an episode. We show that our method significantly improves sample efficiency over state-of-the-art methods on three challenging benchmarks: MiniGrid, MiniWorld, and the sparse-reward version of Procgen. © 2013 IEEE.
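
The two signals named in the abstract can be illustrated with a rough sketch. This is not the paper's method: the abstract does not define its imitator-based unbalanced policy diversity or its state-diversity measure, so the code below swaps in two simple stand-ins. The function names `episodic_first_visit_bonus` and `total_variation` are hypothetical, states are assumed to be hashable, and policies are assumed to be discrete action distributions over the same action set.

```python
from typing import Hashable, Iterable, List, Sequence, Set


def episodic_first_visit_bonus(states: Iterable[Hashable], bonus: float = 1.0) -> List[float]:
    """Stand-in for intra-episode state diversity (an assumption, not the paper's measure).

    Pays `bonus` the first time a state is seen within one episode and 0.0 on
    revisits. The visited set is local to the call, so it resets every episode.
    """
    seen: Set[Hashable] = set()
    rewards: List[float] = []
    for state in states:
        if state in seen:
            rewards.append(0.0)
        else:
            seen.add(state)
            rewards.append(bonus)
    return rewards


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Stand-in for a policy-difference measure (an assumption, not the paper's measure).

    Total variation distance between two action distributions at a single state:
    0.0 when they are identical, 1.0 when they never choose the same action.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same actions")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# One episode that returns to state "a" on its third step.
print(episodic_first_visit_bonus(["a", "b", "a", "c"]))  # [1.0, 1.0, 0.0, 1.0]
```

Under these assumptions, the first function rewards covering many distinct states inside one episode, and the second gives a number that grows as the current policy's action choices move away from those of an earlier policy.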

Keyword:

Deep reinforcement learning; exploration; procedurally-generated task; sparse reward

Affiliations:

  • [ 1 ] [Xu P.]Chinese Academy of Sciences, Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Beijing, 100190, China
  • [ 2 ] [Chen H.]University of Chinese Academy of Sciences, School of Emergency Management Science and Engineering, Beijing, 100190, China
  • [ 3 ] [Yang W.]Fuzhou University, College of Computer and Data Science, Fuzhou, 350100, China
  • [ 4 ] [Huang K.]Chinese Academy of Sciences, Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Beijing, 100190, China
  • [ 5 ] [Huang K.]University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, 100049, China

Source:

IEEE Transactions on Systems, Man, and Cybernetics: Systems

ISSN: 2168-2216

Year: 2025

8.600 (JCR@2023)
