Exploration via Embracing Diversity in Reinforcement Learning for Sparse-Reward Procedurally-Generated Tasks

Authors:

Xu, P. [1] | Chen, H. [2] | Yang, W. [3] | Huang, K. [4]

Indexed by：

Scopus

Abstract：

A key challenge in reinforcement learning is how to guide agents to efficiently explore sparse-reward environments. To overcome this challenge, state-of-the-art methods introduce additional intrinsic rewards based on state-related information, such as the novelty of states. Unfortunately, these methods frequently fail in procedurally-generated tasks, where a different environment is generated in each episode, so the agent is unlikely to visit the same state more than once. Recently, some exploration methods designed specifically for procedurally-generated tasks have been proposed. However, they still consider only state-related information, which leads to relatively inefficient exploration. In this work, we propose a novel exploration method that uses cross-episode policy-related information and intra-episode state-related information to jointly encourage exploration in procedurally-generated tasks. In terms of policy-related information, we first use an imitator-based unbalanced policy diversity to measure the difference between the agent's current policy and the agent's previous policies, and then encourage the agent to maximize this difference. In terms of state-related information, we encourage the agent to maximize the state diversity within an episode, thereby visiting as many different states as possible in an episode. We show that our method significantly improves sample efficiency over state-of-the-art methods on three challenging benchmarks: MiniGrid, MiniWorld, and the sparse-reward version of Procgen. © 2013 IEEE.
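
To make the intra-episode state-diversity idea in the abstract concrete, the following is a minimal Python sketch, not the authors' formulation. It assumes states are hashable and pays a bonus only the first time a state appears within an episode; the function name `intra_episode_bonus` and the `scale` parameter are hypothetical, and the record does not give the paper's actual reward definition or its imitator-based policy-diversity term.

```python
from typing import Hashable, Iterable, List


def intra_episode_bonus(states: Iterable[Hashable], scale: float = 1.0) -> List[float]:
    """Return one intrinsic bonus per step of a single episode.

    Assumption (not from the paper): a state earns `scale` on its first
    visit within the episode and 0.0 on every repeat visit, so the summed
    bonus grows with the number of distinct states visited.
    """
    seen = set()  # states already visited in this episode only
    bonuses = []
    for state in states:
        if state in seen:
            bonuses.append(0.0)
        else:
            seen.add(state)
            bonuses.append(scale)
    return bonuses


# Toy episode on a grid: the third step revisits (0, 0).
episode = [(0, 0), (0, 1), (0, 0), (1, 1)]
print(intra_episode_bonus(episode))  # [1.0, 1.0, 0.0, 1.0]
```

Because the visited set is rebuilt for every episode, a bonus of this kind stays informative even when each episode runs in a freshly generated environment, which is the setting the abstract targets.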

Keywords:

Deep reinforcement learning; exploration; procedurally-generated task; sparse reward

Affiliations:

[1] [Xu P.] Chinese Academy of Sciences, Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Beijing, 100190, China
[2] [Chen H.] University of Chinese Academy of Sciences, School of Emergency Management Science and Engineering, Beijing, 100190, China
[3] [Yang W.] Fuzhou University, College of Computer and Data Science, Fuzhou, 350100, China
[4] [Huang K.] Chinese Academy of Sciences, Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Beijing, 100190, China
[5] [Huang K.] University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, 100049, China


Related Keywords：

Exploration via Embracing Diversity in Reinforcement Learning for Sparse-Reward Procedurally-Generated Tasks
2025，IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
Enhanced deep reinforcement learning-based thermal management strategy for PEMFC considering coolant system parasitic power
2025，International Journal of Hydrogen Energy
Deep reinforcement learning based active surge control for aeroengine compressors
2024，CHINESE JOURNAL OF AERONAUTICS
Robust decision-making for autonomous vehicles via deep reinforcement learning and expert guidance
2025，APPLIED INTELLIGENCE
DRLO: Optimizing edge server placement in dynamic MEC scenarios using deep reinforcement learning
2025，COMPUTER NETWORKS

Source ：

IEEE Transactions on Systems, Man, and Cybernetics: Systems

ISSN： 2168-2216

Year： 2025

8.600 (JCR@2023)
