Indexed by:
Abstract:
We investigate the online bandit learning of the monotone multi-linear DR-submodular functions, designing the algorithm BanditMLSM that attains O(T2/3 log T) of (1 − 1/e)-regret. Then we reduce submodular bandit with partition matroid constraint and bandit sequential monotone maximization to the online bandit learning of the monotone multi-linear DR-submodular functions, attaining O(T2/3 log T) of (1−1/e)-regret in both problems, which improve the existing results. To the best of our knowledge, we are the first to give a sublinear regret algorithm for the submodular bandit with partition matroid constraint. A special case of this problem is studied by Streeter et al. (2009). They prove a O(T4/5) (1 − 1/e)-regret upper bound. For the bandit sequential submodular maximization, the existing work proves an O(T2/3) regret with a suboptimal 1/2 approximation ratio (Niazadeh et al., 2021). © 2023 Proceedings of Machine Learning Research. All rights reserved.
Keyword:
Reprint 's Address:
Email:
Source :
Year: 2023
Volume: 202
Page: 35491-35524
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count:
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 3
Affiliated Colleges: