Global Bandits

Atan, Onur; Tekin, Cem; van der Schaar, Mihaela

arXiv:1503.08370 (cs)

[Submitted on 29 Mar 2015 (v1), last revised 21 Mar 2018 (this version, v3)]

Abstract: Multi-armed bandits (MAB) model sequential decision-making problems in which a learner repeatedly chooses among arms with unknown reward distributions in order to maximize its cumulative reward. Most prior work on MAB assumes that the reward distributions of the arms are independent. In a wide variety of decision problems, however, from drug dosage to dynamic pricing, the expected rewards of different arms are correlated, so selecting one arm also provides information about the expected rewards of the other arms. We propose and analyze a class of models of such decision problems, which we call global bandits. When the expected rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the true value of that parameter. Hence, with probability one, this policy selects suboptimal arms only finitely many times. For this case we also obtain a regret bound that is independent of the true parameter; this bound is sub-linear, with an exponent that depends on the informativeness of the arms. We also propose a variant of the greedy policy that achieves $\tilde{\mathcal{O}}(\sqrt{T})$ worst-case and $\mathcal{O}(1)$ parameter-dependent regret. Finally, experiments on dynamic pricing show that the proposed algorithms achieve significant gains over well-known benchmarks.
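
The greedy idea above can be illustrated with a short Python sketch. It assumes three hypothetical arms whose expected rewards are known, invertible functions of a single parameter theta in [0, 1], with Bernoulli rewards. The names MEANS, INVERSES and run_greedy are illustrative, and the estimator (each played arm's inverted sample mean, weighted by that arm's share of the pulls) is one plausible choice, not necessarily the authors' exact policy.

```python
import math
import random
from typing import Callable, List, Tuple

# Illustrative assumption (not from the paper): three arms whose expected
# rewards are known, strictly monotone functions of one unknown theta in [0, 1].
MEANS: List[Callable[[float], float]] = [
    lambda th: th,        # arm 0: increasing in theta
    lambda th: 1.0 - th,  # arm 1: decreasing in theta
    lambda th: th * th,   # arm 2: increasing, less informative near 0
]
# Inverses of the mean functions on [0, 1]: map an arm's sample mean
# back to an estimate of theta.
INVERSES: List[Callable[[float], float]] = [
    lambda y: y,
    lambda y: 1.0 - y,
    lambda y: math.sqrt(y),
]


def clip01(x: float) -> float:
    """Clamp a value to [0, 1], the range of every mean function above."""
    return min(1.0, max(0.0, x))


def run_greedy(theta: float, horizon: int, seed: int = 0) -> Tuple[float, List[int], float]:
    """Hypothetical greedy global-bandit policy.

    Each round: play the arm with the highest mean under the current
    estimate theta_hat, observe a Bernoulli reward, then re-estimate theta
    from all arms. Returns (theta_hat, per-arm pull counts, pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(MEANS)
    counts = [0] * k
    sums = [0.0] * k
    best_mean = max(mu(theta) for mu in MEANS)
    theta_hat = 0.5  # arbitrary initial guess before any observation
    regret = 0.0
    for t in range(horizon):
        # Greedy step; ties go to the lowest index because max() returns
        # the first maximal element.
        arm = max(range(k), key=lambda a: MEANS[a](theta_hat))
        mean = MEANS[arm](theta)
        reward = 1.0 if rng.random() < mean else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - mean  # expected regret of this pull
        # Each played arm contributes its inverted sample mean, weighted by
        # its share of the t + 1 pulls so far (weights sum to 1).
        n = t + 1
        theta_hat = sum(
            (counts[a] / n) * INVERSES[a](clip01(sums[a] / counts[a]))
            for a in range(k)
            if counts[a] > 0
        )
    return theta_hat, counts, regret


if __name__ == "__main__":
    theta_hat, counts, regret = run_greedy(theta=0.8, horizon=5000, seed=1)
    print(f"theta_hat={theta_hat:.3f} counts={counts} regret={regret:.1f}")
```

Because every arm's mean function is invertible in this toy setup, pulls of any arm refine the same estimate of theta, which is why a purely greedy rule with no explicit exploration step can still settle on the best arm here.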

Comments:	arXiv admin note: substantial text overlap with arXiv:1410.7890
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1503.08370 [cs.LG]
	(or arXiv:1503.08370v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1503.08370

Submission history

From: Onur Atan
[v1] Sun, 29 Mar 2015 00:16:58 UTC (469 KB)
[v2] Fri, 14 Apr 2017 18:42:42 UTC (572 KB)
[v3] Wed, 21 Mar 2018 07:43:48 UTC (534 KB)