Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Abstract

In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.
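To make the two performance measures concrete, here is a minimal Python sketch (not from the paper; the toy objective, helper names, and point-selection rule are all illustrative) evaluating both regret notions for an arbitrary sequence of queried points.

```python
import numpy as np

def simple_regret(f, x_report, f_max):
    """Suboptimality of the single point reported after T rounds."""
    return f_max - f(x_report)

def cumulative_regret(f, xs, f_max):
    """Sum of per-round suboptimalities over the T chosen points."""
    return sum(f_max - f(x) for x in xs)

if __name__ == "__main__":
    f = lambda x: -np.sum(x ** 2)              # toy smooth objective; maximum value 0 at x = 0
    f_max = 0.0
    rng = np.random.default_rng(0)
    xs = rng.uniform(-1.0, 1.0, size=(50, 2))  # T = 50 rounds in d = 2 dimensions
    best = xs[np.argmax([f(x) for x in xs])]   # report the best point queried so far
    print("simple regret:    ", simple_regret(f, best, f_max))
    print("cumulative regret:", cumulative_regret(f, xs, f_max))
```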
For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $\epsilon$ requires $T=\Omega\left(\tfrac{1}{\epsilon^2}(\log\tfrac{1}{\epsilon})^{d/2} \right)$, and the average cumulative regret is at least $\Omega \left(\sqrt{T (\log T)^{d/2}} \right)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d + O(1)$ in both cases. For the Matérn-$\nu$ kernel, we give analogous bounds of the form $\Omega\left((\tfrac{1}{\epsilon})^{2 + d/\nu}\right)$ and $\Omega \left(T^{\tfrac{\nu + d}{2\nu + d}} \right)$, and discuss the resulting gaps to the existing upper bounds.
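These bounds are asymptotic. As a rough illustration (taking the stated rates verbatim and dropping all constants, which the $\Omega(\cdot)$ notation hides), the following hypothetical Python snippet evaluates how the lower-bound scalings grow with the target accuracy $\epsilon$, the horizon $T$, the dimension $d$, and the smoothness $\nu$; all function names are illustrative and not from the paper.

```python
import math

def se_simple_regret_rounds(eps, d):
    """SE kernel: rounds needed for average simple regret eps,
    T = Omega((1/eps^2) * (log(1/eps))^(d/2)) (constants dropped)."""
    return (1.0 / eps ** 2) * math.log(1.0 / eps) ** (d / 2)

def matern_simple_regret_rounds(eps, d, nu):
    """Matern-nu kernel: T = Omega((1/eps)^(2 + d/nu))."""
    return (1.0 / eps) ** (2.0 + d / nu)

def se_cumulative_regret(T, d):
    """SE kernel: R_T = Omega(sqrt(T * (log T)^(d/2)))."""
    return math.sqrt(T * math.log(T) ** (d / 2))

def matern_cumulative_regret(T, d, nu):
    """Matern-nu kernel: R_T = Omega(T^((nu + d) / (2 nu + d)))."""
    return T ** ((nu + d) / (2.0 * nu + d))

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"SE, d=2, eps={eps:g}: T grows like {se_simple_regret_rounds(eps, 2):.3g}")
    for T in (10 ** 3, 10 ** 6):
        print(f"Matern nu=2.5, d=2, T={T}: R_T grows like {matern_cumulative_regret(T, 2, 2.5):.3g}")
```

For the SE kernel, replacing the exponent $d/2$ by $2d + O(1)$ in `se_simple_regret_rounds` reproduces (up to constants) the scaling of the existing upper bounds, which makes the remaining gap easy to inspect numerically.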

Publication
Conference on Learning Theory (COLT), Amsterdam, 2017