In multiclass classification with bandit feedback, a learner only observes whether its predicted label was correct, rather than the true label itself. This simple change turns a basic supervised learning problem into a much harder exploration problem: how can one learn accurately over a large set of possible labels while receiving only minimal feedback? In this talk, I will discuss a recent line of work studying the statistical complexity of bandit multiclass classification. Perhaps surprisingly, we show that bandit feedback can be essentially free: with the right exploration strategy, one can match the full-information sample complexity rates up to logarithmic factors. The key technical ingredient is a new algorithmic approach to low-variance exploration that scales with the sparsity of the rewards, rather than with their ambient dimension. Beyond resolving a substantial gap in prior work, these ideas also extend naturally to richer prediction problems such as list classification, combinatorial feedback models, and contextual bandits; somewhat unexpectedly, they have also played a role in resolving the tight sample complexity of standard full-information multiclass classification.
Based on joint work with PhD student Liad Erez and collaborators Fan Chen, Alon Cohen, Steve Hanneke, Yishay Mansour, Shay Moran, Sasha Rakhlin, and Qian Zhang.