Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

Bibtex Metadata Paper

Authors

Elad Hazan, Satyen Kale

Abstract

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar (\alpha). We prove that the regret of \newtron is (O(\log T)) when (\alpha) is a constant that does not vary with horizon (T), and at most (O(T^{2/3})) if (\alpha) is allowed to increase to infinity with (T). For (\alpha) = (O(\log T)), the regret is bounded by (O(\sqrt{T})), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when (\alpha) is a small constant.


Name Change Policy

Requests for name changes in the electronic proceedings will be accepted with no questions asked. However name changes may cause bibliographic tracking issues. Authors are asked to consider this carefully and discuss it with their co-authors prior to requesting a name change in the electronic proceedings.

Use the "Report an Issue" link to request a name change.

Do not remove: This comment is monitored to verify that the site is working properly

AltStyle によって変換されたページ (->オリジナル) /