
Sequential data collection and decision making lie at the core of modern statistical learning, from anytime-valid A/B testing in online experiments to adaptive multi-armed bandit problems. This talk presents recent advances in the design and analysis of optimal algorithms for these two settings. In the first part, I will introduce a framework for anytime-valid, variance-adaptive inference in monotonic processes (such as cumulative distribution functions) that builds on the coin-betting paradigm of game-theoretic statistics and integrates PAC-Bayesian principles to yield tight hypothesis tests that are uniform not only in time but also in space. In the second part, I will focus on stochastic gradient bandits, a fundamental policy-gradient approach to online decision making, and present theoretical results showing how the learning rate governs the algorithm's regret, revealing sharp thresholds that separate logarithmic and polynomial regimes and depend on the (unknown) sub-optimality gap.
(Based on joint work with E. Clerico and H. E. Flynn, and with D. Baudry, E. Johnson, S. Vary, and C. Pike-Burke)
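For orientation, the sketch below illustrates the coin-betting paradigm from the first part of the abstract in its simplest form: an anytime-valid test for the mean of a bounded random variable. The function name coin_betting_test, the plug-in betting rule, and all parameter values are illustrative assumptions, not the speakers' construction; the talk's variance-adaptive, time-and-space-uniform machinery for CDFs is considerably more refined.

```python
import numpy as np

def coin_betting_test(xs, m, alpha=0.05, c=0.5):
    """Anytime-valid test of H0: E[X] = m for observations in [0, 1].

    The bettor's wealth W_t = prod_{s<=t} (1 + lam_s * (x_s - m)) is a
    nonnegative martingale under H0 (each bet lam_s depends only on the
    past), so Ville's inequality gives P(sup_t W_t >= 1/alpha) <= alpha.
    Rejecting when the wealth first crosses 1/alpha is therefore valid
    at any data-dependent stopping time. Requires 0 < m < 1.
    Illustrative sketch only; the bet rule below is a naive choice.
    """
    wealth = 1.0
    mean_est = 0.5  # running plug-in estimate used to pick the next bet
    for t, x in enumerate(xs, start=1):
        # Bet towards the side the past data favour, clipped so that
        # 1 + lam * (x - m) > 0 for every x in [0, 1].
        lam = np.clip(c * (mean_est - m),
                      -1.0 / (1.0 - m) + 1e-6,
                      1.0 / m - 1e-6)
        wealth *= 1.0 + lam * (x - m)
        mean_est += (x - mean_est) / (t + 1)
        if wealth >= 1.0 / alpha:
            return t  # first time H0 can be rejected at level alpha
    return None  # never rejected on this data stream
```

For instance, feeding the test a stream with true mean 1/3 (e.g. squares of uniform draws) while testing m = 0.5 makes the wealth grow exponentially and triggers a rejection after modestly many samples, with no stopping rule fixed in advance.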
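The second part concerns stochastic gradient bandits. Below is a minimal sketch of the standard softmax policy-gradient update on a synthetic Bernoulli instance; the arm means, horizon, and step sizes are chosen purely for illustration. It shows where the learning rate eta enters the algorithm, but does not reproduce the regret analysis presented in the talk.

```python
import numpy as np

def gradient_bandit_regret(means, eta, horizon=20_000, seed=0):
    """Stochastic gradient bandit (softmax policy gradient) on a
    Bernoulli instance; returns the cumulative pseudo-regret.

    Preferences H parameterise the policy pi = softmax(H); each round
    takes a stochastic gradient ascent step on expected reward, using
    the score-function estimate r * (e_a - pi), which is unbiased since
    grad_{H_b} E[R] = E[R * (1{A = b} - pi_b)].
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    H = np.zeros(K)                  # arm preferences
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        pi = np.exp(H - H.max())     # numerically stable softmax
        pi /= pi.sum()
        a = rng.choice(K, p=pi)
        r = float(rng.random() < means[a])          # Bernoulli reward
        H += eta * r * ((np.arange(K) == a) - pi)   # gradient step
        regret += best - means[a]
    return regret

# Illustrative sweep: how a fixed learning rate shapes the regret on a
# two-armed instance with sub-optimality gap 0.2 (values are arbitrary).
for eta in (0.01, 0.1, 1.0):
    print(eta, gradient_bandit_regret([0.5, 0.7], eta))
```

The update above uses the raw reward as the gradient signal; in practice a reward baseline is often subtracted to reduce variance. Per the abstract, the choice of eta relative to the unknown sub-optimality gap is what separates the logarithmic from the polynomial regret regime.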

Further information

Time:

Nov 10th 2025
10:50 to 11:30

Venue:

Seminar Room 1, Newton Institute

Speaker:

Patrick Rebeschini (University of Oxford)

Series:

Isaac Newton Institute Seminar Series