Sequential data collection and decision making lie at the core of modern statistical learning, from anytime-valid A/B testing in online experiments to adaptive multi-armed bandit problems. This talk presents recent advances in the design and analysis of optimal algorithms for these two settings. In the first part, I will introduce a framework for anytime-valid, variance-adaptive inference for monotonic processes, such as cumulative distribution functions, that builds on the coin-betting paradigm of game-theoretic statistics and integrates PAC-Bayesian principles to yield tight hypothesis tests that are uniform not only in time but also in space. In the second part, I will focus on stochastic gradient bandits, a fundamental policy-gradient approach to online decision making, and present theoretical results showing how the learning rate governs the algorithm’s regret, revealing sharp thresholds that separate logarithmic and polynomial regret regimes and depend on the (unknown) sub-optimality gap.
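As a minimal sketch of the two objects behind these results, in generic textbook notation that is my own gloss rather than the talk's (the symbols \(\lambda_t\), \(g_t\), \(W_t\), \(\theta\), \(\eta\) and the specific forms below are assumptions):

% Coin-betting sketch: a bettor places a predictable bet \lambda_t \in [-1,1]
% on an outcome g_t \in [-1,1] satisfying E[g_t | F_{t-1}] = 0 under the null,
% accumulating the nonnegative wealth (super)martingale
\[
  W_t \;=\; \prod_{s=1}^{t} \bigl(1 + \lambda_s g_s\bigr), \qquad W_0 = 1 .
\]
% Ville's inequality then turns wealth into an anytime-valid guarantee,
\[
  \mathbb{P}\bigl(\exists\, t \ge 1 : W_t \ge 1/\alpha\bigr) \;\le\; \alpha ,
\]
% so rejecting the null as soon as W_t crosses 1/\alpha gives a level-\alpha
% test that is valid uniformly over time.
%
% Stochastic gradient bandit sketch: a softmax policy over arms a, updated by
% stochastic gradient ascent on the expected reward with learning rate \eta,
% the quantity whose choice governs the regret regimes mentioned above:
\[
  \pi_\theta(a) \;=\; \frac{e^{\theta_a}}{\sum_b e^{\theta_b}}, \qquad
  \theta_a \;\leftarrow\; \theta_a + \eta\, R_t\,\bigl(\mathbf{1}\{A_t = a\} - \pi_\theta(a)\bigr).
\]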
(Based on joint work with E. Clerico and H. E. Flynn, and with D. Baudry, E. Johnson, S. Vary, and C. Pike-Burke.)