
Mathematical Research at the University of Cambridge

Proximal policy optimisation (PPO) is one of the most widely used algorithms in reinforcement learning: a practical policy gradient method with strong empirical performance. Despite its popularity, however, PPO lacks rigorous theoretical guarantees for policy improvement and convergence. The method employs a clipped surrogate objective, derived by linearising the value function in a flat geometric setting. In this talk, we introduce a refined surrogate objective based on the Fisher–Rao geometry, leading to a new variant, Fisher–Rao PPO (FR-PPO). Our approach provides robust theoretical guarantees, including monotonic policy improvement and sub-linear convergence rates, representing a substantial advance toward formal convergence results for the wider class of PPO algorithms. This talk is based on joint work with David Siska and Lukasz Szpruch.
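For reference, the clipped surrogate objective mentioned above is, in its standard form (Schulman et al., 2017), the following; the notation here is the conventional one and is not taken from the talk:

\[
L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) \;=\; \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]

where \(\hat{A}_t\) is an advantage estimate and \(\epsilon\) is the clipping parameter. The FR-PPO variant discussed in the talk replaces the flat-geometry linearisation underlying this objective with one based on the Fisher–Rao geometry.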

Further information

Time:

Nov 11th 2025
16:30 to 17:10

Venue:

Seminar Room 1, Newton Institute

Speaker:

Razvan-Andrei Lascu (RIKEN)

Series:

Isaac Newton Institute Seminar Series