
Mathematical Research at the University of Cambridge

Proximal policy optimisation (PPO) is one of the most widely used algorithms in reinforcement learning: a practical policy gradient method with strong empirical performance. Despite its popularity, however, PPO lacks rigorous theoretical guarantees for policy improvement and convergence. The method employs a clipped surrogate objective, derived by linearising the value function in a flat geometric setting. In this talk, we introduce a refined surrogate objective based on the Fisher–Rao geometry, leading to a new variant, Fisher–Rao PPO (FR-PPO). Our approach provides robust theoretical guarantees, including monotonic policy improvement and sub-linear convergence rates, representing a substantial advance toward formal convergence results for the wider class of PPO algorithms. This talk is based on joint work with David Siska and Lukasz Szpruch.
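For reference, the clipped surrogate objective mentioned above is, in its standard form (Schulman et al., 2017), the following; the notation here is the conventional one and is not taken from the talk:

\[
L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) \;=\; \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]

where \(\hat{A}_t\) is an advantage estimate and \(\epsilon\) is the clipping parameter. The FR-PPO variant discussed in the talk replaces the flat-geometry linearisation underlying this objective with one based on the Fisher–Rao geometry.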

Further information

Time:

Nov 11th 2025
16:30 to 17:10

Venue:

Seminar Room 1, Newton Institute

Speaker:

Razvan-Andrei Lascu (RIKEN)

Series:

Isaac Newton Institute Seminar Series