
Mathematical Research at the University of Cambridge


*Abstract*: We study the sample and time complexity of online stochastic gradient descent (SGD) in learning a two-layer neural network with M orthogonal neurons on isotropic Gaussian data. We focus on the challenging “extensive-width” regime M ≫ 1 and allow for a large condition number in the second-layer parameters, covering the power-law scaling a_m = m^{-β} as a special case. We characterize the SGD dynamics of a student two-layer neural network during training and identify sharp transition times for the recovery of each signal direction. In the power-law setting, our analysis shows that, while the learning of individual teacher neurons exhibits abrupt phase transitions, the juxtaposition of emergent learning curves at different timescales results in a smooth scaling law in the cumulative objective.
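
To make the setup concrete, here is a minimal, illustrative numerical sketch (not the speaker's code) of online SGD in the teacher-student model described in the abstract. The activation (tanh), the squared loss, training only the first layer, and all hyperparameters (d, M, β, learning rate, step count) are assumptions chosen for illustration, not details taken from the talk.

```python
# Illustrative sketch only: online SGD on fresh isotropic Gaussian samples,
# teacher with M orthonormal neurons and power-law second-layer weights
# a_m = m^{-beta}. Activation, loss, and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, M, beta = 256, 16, 1.5        # input dimension, teacher width, power-law exponent (assumed)
lr, steps = 0.05, 200_000        # one fresh Gaussian sample per SGD step (assumed values)

# Teacher: rows of W_teacher are orthonormal directions; a_teacher[m-1] = m^{-beta}.
W_teacher = np.linalg.qr(rng.standard_normal((d, M)))[0].T
a_teacher = np.arange(1, M + 1, dtype=float) ** -beta

def teacher(x):
    return a_teacher @ np.tanh(W_teacher @ x)

# Student: same width, random first-layer init; second layer fixed to a_teacher
# here to keep the sketch short (the talk's setting may train both layers).
W = rng.standard_normal((M, d)) / np.sqrt(d)

for t in range(steps):
    x = rng.standard_normal(d)                      # isotropic Gaussian input
    pre = W @ x
    err = a_teacher @ np.tanh(pre) - teacher(x)     # residual on this sample (squared loss)
    grad = np.outer(err * a_teacher * (1.0 - np.tanh(pre) ** 2), x)
    W -= lr * grad                                  # one online SGD step
    if t % 20_000 == 0:
        # For each teacher direction, the best overlap achieved by any student
        # neuron; stronger teacher neurons (small m) should be recovered earlier.
        Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
        overlaps = np.abs(Wn @ W_teacher.T).max(axis=0)
        print(t, np.round(overlaps, 2))
```

Printing the per-direction overlaps over time gives a rough picture of the staggered recovery times whose sharp characterization is the subject of the talk.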

*This talk is co-hosted by the Computer Laboratory AI Research Group and the Informed-AI Hub.*

Further information

Time: Nov 20th 2025, 14:00 to 15:00

Venue: MR14, Centre for Mathematical Sciences

Series: Machine learning theory