Mathematical Research at the University of Cambridge

Label corruptions pose a significant challenge in many machine learning tasks, affecting the accuracy and reliability of models. In this talk, we will address two distinct problems involving label corruptions and present approaches for handling them effectively.

The first problem we consider is robust univariate polynomial regression. Here the goal is to recover a polynomial that is pointwise close to a target polynomial, given samples in which, with probability $\alpha$, the sample is clean (satisfies the model) and, with probability $1-\alpha$, the label is corrupted (completely arbitrary). We propose an approach that can tolerate a corruption fraction as large as any constant less than 1/2, which is the information-theoretic limit for unique recovery in this problem.
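As a rough illustration of this observation model (and not of the approach presented in the talk), the sketch below generates samples that are clean with probability $\alpha$ and arbitrary otherwise, then fits a polynomial with a generic RANSAC-style robust baseline; all parameters and the fitting heuristic are illustrative assumptions only.

```python
# Sketch of the corruption model for robust univariate polynomial regression.
# Parameters are hypothetical; the RANSAC-style fit is a generic robust
# baseline for illustration, not the method from the talk.
import numpy as np

rng = np.random.default_rng(0)

d = 3            # polynomial degree (illustrative)
n = 400          # number of samples
alpha = 0.6      # probability a sample is clean (corruption fraction 1 - alpha)

true_coeffs = rng.normal(size=d + 1)          # target polynomial
x = rng.uniform(-1.0, 1.0, size=n)
y_clean = np.polyval(true_coeffs, x)

# With probability 1 - alpha, replace the label with an arbitrary value.
corrupted = rng.random(n) > alpha
y = np.where(corrupted, rng.uniform(-10.0, 10.0, size=n), y_clean)

# Generic robust baseline: repeatedly interpolate through d + 1 random points
# and keep the candidate with the smallest median absolute residual.
best_coeffs, best_score = None, np.inf
for _ in range(2000):
    idx = rng.choice(n, size=d + 1, replace=False)
    cand = np.polyfit(x[idx], y[idx], d)
    score = np.median(np.abs(np.polyval(cand, x) - y))
    if score < best_score:
        best_coeffs, best_score = cand, score

# Pointwise (sup-norm) error of the recovered polynomial on [-1, 1].
grid = np.linspace(-1.0, 1.0, 200)
err = np.max(np.abs(np.polyval(best_coeffs, grid) - np.polyval(true_coeffs, grid)))
print(f"max pointwise error: {err:.3f}")
```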

In the second problem, we examine the challenge of learning a linear function composed with a generalized linear model (GLM). We focus on the oblivious noise setting, in which up to any constant fraction of the labels is corrupted by additive noise that is arbitrary but independent of the covariates. We show that in this setting it is always possible to recover a polynomial-sized list of candidates, one of which is arbitrarily close to the true answer. Furthermore, under mild distributional assumptions, we show that this recovery is unique.
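The observation model in this setting can be sketched as follows; the sigmoid link, the dimensions, and the noise distribution are illustrative assumptions, and no recovery procedure from the talk is shown.

```python
# Sketch of the oblivious-noise observation model for a GLM composed with a
# linear function; link, dimensions, and noise law are illustrative only.
import numpy as np

rng = np.random.default_rng(1)

n, dim = 1000, 5
alpha = 0.3                      # fraction of labels left uncorrupted
w_true = rng.normal(size=dim)    # hidden linear function

def g(t):
    # Illustrative GLM link (sigmoid); the talk's setting covers general GLMs.
    return 1.0 / (1.0 + np.exp(-t))

X = rng.normal(size=(n, dim))
clean_labels = g(X @ w_true)

# Oblivious noise: additive, independent of X, zero with probability alpha and
# otherwise arbitrary (heavy-tailed here, purely for illustration).
xi = np.where(rng.random(n) < alpha, 0.0, rng.standard_cauchy(n))
y = clean_labels + xi

# List recovery would aim to output polynomially many candidate parameter
# vectors, one of which is close to w_true.
```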

*This talk is co-hosted by the Computer Laboratory AI Research Group.*

Further information

Time:

Feb 3rd 2026
14:00 to 15:00

Venue:

FW 26, Computer Laboratory, William Gates Building

Series:

Machine learning theory