In many applications, data is observed in multiple modalities. In this talk, we will explore some benefits and properties of multimodal learned representations.
First, it has been shown that aligning a diffusion model's internal representations with a pretrained learned representation can improve those representations and hence generation quality. Here, we derive strategies to enhance representation alignment in diffusion models; for instance, they can benefit from alignment guidance even from a different modality.
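The abstract leaves the mechanism unspecified, but one common instantiation of such alignment (e.g., REPA-style training) adds an auxiliary loss that pulls intermediate denoiser features toward the output of a frozen pretrained encoder. The sketch below is a minimal, hypothetical PyTorch version; the dimensions, the projection head `proj`, and the weight 0.5 are illustrative assumptions, not the method presented in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative dimensions (assumptions, not from the talk).
d_denoiser, d_encoder, batch = 512, 768, 16

# Trainable head projecting denoiser features into the encoder's space.
proj = nn.Linear(d_denoiser, d_encoder)

def alignment_loss(h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Negative mean cosine similarity between projected denoiser
    features h and frozen pretrained representations z."""
    return -F.cosine_similarity(proj(h), z, dim=-1).mean()

# Toy usage with random tensors standing in for real features:
h = torch.randn(batch, d_denoiser)  # intermediate denoiser features
z = torch.randn(batch, d_encoder)   # frozen encoder output (possibly another modality)
denoising_loss = torch.tensor(0.0)  # placeholder for the usual diffusion loss
total_loss = denoising_loss + 0.5 * alignment_loss(h, z)  # 0.5: illustrative weight
```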
Second, while alignment assumes paired data, i.e., "views" of the same data point in different modalities, far more data is available in unpaired form. Is it possible to leverage unpaired multimodal data to enhance a model in a target modality? Indeed it is: building on the assumption that different modalities are projections of a shared underlying reality, we show how to exploit such data, with gains in both theory and practice.
Last, taking a general view of representation geometry, we explore correspondences between independently learned multimodal representations across different models. Such models typically learn similar notions of "similarity", and we show that their representations can be aligned with a single linear map. In fact, a map fit to align one modality across two models transfers and automatically aligns the other modality across the same models. This finding enables, for instance, backward-compatible model upgrades without costly re-embedding.
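To illustrate the linear-alignment claim, the hypothetical sketch below fits a single least-squares map between two models' embeddings using one modality only (images), then applies the same map to the other modality (text). All arrays here are random stand-ins, so no real transfer will be visible; with embeddings from actual multimodal models, the transferred map should make `txt_A @ W` approximate `txt_B`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 256  # illustrative sample count and embedding dimension

# Stand-ins for embeddings of the SAME items under two independent models A, B.
img_A, img_B = rng.normal(size=(n, d)), rng.normal(size=(n, d))
txt_A, txt_B = rng.normal(size=(n, d)), rng.normal(size=(n, d))

# Fit one linear map on the image modality only: img_A @ W ~ img_B.
W, *_ = np.linalg.lstsq(img_A, img_B, rcond=None)

# The transfer claim: the SAME map also aligns the text modality,
# txt_A @ W ~ txt_B, when embeddings come from real multimodal models.
txt_B_hat = txt_A @ W
err = np.linalg.norm(txt_B_hat - txt_B) / np.linalg.norm(txt_B)
print(f"relative text-alignment error: {err:.3f}")
```

With real embeddings, such a map would, for example, let a retrieval index built under model A be queried with model B's embeddings, which is the backward-compatible upgrade without costly re-embedding mentioned above.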
This talk is based on joint work with Sharut Gupta, Chenyu Wang, Cai Zhou, Zongyu Lin, Shobhita Sundaram, Sanyam Kansal, Stephen Bates, Tommi Jaakkola, Phillip Isola, and Vikas Garg.

Further information

Time:

Feb 10th 2026
09:00 to 10:00

Venue:

Seminar Room 1, Newton Institute

Speaker:

Stefanie Jegelka (Technical University of Munich)

Series:

Isaac Newton Institute Seminar Series