
Projects

You can indicate up to two of the projects advertised below that you would like to be considered for, and you may give an order of preference.

  1. Statistical foundations of AI with Professor Richard Samworth
  2. New algorithms for private/robust ranking with Professor Po-Ling Loh
  3. Parametric generalisation for Koopman operators in dynamical systems with Dr Georg Maierhofer
  4. Magnetic field growth via Kelvin-Helmholtz Instability in Binary Neutron Star Mergers with Dr Loren E Held
  5. Information-theoretic inequalities for theoretical statistics and computer science with Professor Varun Jog
  6. Unifying patterns in algebraic geometry with Dr Fatemeh Rezaee
  7. Learning the score with Professor Richard Samworth

 

Alternatively, you can seek out your own supervisor and apply with a "self-proposed" project. See the "Self-Proposed Project" section below for more information on this option.

 


Project 1: "Statistical foundations of AI" with Professor Richard Samworth

Project Title Statistical foundations of AI
Keywords Approximation theory, in-context learning, posterior contraction rates, Transformers
Subject Area Statistics, Machine Learning
Contact Name Richard Samworth
Contact Email rjs57@cam.ac.uk
Department DPMMS
Group Statistical Laboratory
Project Duration 8 weeks
Background Information AI is the defining technology of our generation. There are numerous respects, however, in which statistical theory is needed to explain empirical successes and to improve performance. Transformers have emerged as one of the dominant architectures in modern machine learning, as they have achieved state-of-the-art performance in many domains such as natural language processing, computer vision and protein structure prediction. A striking ability of pretrained Transformers is the phenomenon of 'in-context learning': given a prompt containing examples and a query, Transformers can learn the underlying pattern from the examples and produce accurate output for the query, without updating their parameters.
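
As a rough formalisation of the in-context learning set-up described above (our notation, not taken from the references): the model receives a prompt of labelled examples together with a query, and must predict the query's label with its pretrained parameters held fixed,

\[ \big( (x_1, y_1), \ldots, (x_n, y_n), x_{\mathrm{query}} \big) \;\longmapsto\; \hat{y}_{\mathrm{query}}, \]

where the pairs \((x_i, y_i)\) are drawn from a task-specific distribution, so the Transformer must infer the underlying input-output relationship from the prompt alone.
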
Project Description Two highly desirable qualities for a classifier in an in-context learning setting are adaptivity and distributional robustness. Here, adaptivity refers to the ability of Transformers to achieve faster rates of convergence on easier tasks (as quantified, for instance, by the smoothness of the regression function), while distributional robustness refers to the ability to withstand a shift in distribution between the pretraining and test data. We will study the adaptivity and distributional robustness of Transformers for classification tasks in an in-context learning framework. In particular, we will need to develop the approximation theory of Transformers, as well as posterior contraction theory for classification problems.
Work Environment William Underwood is my post-doc and Tianyi Ma is my PhD student. The three of us will jointly supervise the project. I would expect to meet the students weekly, though they may meet with William or Tianyi in between. In general, I think it is good practice for the students to work together in the CMS most of the time during normal working hours, but some remote work is fine too. There is also a brief Monday morning in-person meeting with my research group and a Tuesday online meeting with my extended research group, as well as the Statistics Clinics once every three weeks (the summer students may wish to sit in on consultations to obtain first-hand experience of practical statistical problems).
References Ma, Wang and Samworth (2025) Provable test-time adaptivity and distributional robustness of in-context learning. https://arxiv.org/abs/2510.23254.
Wakayama, T. and Suzuki, T. (2025) In-context learning is provably Bayesian inference: a generalization theory for meta-learning. https://arxiv.org/abs/2510.10981.
Prerequisite Skills Statistics, Mathematical Analysis
Other Skills Used in the Project -
Acceptable Programming Languages Python, R

 

Project 2: "New algorithms for private/robust ranking" with Professor Po-Ling Loh

Project Title New algorithms for private/robust ranking
Keywords Algorithmic stability, differential privacy, robustness, statistical machine learning
Subject Area Statistics, Machine Learning
Contact Name Po-Ling Loh
Contact Email pll28@cam.ac.uk
Department DPMMS
Group Statslab
Project Duration 8 weeks
Background Information The notion of "algorithmic stability" has become increasingly popular in many areas of statistical learning theory. Intuitively, stability measures how much the output of an algorithm or statistical procedure changes when the input data are perturbed. Notions of stability that have been defined and studied in recent years include robustness, differential privacy and replicability, and the consequences of stability for the generalization performance of an algorithm have also been investigated. While there are often parallels between effective algorithms that perform well with respect to these different criteria, concrete connections between the fields remain largely elusive.
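
For concreteness, here is the standard formalisation of one of the stability notions above: a randomised algorithm \(A\) is \(\varepsilon\)-differentially private if, for all datasets \(D, D'\) differing in a single record and all measurable sets \(S\) of outputs,

\[ \mathbb{P}\big(A(D) \in S\big) \;\le\; e^{\varepsilon}\, \mathbb{P}\big(A(D') \in S\big). \]

Robustness and replicability are formalised in related but distinct ways; see the references below.
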
Project Description The goal of this project is to study the statistical problem of ranking under the constraints of robustness and/or privacy. A series of recent papers has made interesting advances in formalizing notions of stability in ranking, and we will use them as a springboard to devise new algorithms that are robust and private. Along the way, we may need to explore and develop new concepts in robustness/privacy that are useful for understanding discrete-structured problems, where the output of the algorithm is a subset of items rather than a continuous-valued vector.
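
To give a flavour of private selection in a discrete-structured problem, here is a toy sketch of the "noisy selection" pattern (our illustration, not the calibrated mechanism of any of the referenced papers): add Gumbel noise to item scores and report the indices of the top k. Adding Gumbel noise and taking an argmax is equivalent to the exponential mechanism; the noise scale below is an assumption that would need to be justified against the score sensitivity and privacy budget.

    import numpy as np

    def noisy_top_k(scores, k, epsilon, sensitivity=1.0, rng=None):
        """Return the indices of the k highest scores after adding Gumbel noise.

        Illustrative sketch only: the scale 2 * k * sensitivity / epsilon is an
        assumption, not a verified privacy guarantee -- see the references for
        exact calibrations.
        """
        rng = np.random.default_rng() if rng is None else rng
        scale = 2.0 * k * sensitivity / epsilon
        noisy = np.asarray(scores, dtype=float) + rng.gumbel(0.0, scale, len(scores))
        return np.argsort(noisy)[::-1][:k]

    # Example: privately select the 3 most-preferred of 10 items.
    scores = [40, 38, 35, 20, 19, 18, 5, 4, 3, 1]
    print(noisy_top_k(scores, k=3, epsilon=1.0))
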
Work Environment The student will be included in weekly research group meetings, and can also meet regularly with a PhD student or postdoc in the group for additional guidance. They can work remotely for part of the time.
References Bousquet & Elisseeff, "Stability and generalization," JMLR 2002;
Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar & Sorrell, "Stability is stable: Connections between replicability, privacy, and adaptive generalization," STOC 2023;
Cai, Chakraborty & Wang, "Optimal differentially private ranking from pairwise comparisons," 2025;
Liang, Soloff, Barber & Willett, "Assumption-free stability for ranking problems," 2025;
Qiao, Su & Zhang, "Oneshot differentially private top-k selection," ICML 2021
Prerequisite Skills Statistics, Probability/Markov Chains
Other Skills Used in the Project -
Acceptable Programming Languages None required

 

Project 3: "Parametric generalisation for Koopman operators in dynamical systems" with Dr Georg Maierhofer

Project Title Parametric generalisation for Koopman operators in dynamical systems
Keywords Koopman operator, dynamical systems, scientific machine learning, time series forecasting
Subject Area Applied and Computational Analysis
Contact Name Georg Maierhofer
Contact Email gam37@cam.ac.uk
Department DAMTP
Group Applied and Computational Analysis
Project Duration 8 weeks
Background Information

Scientific machine learning has emerged as a powerful set of tools for modelling, simulating, and forecasting complex dynamical systems arising in science and engineering. A particularly promising framework is the Koopman operator, which provides a way to represent nonlinear dynamical systems using linear operators acting on an appropriately chosen space of observables [1]. This yields an effective linear representation of nonlinear dynamics, enabling the application of simple and well-understood linear simulation methodologies to otherwise complex systems.
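
Concretely (the standard definition, as in [1]): for a discrete-time dynamical system \(x_{t+1} = F(x_t)\) on a state space \(X\), the Koopman operator \(\mathcal{K}\) acts on observables \(g : X \to \mathbb{C}\) by composition,

\[ (\mathcal{K} g)(x) \;=\; g\big(F(x)\big), \]

so \(\mathcal{K}\) is linear on the space of observables even when \(F\) is nonlinear. For continuous-time dynamics \(\dot{x} = f(x)\), one instead works with the semigroup \((\mathcal{K}^t g)(x) = g(\Phi^t(x))\), where \(\Phi^t\) denotes the flow map.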

In recent years, data-driven methods for learning Koopman operators from observations - rather than from explicit governing equations - have gained significant traction. These approaches have been successfully applied to time series forecasting in physical and engineering settings. Software packages such as PyKoopman [3] have further increased the accessibility of these methods to both researchers and practitioners.
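
To give a flavour of the data-driven approach, here is a minimal sketch in plain NumPy rather than PyKoopman, using linear observables (i.e. standard dynamic mode decomposition; the data layout is our assumption): given snapshot pairs \((x_t, x_{t+1})\), a finite-dimensional approximation of the Koopman operator is obtained by least squares.

    import numpy as np

    def fit_koopman_dmd(X):
        """Least-squares Koopman/DMD approximation from a snapshot matrix.

        X has shape (n_states, n_snapshots), with columns ordered in time and
        uniformly spaced -- exactly the discrete-time assumption that this
        project aims to relax.
        """
        X0, X1 = X[:, :-1], X[:, 1:]    # snapshot pairs (x_t, x_{t+1})
        return X1 @ np.linalg.pinv(X0)  # minimises ||X1 - K X0||_F over K

    # Example: a linear system x_{t+1} = A x_t is recovered exactly.
    A = np.array([[0.9, 0.1], [-0.1, 0.9]])
    x = np.zeros((2, 50))
    x[:, 0] = [1.0, 0.0]
    for t in range(49):
        x[:, t + 1] = A @ x[:, t]
    print(np.allclose(fit_koopman_dmd(x), A))  # True

For genuinely nonlinear dynamics, one first lifts the state through a dictionary of nonlinear observables (extended DMD) so that the learned operator acts on the lifted space; this is the regime in which packages such as PyKoopman operate.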

A central challenge in many real-world applications is parametric generalisation. In practice, we often have abundant data for certain parameter regimes but only sparse or incomplete data for others. Examples include:

  • Weather and climate modelling, where dense sensor data may be available over land but much sparser measurements over oceans.
  • Engineering systems where experiments can be run only for a limited set of operating conditions (e.g. Reynolds numbers in fluid flows).

Recent work has begun to extend the Koopman operator framework to handle such parametric dependence, allowing models learned at a finite set of parameter values to generalise to others [2]. However, existing approaches largely assume discrete-time observations sampled at uniform time intervals. This assumption is often unrealistic: real data may be sampled irregularly in time, have missing observations, or be collected asynchronously. Addressing this limitation requires moving from discrete-time to continuous-time formulations of Koopman-based learning and prediction.

Project Description

The aim of this project is to investigate parametric generalisation for Koopman operators, with a particular focus on extending existing discrete-time approaches to the continuous-time setting.

The project will have both theoretical and computational components and will proceed broadly along the following lines:

  1. Literature review and introduction: review of recent work on parametric Koopman learning and generalisation, with particular emphasis on discrete-time methods; initial experimentation with existing software tools, especially the PyKoopman package.
  2. Evaluation of discrete-time approaches: implementation of selected parametric Koopman methods in the discrete-time setting; evaluation of their performance on simple benchmark dynamical systems; understanding of assumptions and limitations related to time discretisation and data regularity.
  3. Extension to continuous-time models: exploration of adapting parametric Koopman frameworks to continuous-time dynamics; investigation of approaches for handling parametric generalisation on unevenly spaced observations; preliminary mathematical analysis of how the Koopman operator behaves under parametric variation in continuous time (time permitting).
  4. Computational implementation and experiments: development of a prototype implementation extending or interfacing with PyKoopman; evaluation of the proposed approach on the Common Task Framework for SciML [5] and, time-permitting, on real-world observational data from WeatherBench [4] or similar datasets.

There is flexibility for the student to focus their exploration on a desired aspect of this problem or for a group of students to work on complementary components.

Upon successful completion of this project the student is expected to have gained:

  • A solid understanding of Koopman operator theory and its role in scientific machine learning.
  • Hands-on experience implementing data-driven Koopman methods in Python.
  • Insight into the challenges of parametric generalisation and irregularly sampled data.
  • Experience producing a short written report summarising the methods, results, and possible future research directions.
Work Environment The student will work on their own (or in a small group if there are multiple students) and with both supervisors. We expect the student to be present in Cambridge for the majority of the project and to be available for regular in-person meetings with both supervisors. Some remote work is acceptable, so long as the student remains available for a weekly virtual meeting. We are open to discussing and adjusting project dates to accommodate student schedules within funding constraints. The Applied and Computational Analysis group usually hosts several summer students and holds regular seminars even during the summer period, which the student(s) are invited to join.
References [1] Colbrook, Matthew J., Zlatko Drmač, and Andrew Horning. "An Introductory Guide to Koopman Learning." arXiv preprint arXiv:2510.22002 (2025).
[2] Guo, Yue, et al. "Learning parametric Koopman decompositions for prediction and control." SIAM Journal on Applied Dynamical Systems 24.1 (2025): 744-781.
[3] Pan, Shaowu, et al. "PyKoopman: A Python Package for Data-Driven Approximation of the Koopman Operator." Journal of Open Source Software, vol. 9, no. 94, 2024, p. 5881.
[4] Rasp, Stephan, et al. "WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models." Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, 2024, e2023MS004019.
[5] Wyder, Philippe Martin, et al. "Common Task Framework for a Critical Evaluation of Scientific Machine Learning Algorithms." Proceedings of the 39th International Conference on Neural Information Processing Systems (NeurIPS 2025), 2025.
Prerequisite Skills Numerical Analysis, Some familiarity with Python programming
Other Skills Used in the Project -
Acceptable Programming Languages Python

 

Project 4: "Magnetic field growth via Kelvin-Helmholtz Instability in Binary Neutron Star Mergers" with Dr Loren E Held

Project Title Magnetic field growth via Kelvin-Helmholtz Instability in Binary Neutron Star Mergers
Keywords astrophysics, computational fluid dynamics, binary neutron star mergers
Subject Area Astrophysics, Fluid and Solid Mechanics
Contact Name Dr Loren E Held
Contact Email leh50@cam.ac.uk
Department DAMTP
Group Astrophysical Fluid Dynamics
Project Duration 8 weeks
Background Information Neutron star mergers lead to some of the brightest events in the universe and are the main targets of the growing field of multi-messenger astronomy. During a merger two neutron stars in a binary collide, leading to the formation of an accretion disk surrounding a black hole or hyper-massive neutron star (HMNS) remnant. Magnetic fields play an important role during the merger: they power jets and winds, which are related to observational signatures such as short gamma ray bursts, and which drive heavy-element nucleosynthesis (one of the only ways of forming gold!). The growth and saturation of magnetic fields in mergers remains a major open question, and different magnetic field amplification mechanisms, including the Kelvin-Helmholtz Instability (KHI), are likely at play at different stages during the merger.
Project Description

This project takes a relatively simple fluid instability (the Kelvin-Helmholtz Instability, KHI), such as would normally be covered in a 2nd or 3rd year undergraduate course on fluid mechanics, and examines it in an exotic context: that of a binary neutron star merger. The student(s) will study how the KHI can act as a mechanism for magnetic field amplification in the early stages of a neutron star merger. Within the first few milliseconds of impact, the merger of the neutron stars produces a linear shear layer which becomes unstable to the KHI. This is known to create vortices that can concentrate the magnetic field. But how effectively can KHI vortices grow a magnetic field, and does the instability have time to saturate over the extremely short (ms) timescales involved? To answer these key questions, the student(s) will carry out 2D (magneto-)hydrodynamic (MHD) simulations of the KHI in special relativity using the astrophysical fluid dynamics code PLUTO. It is expected that the simulations will be run on a local cluster at DAMTP.
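
For orientation, the classical non-relativistic, unmagnetised vortex-sheet result (not the relativistic MHD dispersion relation relevant to the project) states that for two incompressible streams with densities \(\rho_1, \rho_2\) and velocities \(U_1, U_2\), an interface perturbation of wavenumber \(k\) grows at the rate

\[ \sigma \;=\; k\, \frac{\sqrt{\rho_1 \rho_2}}{\rho_1 + \rho_2}\, |U_1 - U_2|, \]

so the shortest wavelengths grow fastest; relativistic and magnetic effects (see Ferrari et al. 1980 below) modify this picture.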

The project can be done individually or in a group, with different students working on different aspects of the problem (e.g. one student could carry out isothermal simulations to focus on the dynamics, while another could focus on observational signatures of the instability by including thermodynamics and radiation, or develop an analytical theory of the relativistic MHD KHI to complement the simulations). Throughout the course of the project the student(s) will learn the basic physics of binary neutron star mergers. In addition, the student(s) will gain experience in computational fluid dynamics (in particular, how to set up and use one of the most popular open-source codes in astrophysical fluid dynamics, the PLUTO code), high-performance computing (i.e. how to run simulations in parallel on a supercomputer), coding (particularly in C and Python) and data analysis. No background knowledge of mergers is necessary, though some background in fluid mechanics and astrophysics, and some familiarity with programming, would be helpful.

Note that the topic is flexible and can be adjusted depending on the interests of the student(s). Alternative projects, within the confines of computational astrophysical fluid dynamics and accretion disks, are also possible.

Work Environment The student is expected to work closely with the supervisor, with weekly meetings.
References Mignone A., Bodo G., Massaglia S., Matsakos T., Tesileanu O., Zanni C., Ferrari A., 2007, ApJS, 170, 228, PLUTO: a numerical code for computational astrophysics (https://iopscience.iop.org/article/10.1086/513316);
Fernández, R. and Metzger, B.D., 2016. Electromagnetic signatures of neutron star mergers in the advanced LIGO era. ARNPS, 66(1), pp.23-45. (https://www.annualreviews.org/content/journals/10.1146/annurev-nucl-102115-044819);
Bucciantini, N. and Del Zanna, L., 2006. Local Kelvin-Helmholtz instability and synchrotron modulation in Pulsar wind nebulae. A&A, 454(2), pp.393-400 (https://www.aanda.org/articles/aa/abs/2006/29/aa4491-05/aa4491-05.html);
Ferrari, A., Trussoni, E. and Zaninetti, L., 1980. Magnetohydrodynamic Kelvin–Helmholtz instabilities in astrophysics–I. Relativistic flows–plane boundary layer in vortex sheet approximation. MNRAS, 193(3), pp.469-486 (https://academic.oup.com/mnras/article/193/3/469/995127)
Prerequisite Skills Fluids
Other Skills Used in the Project Some experience in astrophysics and fluid simulations would be helpful, but is not strictly necessary
Acceptable Programming Languages Python, C, No preference

 

Project 5: "Information-theoretic inequalities for theoretical statistics and computer science" with Professor Varun Jog

Project Title Information-theoretic inequalities for theoretical statistics and computer science
Subject Area Statistics, Machine Learning, Information theory
Contact Name Varun Jog
Contact Email vj270@cam.ac.uk
Department DPMMS
Group Information Theory and Statistics
Project Duration 8 weeks
Background Information In statistics and information theory, various notions of "divergence", such as the Kullback--Leibler divergence, Hellinger divergence and total-variation divergence, provide ways to measure how different two probability measures are. These divergences characterise the difficulty of testing hypotheses p versus q, in terms of the best possible error, the necessary sample size, asymptotic error rates, etc. Such hypothesis testing results are in turn applied to prove lower bounds in theoretical statistics (for instance via Fano's or Le Cam's methods). The topic of divergence inequalities studies how these different divergences are related to each other. For example, a well-known result is Pinsker's inequality, which upper bounds the total variation divergence in terms of the Kullback--Leibler divergence and is widely used in theoretical analyses. Broadly, these inequalities end up being so important because we often want to say something about a divergence A in a problem of interest, but A is too messy to deal with directly; we instead prove a statement about another divergence B, and then use a divergence inequality to translate that statement back into one about A.
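
For reference, Pinsker's inequality reads

\[ \mathrm{TV}(P, Q) \;\le\; \sqrt{\tfrac{1}{2}\, \mathrm{KL}(P \,\|\, Q)} \]

(with the total variation divergence normalised to take values in \([0,1]\)). Note that a "reverse" inequality cannot hold unconditionally: \(\mathrm{KL}(P \,\|\, Q)\) is infinite whenever \(P\) is not absolutely continuous with respect to \(Q\), while \(\mathrm{TV}\) remains bounded, which is why additional assumptions such as the "observability" condition mentioned below are needed.
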
Project Description

This project will look at some problems revolving around divergence-based inequalities motivated by specific applications. One concrete idea is to examine the interesting "reverse Pinsker inequality" from this paper: https://arxiv.org/pdf/2201.04735. This inequality underpins the paper's powerful conclusion that some problems known to be very hard to compute can, in fact, be solved efficiently. Unfortunately, the reverse Pinsker inequality relies on an assumption (called "observability" in the paper) that is hard to interpret, and hard even to verify computationally. My hope is that one can establish a reverse Pinsker inequality under a more natural assumption that is easy to check. The project would then involve analysing simple examples or running simulations to check whether the newly proposed inequality has any hope of being true (I believe it does), and then guessing and proving the right form of the inequality.
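
As an illustration of the kind of simulation-based sanity check mentioned above (our toy example, numerically probing the classical Pinsker inequality on random discrete distributions; the same template would apply to a newly conjectured inequality):

    import numpy as np

    rng = np.random.default_rng(0)

    def tv(p, q):
        return 0.5 * np.abs(p - q).sum()

    def kl(p, q):
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    # Search for counterexamples to TV(P,Q) <= sqrt(KL(P||Q)/2)
    # over random distributions on a 10-point alphabet.
    for _ in range(10_000):
        p, q = rng.dirichlet(np.ones(10)), rng.dirichlet(np.ones(10))
        assert tv(p, q) <= np.sqrt(0.5 * kl(p, q)) + 1e-12
    print("no counterexample found")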

Work Environment The student will have weekly meetings with me (lasting about an hour). If there are two interns, it will be a joint meeting. Students are free to choose where they want to work; there are no expectations aside from attending the weekly meetings in person.
References See the paper linked in the project description (https://arxiv.org/pdf/2201.04735)
Prerequisite Skills Good understanding of undergraduate probability and analysis
Other Skills Used in the Project Coding simple programs to test conjectured mathematical inequalities in Python or MATLAB (coding with AI assistance is fine too). Knowledge of LaTeX for typesetting is also necessary.
Acceptable Programming Languages Python or MATLAB

 

Project 6: "Unifying patterns in algebraic geometry" with Dr Fatemeh Rezaee

Project Title Unifying patterns in algebraic geometry
Keywords Hilbert scheme, combinatorics, auto-conjecturing, Machine Learning
Subject Area Algebraic Geometry, Algebra, Combinatorics, Number Theory, Applied and Computational Analysis, Machine Learning
Contact Name Fatemeh Rezaee
Contact Email fr414@cam.ac.uk
Department DPMMS
Group Algebraic Geometry
Project Duration 8 weeks
Background Information In some counting problems in algebraic geometry, one can spot partial patterns and potentially give closed formulas. This summer project aims to work on concrete examples of this type and unify the partial patterns and partial closed formulas.
Project Description

In this project, we are primarily interested in finding patterns in specific integer sequences for which we already have some partial formulas. For example, we consider the sequences of dimensions of the tangent spaces at the singular points of specific Hilbert schemes, which are essential in sheaf-counting enumerative geometry.
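
As a toy illustration of the kind of pattern-finding involved (our example, far simpler than the Hilbert-scheme sequences of interest), the method of finite differences detects when the initial terms of an integer sequence are consistent with a polynomial, which is one standard way to conjecture a closed formula:

    import numpy as np

    def conjecture_polynomial_degree(seq):
        """Repeatedly take differences; if the d-th differences are constant,
        the terms seen so far are consistent with a degree-d polynomial."""
        a = np.array(seq, dtype=np.int64)
        for d in range(len(seq)):
            if np.all(a == a[0]):
                return d
            a = np.diff(a)
        return None  # no polynomial pattern within the available terms

    # Example: n^2 + n + 1 for n = 0, ..., 7 is detected as degree 2.
    print(conjecture_polynomial_degree([n**2 + n + 1 for n in range(8)]))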

There are two potential directions:

  1. Using ML to unify the partial patterns: in this case, one approach is to use the method introduced by Mishra, Moulik, and Sarkar and to realise the partial formulas in their Conjecture Space.
  2. A more theoretical direction is to combinatorially prove some relevant conjectures and use them to unify the formulas without ML.

Since the theoretical direction is expected to be mostly combinatorial, students with excellent combinatorial intuition, or with experience in Mathematics Olympiads/competitions, are particularly encouraged to apply. I am seeking highly talented, motivated student(s) who are, most importantly, committed to collaborative work, to work with me (and potentially an additional data scientist advisor, if the first direction is taken) on this project.

Work Environment There will be 1-2 students working with me as the primary supervisor. The student(s) are expected to work in the CMS at least 2-3 days a week. Potentially, I may invite a data scientist to assist with the project.
References C. Mishra, S. Moulik, and R. Sarkar. Mathematical conjecture generation using machine intelligence. arXiv:2306.07277, 2023.
F. Rezaee. Conjectural criteria for the most singular points of the Hilbert schemes of points. Experimental Mathematics, Vol. 34, no. 4. 2025 (https://www.tandfonline.com/doi/full/10.1080/10586458.2024.2400181)
Prerequisite Skills Algebra/Number Theory, Geometry/Topology
Other Skills Used in the Project Data Visualisation, Predictive Modelling
Acceptable Programming Languages Python, MATLAB, No preference
Additional Info If you are eligible to apply and interested in the project, please email me your CV and transcripts (and your DoS name for internal applicants) before finalising your application.

 

Project 7: "Learning the score" with Professor Richard Samworth

Project Title Learning the score
Keywords Score matching, diffusion models, distributional learning
Subject Area Statistics, Machine Learning
Contact Name Richard Samworth
Contact Email rjs57@cam.ac.uk
Department DPMMS
Group Statistical Laboratory
Project Duration 8 weeks
Background Information Distributional learning is an emerging area in statistics that aims to learn the full distribution of the data in a flexible manner. If the data are assumed to belong to a specific parametric family, this task reduces to standard maximum likelihood estimation. In contrast, estimating the distribution without such assumptions is substantially more challenging. Several strategies have been proposed in recent years, including score matching [1, 2] and energy-based models [3], allowing flexible, nonparametric estimation of complex distributions. Once a distribution is estimated, various statistical procedures can be improved by leveraging it. For example, recent work on a procedure called antitonic score matching [2], in the context of linear regression, shows that the ordinary least squares estimator can be improved on by mimicking a maximum likelihood estimator with the estimated (projected) score function. In other examples, statistical problems that would otherwise be unidentifiable without knowledge of the full distribution may become tractable with a suitable distributional estimator [3]. Despite these advances, several interesting questions remain, including what the most effective strategies are for estimating the distribution in different scenarios.
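
In Hyvärinen's classical formulation [1], the score of a density \(p\) is \(s(x) = \nabla_x \log p(x)\), and score matching fits a model score \(s_\theta\) by minimising

\[ J(\theta) \;=\; \tfrac{1}{2}\, \mathbb{E}\, \big\| s_\theta(X) - \nabla \log p(X) \big\|^2 \;=\; \mathbb{E}\Big[ \operatorname{tr}\big(\nabla s_\theta(X)\big) + \tfrac{1}{2}\, \| s_\theta(X) \|^2 \Big] + \mathrm{const}, \]

where the second expression, obtained via integration by parts under mild regularity and decay conditions, depends on \(p\) only through samples from it, and so can be estimated empirically without knowing \(p\).
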
Project Description This project offers a broad scope, encompassing potential methodological, theoretical, computational and applied contributions. One promising direction is to study and compare different distributional learning strategies in the context of specific estimation problems. For instance, one could investigate the properties and performance of various score estimators in different contexts, including diffusion models.
Work Environment This project will be jointly supervised with my post-doc, Elliot Young. I anticipate that the student will meet with Elliot and me once a week; they may also meet with Elliot at other times. I generally find it good practice for students to work in the CMS during normal office hours, though some remote working is fine. I have an in-person meeting at 9am each Monday with my group, and an extended group meeting at 3pm on Tuesdays. Students may wish to participate in the Statistics Clinic, where once a fortnight anyone in the university can receive free statistical advice; for example, they could sit in on some consultations to gain first-hand experience of applied problems.
References [1] Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(24), 695–709.
[2] Feng, O. Y., Kao, Y.-C., Xu, M., & Samworth, R. J. (2026+). Optimal convex M-estimation via score matching. Annals of Statistics, to appear. arXiv. https://arxiv.org/abs/2403.16688.
[3] Shen, X., & Meinshausen, N. (2025). Engression: Extrapolation through the lens of distributional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 87(3), 653–677.
Prerequisite Skills Statistics, Mathematical Analysis
Other Skills Used in the Project Simulation
Acceptable Programming Languages Python, R

 


Self-Proposed Project

If you would like to apply with a “self-proposed” project, you will need to secure a willing and available supervisor and agree on a project title and description with them in advance of the application deadline.  

Do not submit a “self-proposed” project for which you have not secured a supervisor.  

 

If you select “self-proposed” project in the application form, you will be asked to supply the following information: 

  • Project Title
  • Project Description 
  • Project supervisor

 

Identifying potential supervisors to contact: 

Consult this list of potential supervisors for undergraduate summer research projects. The list is organised by research area. Identify the subject areas you are interested in and look at the webpages of the supervisors listed in those categories to find out about their specific research interests. Once you have identified a few potential supervisors you would like to reach out to, contact them by email to enquire about a possible project. Please disregard any specific projects you find in that list unless they also appear above; projects that do not appear above are not open to applications from external students.

 

What to include when initially contacting potential supervisors:

Tell them about yourself:

  • Who are you? Where are you currently studying? What are you currently studying? 
  • What areas of mathematics are you interested in?
  • What skills do you have that would make you a good fit for a project in those subject areas?  
  • You don't need to propose your own project, but you may put forward ideas if you have them. A “self-proposed” project should be the result of a discussion between you and the prospective supervisor. 
  • We recommend that you attach your CV as well as a transcript (if available).

Tell them about the programme:

  • Please cc visitingstudents@maths.cam.ac.uk when writing to a potential supervisor for the first time.
  • Make it clear that you are planning to apply for the Philippa Fawcett Internships and/or the Cambridge Mathematics Open Internships.

Ask them: 

  • If they are available to supervise a project between Monday 6 July and Friday 28 August 2026.
  • If they could suggest a suitable research project within their subject area.