
2026 Industrial CMP Projects
Below you will find the list of industrial CMP projects hosted by external companies (jump to list). Click here to see the list of academic projects hosted by other departments and labs within the university.
New projects may be added so check back regularly!
How to Apply
Unless alternative instructions are given in the project listing, to apply for a project you should send your CV to the contact provided along with a covering email which explains why you are interested in the project and why you think you would be a good fit.
Need help preparing a CV or advice on how to write a good covering email?
The Careers Service are there to help! Their CV and applications guides are packed full of top tips and example CVs.
Looking for advice on applying for CMP projects specifically? Check out this advice from CMP Co-Founder and Cambridge Maths Alumnus James Bridgwater.
Remember: it’s better to put the work into making fewer but stronger applications tailored to a specific project than firing off a very generic application for all projects – you won’t stand out with the latter approach!
Please note that to participate in the CMP programme you must be a student in Part IB, Part II, or Part III of the Mathematical Tripos at Cambridge.
Want to know more about a project before you apply?
Come along to the CMP Lunchtime Seminar Series in February 2026 to hear the hosts give a short presentation about their project. There will be an opportunity afterwards for you to chat informally with hosts about their projects.
Alternatively (or as well!), you can reach out to the contact given in the project listing to ask questions.
Industrial CMP Project Proposals for Summer 2026
- Novo Nordisk Research Centre Oxford, Exploring Gene Embeddings for Biological Analysis
- Novo Nordisk Research Centre Oxford, Virtual Cells with Large Language Models
- Nomura International Plc, Fragmented Order-book Content Assessment and Liquidity-weighting (FOCAL)
- Nomura International Plc, Option pricing with quantum information
- APEX Horticulture, Optimising the testing and selection process of cut flowers using historic performance and quality data
- G-Research, Agentic AI for Formalized Math
- MM Flowers, Correlation Between Forecast Accuracy, Stock Dwell Time and Retailer Waste on Customer Complaints: A Study of Yellow and White 40cm & 50cm Roses at MM Flowers
- Emcore Asset Management, Implied Volatility Surface Construction, Diagnostics, and Decomposition
- Signaloid, Discrete Representations of Multivariate Continuous Probability Distributions
- Unilever SERS, Exploring deep learning embeddings for chemical bioactivity prediction
- Boehringer Ingelheim Limited, Foundation models for cancer biology
Exploring Gene Embeddings for Biological Analysis
| Project Title | Exploring Gene Embeddings for Biological Analysis |
| Keywords | Gene embeddings, networks, large language models, perturbation assays, gene interactions |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Marie Lisandra Zepeda Mendoza |
| Contact Name | Marie Lisandra Zepeda Mendoza |
| Contact Email | vmnz@novonordisk.com |
| Company/Lab/Department | Novo Nordisk Research Centre Oxford |
| Address | Old Road Campus, Roosevelt Drive, Oxford, OX3 7FZ |
| Project Duration | 8 weeks, full time |
| Project Open to | Masters students (Part III) |
| Background Information |
Genes are the basic units of heredity and encode the information for the synthesis of proteins and other molecules that perform various functions in living organisms. Understanding the relationships between genes and their functions is a fundamental challenge in biology and medicine. One way to approach this challenge is to represent genes as numerical vectors, also known as embeddings, that capture some aspects of their biological properties and interactions. Embeddings can be derived from various sources of data, such as gene sequences, gene expression, gene ontology, protein-protein interactions, and literature. Embeddings can then be used for various tasks, such as gene clustering, gene function prediction, gene-disease association, and gene pathway analysis. The project is part of a broader and strategically critical goal for AI applied to computational biology in R&D at Novo Nordisk, which relates to the use, validation and control of gene embeddings for novel target and biomarker discovery and functional contextualisation. |
| Project Description |
Aim
Methodology
Expected Outcomes
The implications of this project are:
|
| References | Soman, Karthik, et al. "Biomedical knowledge graph-enhanced prompt generation for large language models." arXiv preprint arXiv:2311.17330 (2023). Chen, Y. T., & Zou, J. "GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT." bioRxiv preprint (2023). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614824/ "Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque." https://www.nature.com/articles/s41467-022-33026-0 "In silico biological discovery with large perturbation models." https://www.nature.com/articles/s43588-025-00870-1 |
| Work Environment | The student will work closely with the supervisor and will be able to interact with other colleagues in the bioAI Department, at both the Oxford and London sites. We are a fully computational team, but the Oxford site also hosts various in vitro expert teams to which the student can be exposed. The supervisor is happy for the student to work in a hybrid mode. |
| Prerequisite Skills | Statistics, Image processing, Geometry / Topology, Mathematical analysis, Simulation, Predictive Modelling, Database queries, Data Visualisation, Probability / Markov Chains |
| Other skills used in the Project | Statistics, Probability / Markov Chains, Image processing, Mathematical analysis, Geometry / Topology, Simulation, Predictive Modelling, Database queries, Data Visualisation |
| Acceptable Programming Languages | Python, R |
| Additional Requirements | Enthusiasm for biological applications of maths and a lot of willingness to learn. Good communication and presentation skills are also desirable. |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
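To give a flavour of the kind of analysis this project involves, the sketch below ranks genes by cosine similarity between their embedding vectors, a basic building block for the clustering and gene-interaction tasks described above. The gene names are real, but the 4-dimensional vectors are toy values invented for illustration; embeddings from models such as GenePT have hundreds of dimensions.

```python
import numpy as np

# Toy gene embeddings (hypothetical 4-dimensional vectors for illustration;
# real embeddings, e.g. from GenePT, have hundreds of dimensions).
embeddings = {
    "TP53": np.array([0.9, 0.1, 0.3, 0.0]),
    "MDM2": np.array([0.8, 0.2, 0.4, 0.1]),
    "INS":  np.array([0.0, 0.9, 0.1, 0.7]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Rank the other genes by similarity to a query gene
query = "TP53"
scores = {g: cosine_similarity(embeddings[query], e)
          for g, e in embeddings.items() if g != query}
for gene, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {s:.3f}")
```

In practice the same ranking idea underpins nearest-neighbour retrieval over thousands of genes, where the interesting question is whether high-similarity pairs correspond to known functional relationships.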
Virtual Cells with Large Language Models
| Project Title | Virtual Cells with Large Language Models |
| Keywords | Virtual Cell, LLMs, Causality, In-context learning |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Marc Boubnovski Martell and Josefa Stoisser |
| Contact Name | Josefa Stoisser |
| Contact Email | ofsr@novonordisk.com |
| Company/Lab/Department | Novo Nordisk, BioAI team |
| Address | Novo Nordisk R&D Digital Hub, Pancras Rd, London, N1C 4AG, UK |
| Project Duration | 8-10 weeks, full-time |
| Project Open to | Masters students (Part III) |
| Background Information |
A virtual cell is an in-silico model (a kind of “digital twin”) that lets us predict how a living cell will respond to interventions (e.g. adding a drug). This sits in a fast-moving area of BioAI, with community benchmarks such as the Virtual Cell Challenge [1] pushing models toward realistic generalisation settings. At the core of the virtual cell is biological perturbation prediction: given a baseline cell state, predict how the cell changes after an intervention (e.g., knocking out a gene). Conceptually, this is a causal effect problem, made hard by biological confounding, incomplete measurements, and distribution shift (new cell types, new perturbations, new experimental settings). Our recent work (“LangPert”, ICLR 2025 workshop spotlight [2]) suggests a practical path forward: use LLMs to retrieve and synthesise mechanistic biological context (gene function, pathways, interactions, etc.) and condition predictive models on that context. The key benefit is zero-shot or low-data generalisation to perturbations the model has not seen during training, while also producing explanations that are at least partially aligned with known biology. In parallel, LLM-powered causal analysis motivates a causality-first approach to virtual cells [3, 4]. |
| Project Description |
The project will explore LLM-informed causal modelling for perturbation prediction. The high-level aim is to use LLM knowledge as a contextual guidance—not as a replacement for data—so that models can better infer gene–gene relationships and predict outcomes of interventions, especially when faced with novel perturbations or shifted experimental conditions.
Because the AI landscape changes quickly, the specific LLM, retrieval approach, and causal estimator will be chosen at the project start to reflect the best available options. Successful outcome:
|
| References | [1] Virtual Cell Challenge. https://virtualcellchallenge.org/. [2] Märtens, K., Boubnovski Martell, M., Prada-Medina, C. A., & Donovan-Maiye, R. (2025). LangPert: LLM-Driven Contextual Synthesis for Unseen Perturbation Prediction. MLGenX Workshop at ICLR 2025 (Oral). https://openreview.net/forum?id=Tmx4o3Jg55. [3] Wang, X., Zhou, K., Wu, W., Singh, H. S., Nan, F., Jin, S., Philip, A., Patnaik, S., Zhu, H., Singh, S., Prashant, P., Shen, Q., & Huang, B. (2025). Causal-Copilot: An Autonomous Causal Analysis Agent. arXiv:2504.13263 [cs.AI]. https://doi.org/10.48550/arXiv.2504.13263. [4] Kıcıman, E., Ness, R. O., Sharma, A., & Tan, C. (2024). Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. Transactions on Machine Learning Research (TMLR). https://doi.org/10.48550/arXiv.2305.00050 [5] Robertson, J., Reuter, A., Guo, S., Hollmann, N., Hutter, F., & Schölkopf, B. (2025). Do-PFN: In-Context Learning for Causal Effect Estimation. NeurIPS 2025. https://doi.org/10.48550/arXiv.2506.06039. |
| Work Environment | The student will join the BioAI team at Novo Nordisk, supervised by Josefa Stoisser and Marc Boubnovski Martell, with co-supervision from Jialin Yu (University of Oxford). The BioAI team develops AI/LLM/agentic systems for drug discovery and has a publication track record in top-tier AI venues (NeurIPS, ACL, ICML, ICLR). The office is in King’s Cross, London. Remote work is possible, but 2–3 days per week on-site is preferred. |
| Prerequisite Skills | Statistics, Mathematical analysis |
| Other skills used in the Project | LLMs, Causal Inference |
| Acceptable Programming Languages | Python |
| Additional Requirements | - |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
Fragmented Order-book Content Assessment and Liquidity-weighting (FOCAL)
| Project Title | Fragmented Order-book Content Assessment and Liquidity-weighting (FOCAL) |
| Keywords | FX Markets, limit order book, microprice |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Jan Novotny |
| Contact Name | Jan Novotny |
| Contact Email | jan.novotny@nomura.com |
| Company/Lab/Department | Nomura International Plc |
| Address | 1 Angel Ln, London EC4R 3AB |
| Project Duration | 8-10 weeks full time |
| Project Open to | Masters students (Part III) |
| Background Information | Foreign exchange markets present a particularly compelling use case for this research, as they represent the world's largest and most liquid financial market with daily trading volumes exceeding $7 trillion. Unlike centralized exchange-traded assets, FX markets operate as a decentralized, over-the-counter network where liquidity is highly fragmented across numerous market makers, electronic communication networks (ECNs), and trading platforms. This fragmentation creates significant challenges for price discovery, as there is no single consolidated order book or official exchange rate at any given moment. The decentralized nature of FX trading means that different liquidity providers may quote varying prices simultaneously, making the aggregation and analysis of order book information both more complex and more valuable for understanding true market conditions. Given the enormous scale and fragmented structure of FX markets, developing robust methods to classify information content across multiple liquidity pools could yield substantial improvements in price discovery, execution quality, and market efficiency. |
| Project Description |
Primary Objectives:
Secondary Objectives:
Information Content Metrics:
Validation Framework:
Applications
|
| References | n/a |
| Work Environment | The internship will be in person in the office (a hybrid model is possible); the candidate will work closely with the team. |
| Prerequisite Skills | Statistics |
| Other skills used in the Project | Predictive Modelling, Simulation |
| Acceptable Programming Languages | Python, kdb+/q |
| Additional Requirements | Enthusiasm for learning on a real case study |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
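One quantity central to this project's keywords is the microprice: a mid-price weighted toward the side of the book with less resting size. The sketch below computes it on a consolidated view of synthetic quotes from several fragmented venues. The venue names, quotes, and the simple aggregation scheme are invented for illustration, not the project's prescribed method.

```python
# Sketch: size-weighted microprice across fragmented FX liquidity pools.
# All quotes are synthetic; the aggregation is one simple illustrative choice.
quotes = [  # (venue, bid, bid_size, ask, ask_size)
    ("ECN-A", 1.0850, 5_000_000, 1.0852, 3_000_000),
    ("ECN-B", 1.0851, 2_000_000, 1.0853, 4_000_000),
    ("LP-C",  1.0849, 8_000_000, 1.0852, 6_000_000),
]

# Consolidated best bid/offer across venues
best_bid = max(q[1] for q in quotes)
best_ask = min(q[3] for q in quotes)

# Aggregate resting size at the consolidated best levels
bid_size = sum(q[2] for q in quotes if q[1] == best_bid)
ask_size = sum(q[4] for q in quotes if q[3] == best_ask)

# Microprice: mid weighted toward the side with LESS size, reflecting the
# intuition that a thin side is more likely to be traded through first.
microprice = (best_bid * ask_size + best_ask * bid_size) / (bid_size + ask_size)
print(round(microprice, 6))  # lies between best_bid and best_ask
```

Assessing how informative each venue's quotes are about subsequent price moves, rather than pooling them naively as here, is exactly the liquidity-weighting question the project addresses.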
Option pricing with quantum information
| Project Title | Option pricing with quantum information |
| Keywords | Option pricing, quantum information |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Paul McCloud |
| Contact Name | Paul McCloud |
| Contact Email | paul.mccloud@nomura.com |
| Company/Lab/Department | Nomura |
| Address | 1 Angel Lane, London EC4R 3AB |
| Project Duration | 8 weeks |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information | Nomura is a global financial services group with an integrated network spanning over 30 countries. The Quantitative Research team supports Global Markets businesses by developing mathematical models for the pricing and risk management of derivative trades, in close partnership with the trading desks. The role requires an exceptional level of technical quantitative skills, ideally backed up by mathematical research experience (not necessarily related to finance). |
| Project Description | Option pricing is the most elementary challenge of derivative modelling and is the foundation for many of the solutions needed by a Global Markets structured products business. Traditional methods employ classical stochastic calculus, but this approach can struggle when applied with complex boundary conditions, which potentially limits the product offering of the business. This project explores numerical methods for option pricing established on noncommutative information, to see if the novel degrees of freedom this introduces can facilitate more efficient schemes or generate better convergence and fitting to options markets. Abstracted as a pure mathematical challenge, the project considers the application of results from noncommutative algebra to well-posed problems whose solutions can be mapped onto option pricing. |
| References | [1] McCloud, P. “Quantum bounds for option pricing” (2018) arxiv.org/abs/1712.01385 [2] McCloud, P. “Information and arbitrage: applications of quantum groups in mathematical finance” (2024) arxiv.org/abs/1711.07279 [3] McCloud, P. “The relative entropy of expectation and price” (2025) arxiv.org/abs/2502.08613 |
| Work Environment | You will research the project remotely, supported by a supervisor at Nomura and with occasional visits to the Nomura London office for progress updates. |
| Prerequisite Skills | Mathematical Physics, Algebra / Number theory, Mathematical analysis |
| Other skills used in the Project | Numerical Analysis, Partial Differential Equations, Probability / Markov Chains |
| Acceptable Programming Languages | Python, MATLAB |
| Additional Requirements | Curiosity and a willingness to apply ideas in novel contexts |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
Optimising the testing and selection process of cut flowers using historic performance and quality data
| Project Title | Optimising the testing and selection process of cut flowers using historic performance and quality data |
| Keywords | Horticulture, varietal development, supply chain |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisors | Lauren Hibbert and Richard Boyle |
| Contact Name | Lauren Hibbert |
| Contact Email | lauren.hibbert@apexhorticulture.com |
| Company/Lab/Department | APEX Horticulture |
| Address | Pierson Road, The Enterprise Campus, Alconbury Weald, PE284YA |
| Project Duration | 8 weeks |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information |
APEX Horticulture Ltd. is a professional research and development business offering bespoke testing services for cut flowers and plants, with three purpose-built testing centres in the UK and US. APEX is a division of the wider MM group, whose primary business, MM Flowers, is one of the UK’s leading cut flower importing and processing companies, with a vertically integrated ownership model and innovative practices. More recently, the MM group has diversified its activities, including supplying plants, bulbs and other gifting products to retailers in the UK and Europe. MM is owned by the AM Fresh Group, a leading breeder, grower and distributor of citrus and grapes; Vegpro, East Africa’s largest flower and vegetable producer; and Elite, based in South America, the leading flower grower globally. APEX occupies an optimal position in the chain: it can deliver high-quality, independent research and close-to-market proximity, matched with invaluable insight into the true performance of flowers and plants subjected to actual supply chain conditions. The infrastructure and specialised personnel of APEX aim to deliver robust, standardised and consistent research every week of the year, together with the ability to undertake large-scale projects to match all client requirements, influencing all elements of the cut flower supply chain. APEX undertakes many different research projects covering the entire supply chain, from development of new flower types through to the manufacturing requirements for the final bouquets. Each of these projects generates a significant amount of data and insight, which is used to provide recommendations to the various stakeholders of each project. |
| Project Description | APEX tests over 50k cut flower samples annually, with around 30-60 data points generated per sample. Whilst this is often focussed on certain crop types, such as roses and lilies, many more types of flowers are tested across many different projects. The data generated ranges from agronomic and freight data through to performance data associated with sample longevity (‘vase’/‘shelf’ life) and aesthetic appeal. Several of the projects undertaken by APEX are long term with key strategic stakeholders, which allows for an assessment of flower performance and quality over many months and years. Each sample often has significant background information, including, for example, the type of flower, the growing location and agronomic practices, and the freight mode. Many influencing factors can affect these measurements, such as weather conditions, freight delays and handling through the supply chain, which can result in variability across a testing programme. Whilst APEX designs projects to try to account for this potential variation, there is a desire to use existing data to improve the efficiency and accuracy of the testing process. Selecting flower types and cultivars that do not meet the required standards can result in significant waste, consumer dissatisfaction and potentially brand damage, so having the best insight possible reduces this risk. This has implications across the supply chain, from the breeder/grower through to the suppliers and retailers. The central question is therefore: can existing datasets be used to build an appropriate model for assessing the viability of cut flowers (such as a new flower type, cultivar or treatment), more effectively and efficiently than the current process? |
| References | - |
| Work Environment | The student will be part of a wider team but will lead the project. The working pattern can be hybrid (and largely remote). |
| Prerequisite Skills | Statistics, Mathematical analysis |
| Other skills used in the Project | Statistics |
| Acceptable Programming Languages | No preference |
| Additional Information | Desire to operate in a commercial business, and provide insights that can inform real world decisions. |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
Agentic AI for Formalized Math
| Project Title | Agentic AI for Formalized Math |
| Keywords | AI, Lean 4, Agents, LLM |
| Project Listed | 9 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Nehal Patel and Charles Martinez |
| Contact Name | Nehal Patel |
| Contact Email | nehal.patel@gresearch.co.uk |
| Company/Lab/Department | G-Research |
| Address | 1 Soho Pl, London W1D 3BG |
| Project Duration | Flexible 8-12 week duration during the summer of 2026 |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II), Second year undergraduates (Part IB) |
| Background Information | AI agents and interactive theorem provers have the potential to change forever the way mathematics is done. This project gives students a hands-on opportunity to learn and apply these tools in their area of research. |
| Project Description | Students will formalize, in Lean 4, a topic of their choosing using agentic AI techniques. The initial toolset for AI theorem proving will be provided and students will have the opportunity to help shape the improvement of these tools. Depending on the student's interest, work may either focus primarily on formalization or may include working on the agentic theorem proving framework. Caveats: Not all branches of math are easy to model in Lean 4. Prior experience with Lean 4 is advisable. Prior experience with AI & LLMs not required, but helpful. Prior experience with programming and a hacker ethos are also highly desirable. |
| References |
Introductions to Lean: Agentic Theorem Prover (One of Many): |
| Work Environment | Work will be directed primarily by GR staff based in Boston. The student will work mostly independently and remotely, coordinating via GitHub and video meetings (with a meeting cadence that will adapt as the project progresses). Twice during the summer, Boston staff will be in England and will arrange 1-3 day intensive sessions with the student for joint collaboration. |
| Prerequisite Skills | Formal Math in Lean 4 |
| Other skills used in the Project | App Building |
| Acceptable Programming Languages | Python, Lean 4 |
| Additional Requirements | Candidates should be prepared to propose some mathematics that they would like to formalize using AI tools in Lean. This could draw from their current research focus or their general interests. Topics from recreational math or applied topics are acceptable. Students are encouraged to investigate to what extent necessary background theories have already been formalized. |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
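For readers unfamiliar with what "formalizing in Lean 4" looks like concretely, here is a minimal example of the target artefact: a named theorem whose proof is checked by the Lean kernel. The statement and proof term (`Nat.add_comm` from the Lean core library) are chosen purely for illustration; project topics would be far more substantial.

```lean
-- A minimal Lean 4 formalization: a named theorem with a kernel-checked proof.
-- `Nat.add_comm` is the core-library lemma that addition on Nat is commutative.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The agentic part of the project concerns having AI tools propose and repair proofs like this one automatically, with the kernel acting as the ground-truth verifier.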
Correlation Between Forecast Accuracy, Stock Dwell Time and Retailer Waste on Customer Complaints: A Study of Yellow and White 40cm & 50cm Roses at MM Flowers
| Project Title | Correlation Between Forecast Accuracy, Stock Dwell Time and Retailer Waste on Customer Complaints: A Study of Yellow and White 40cm & 50cm Roses at MM Flowers |
| Keywords | Forecast Accuracy; Stock Dwell Time; Retailer Waste; Customer Complaints; Statistical Analysis |
| Project Listed | 16 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Ellanette van Zyl |
| Contact Name | Ellanette van Zyl |
| Contact Email | Ellanette.vanzyl@mm-flowers.com |
| Company/Lab/Department | MM Flowers |
| Address | Pierson Road, The Enterprise Campus, Alconbury Weald, Huntingdon, PE28 4YA |
| Project Duration | 8 weeks, full-time 40 hours/week |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information |
MM Flowers operates in a highly time-sensitive fresh flower supply chain, where product quality, availability and freshness are critical drivers of customer satisfaction. Even small inaccuracies in forecasting or delays in product movement can result in extended stock dwell time, increased waste at retailer level and ultimately higher customer complaint volumes. Roses, in particular, represent a high-volume and high-visibility product category where performance variability can have a significant commercial and reputational impact. Forecast accuracy directly influences ordering decisions, inbound volumes and stock allocation. When actual arrivals deviate from forecasted volumes, this can lead to either stock shortages, impacting service levels, or excess stock levels, increasing dwell time and the risk of quality deterioration. Longer dwell times at MM Flowers or retail stores can accelerate deterioration, contribute to retailer waste and negatively affect the end consumer experience. This project focuses specifically on yellow and white 40cm and 50cm roses, which are core SKUs within the MM Flowers portfolio and are particularly sensitive to demand variability and shelf-life constraints. By analysing the relationships between forecast accuracy, stock dwell time, retailer waste and customer complaints for these products, the project aims to identify whether measurable correlations exist across the supply chain. The project is interesting and valuable because it connects operational planning decisions with downstream quality outcomes and customer feedback. Understanding these relationships will support more data-driven forecasting, stock management and waste-reduction strategies, while also providing insight into how supply chain performance ultimately affects customer satisfaction. The findings have the potential to inform targeted improvements for key rose lines and contribute to broader continuous improvement initiatives within MM Flowers. |
| Project Description |
This project will investigate whether measurable correlations exist between forecast accuracy, stock dwell time, retailer waste and customer complaints for yellow and white 40cm and 50cm roses supplied by MM Flowers. The project is primarily data-driven and quantitative, making it well suited to a student with strengths in mathematics, statistics or data analysis. The project will begin with a data familiarisation and definition phase, during which the student will work with historical supply chain data provided by MM Flowers. This will include forecast volumes, actual arrival quantities, stock dwell time metrics, retailer waste data and customer complaint records. The student will be responsible for defining and calculating appropriate performance measures, such as forecast accuracy metrics (e.g. absolute error or percentage error), dwell time distributions and waste rates. In the next phase, the student will apply statistical and mathematical techniques to explore relationships between variables. This may include:
The project is open-ended in nature, allowing findings from the initial analysis to guide deeper investigation. For example, if strong correlations are identified for certain rose lengths or colours, the student may focus further analysis on those segments or explore threshold effects where performance begins to deteriorate significantly. A successful outcome would be:
The project is interesting and useful because it links mathematical analysis directly to real-world operational and commercial outcomes. Students will gain experience applying statistical methods to complex, imperfect industry data, while MM Flowers will benefit from improved understanding of how planning and stock decisions affect product quality and customer experience across the supply chain. |
| References | https://mm-flowers.com |
| Work Environment |
The student will work independently on the core analytical aspects of the project, with regular guidance and supervision from the project lead at MM Flowers. The project will be based within a business and operational environment rather than a laboratory, giving the student exposure to real-world supply chain data and decision-making contexts. In addition to the primary supervisor, the student will have opportunities to engage with forecasting, supply chain planning and quality teams at MM Flowers, allowing them to discuss data definitions, operational processes and practical implications of their findings. While there is no formal academic research group on site, the student will be supported through regular check-ins and access to subject matter experts across the business. Working hours will be flexible, aligned with standard office hours, and can be adjusted to accommodate academic commitments. The project can be conducted in a hybrid format, combining remote analytical work with occasional on-site days at MM Flowers when beneficial for data access, collaboration and project reviews. Day-to-day work will primarily involve data analysis, modelling and interpretation, with time allocated for meetings, progress reviews and refinement of the analytical approach. The student will be encouraged to manage their own time, structure their analysis and propose next steps, mirroring the autonomy expected in both industry and academic research roles. This working environment offers a balance of independent mathematical problem-solving and practical business engagement, providing a supportive setting for a mathematics student to apply theoretical skills to a real operational challenge. |
| Prerequisite Skills | Statistics, Predictive Modelling, Data Visualisation, Database queries, Applied statistics, regression analysis, exploratory data analysis, and translating real-world problems into quantitative models. |
| Other skills used in the Project | Predictive Modelling, Statistics, Data Visualisation, Database queries, Simulation, Applied statistical analysis, handling real-world operational data, basic programming skills (e.g. Python or R), critical interpretation of quantitative results, and ability to communicate findings clearly to non-technical stakeholders. |
| Acceptable Programming Languages | Python, R, No preference, SQL |
| Additional Requirements | We are looking for a student who is curious, analytical and motivated to apply mathematical skills to real-world problems. The ideal candidate will demonstrate enthusiasm for data-driven analysis and a willingness to engage with complex, imperfect datasets typical of an operational business environment. Strong problem-solving ability, attention to detail and critical thinking are important, along with a willingness to question assumptions and explore findings independently. The student should be comfortable working autonomously while also being open to feedback and discussion. Good communication skills are essential, as the project will require explaining quantitative findings clearly to non-technical stakeholders and translating mathematical results into practical business insights. An interest in supply chain, forecasting or data analytics in an applied setting would be advantageous, but not essential. Above all, we value a positive attitude, intellectual curiosity and a willingness to learn, as the project offers scope for the student to shape the direction of the analysis based on their findings. |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
Implied Volatility Surface Construction, Diagnostics, and Decomposition
| Project Title | Implied Volatility Surface Construction, Diagnostics, and Decomposition |
| Keywords | Implied Volatility Fitting, Surface Dynamics, Options, Options Pricing, Simulation, PCA, Risk Scenario Generation |
| Project Listed | 23 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Antonio Zarrillo and Silvia Stanescu |
| Contact Name | Antonio Zarrillo |
| Contact Email | antonio.zarrillo@emcore.ch |
| Company/Lab/Department | Emcore Asset Management |
| Address | Schochenmühlestrasse 6, 6340 Baar |
| Project Duration | 10-12 weeks from June |
| Project Open to | Third year undergraduates (Part II), Masters students (Part III), Second year undergraduates (Part IB) |
| Background Information | Options are quoted on a discrete grid of strikes and maturities, but pricing and risk processes require a continuous implied volatility surface. For any strike and expiry, an observed option premium can be mapped (under an agreed pricing convention) to an implied volatility level; the collection of these points defines the market surface. In practice, quotes are noisy and incomplete, bid/ask spreads can be wide, and naive interpolation can produce unstable outputs or static arbitrage across strike or maturity. A robust surface workflow therefore enables consistent valuation inputs for backtesting and research, improves the reliability of derived sensitivities and signals, and reduces operational and model risk. Beyond fitting a surface on a given day, an important question is how the surface evolves through time. Empirically, a large share of surface variation is low-dimensional and can often be interpreted as level, skew, and curvature-type moves. PCA provides a transparent factor decomposition of these movements and can be used to build coherent stress scenarios and statistical scenario generation for risk measures. |
| Project Description |
The project will build an end-to-end workflow that ingests option-chain quotes, produces a stable implied volatility representation across strike and maturity, and generates diagnostics for quality and stability. The student will implement a robust construction method, and assess sensitivity to data quality, filtering, and weighting choices. A PCA decomposition will then be built from a historical time series of model outputs evaluated on a fixed strike–maturity grid. Successful outcome:
|
| References | [1] Carr, P., & Madan, D. (2005). A note on sufficient conditions for no arbitrage. Finance Research Letters. [2] Cont, R., & da Fonseca, J. (2002). Dynamics of implied volatility surfaces. Quantitative Finance. [3] Cont, R., & Vuletić, M. (2023). Simulation of arbitrage-free implied volatility surfaces. Applied Mathematical Finance, 30(2), 94-121. [4] Gatheral, J. (2006). The Volatility Surface: A Practitioner's Guide. Wiley. [5] Gatheral, J., & Jacquier, A. (2014). Arbitrage-free SVI volatility surfaces. Quantitative Finance. [6] Skiadopoulos, G., Hodges, S., & Clewlow, L. (1999). The dynamics of the S&P 500 implied volatility surface. Review of Derivatives Research. [7] Zeliade Systems (2009). Quasi-Explicit Calibration of Gatheral's SVI model. White Paper (ZWP-0005). |
| Work Environment | The research project can be conducted in a hybrid format, with guidance from at least one team member. |
| Prerequisite Skills | Statistics, Numerical Analysis, Partial Differential Equations, Mathematical analysis, Simulation, Database queries, Data Visualisation |
| Other skills used in the Project | Probability / Markov Chains, Algebra / Number theory, Predictive Modelling |
| Acceptable Programming Languages | Python |
| Additional Requirements | Strong interest in learning from a practical case study |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
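For applicants unfamiliar with the mechanics this project describes, the sketch below illustrates the two core steps: mapping a quoted premium to an implied volatility by root-finding on the pricing function, and a PCA factor decomposition of a surface history on a fixed grid. This is a toy example on synthetic data under standard Black-Scholes conventions, not the project's actual methodology.

```python
import numpy as np
from math import log, sqrt, exp
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0):
    """Map an observed premium to an implied volatility by root-finding."""
    return brentq(lambda sig: bs_call_price(S, K, T, r, sig) - price, lo, hi)

# Round trip: price an option at sigma = 0.2, then recover that sigma.
p = bs_call_price(100.0, 105.0, 0.5, 0.01, 0.2)
sigma_hat = implied_vol(p, 100.0, 105.0, 0.5, 0.01)

# PCA of surface history: 250 synthetic daily surfaces on a fixed
# 5x4 strike-maturity grid, driven by a common "level" factor plus noise.
rng = np.random.default_rng(0)
level = 0.20 + 0.02 * rng.standard_normal(250)
grid = np.fromfunction(lambda i, j: 0.01 * i - 0.005 * j, (5, 4))
surfaces = (level[:, None] + grid.ravel()[None, :]
            + 0.001 * rng.standard_normal((250, 20)))
X = surfaces - surfaces.mean(axis=0)    # centre the history
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = s**2 / (s**2).sum()         # variance share per component
# The first component dominates, consistent with the low-dimensional
# level/skew/curvature picture described in the background.
```

In real data the quotes would first need filtering and weighting, and the fit would need arbitrage checks across strike and maturity; those are exactly the robustness questions the project tackles.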
Discrete Representations of Multivariate Continuous Probability Distributions
| Project Title | Discrete Representations of Multivariate Continuous Probability Distributions |
| Keywords | (Multivariate) Statistics, Probability/Markov Chains, Simulation, (Numerical) Linear Algebra |
| Project Listed | 26 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | TBC |
| Contact Name | Dr Michael Selby |
| Contact Email | careers@signaloid.com |
| Company/Lab/Department | Signaloid |
| Address | 4 Station Square, Cambridge, CB1 2GE |
| Project Duration | 8 weeks, full time |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information | Probability distributions provide a mathematical framework for understanding and modelling uncertainty, allowing us to quantify the likelihood of different outcomes in random processes. By characterising how data is distributed, they enable informed decision-making and are foundational to fields like statistics, machine learning, and risk assessment. Many of these distributions, such as the famous normal distribution (bell curve), are defined continuously, but in reality we need to represent these distributions with a finite number of discrete points so that we may perform statistical tasks quickly and efficiently on a computer. |
| Project Description | In this project you will work on new discrete representations of probability distributions, aiming to uncover better ways to capture the shape and form of many theoretical and real-world distributions. First you will learn about distributions as rigorous mathematical objects and how to perform arithmetic on them. You will also learn how to quantify the "closeness" of distributions using distance metrics and criteria. Then, after researching and analysing existing methods for representing distributions discretely, you will conceive of new and improved methods, especially for high-dimensional distributions. Finally, you will test, verify and analyse the underlying numerical linear algebra of these methods, both analytically and numerically through simulations (in Python or a similar language). |
| References | https://signaloid.com/technology |
| Work Environment | Join a remote team of industry mathematicians discussing probability theory and real world statistical problems. You will have the chance to talk with your supervisor multiple times per week and have them guide you through the project and oversee your progress. |
| Prerequisite Skills | Statistics, Probability / Markov Chains, Simulation, (Numerical) Linear Algebra |
| Other skills used in the Project | Mathematical analysis, Data Visualisation |
| Acceptable Programming Languages | Python |
| Additional Requirements | None |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
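To make the central object of this project concrete, the sketch below builds a simple n-point, equal-weight quantile representation of a normal distribution and performs arithmetic on two such representations by forming all pairwise sums. This is an illustrative toy, not Signaloid's actual representation.

```python
import numpy as np
from scipy.stats import norm

def quantile_representation(dist, n):
    """Represent a continuous distribution by n equally weighted points
    placed at the quantile midpoints p_k = (2k - 1) / (2n)."""
    probs = (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)
    return dist.ppf(probs)

# An N(0, 1) represented by 1000 points reproduces its moments closely.
points = quantile_representation(norm(), 1000)

# Arithmetic on distributions: if X and Y are independent, X + Y can be
# represented by all pairwise sums of their support points (n^2 points,
# each with weight 1/n^2). Here X ~ N(0, 1) and Y ~ N(1, 2), so
# X + Y ~ N(1, sqrt(5)).
x = quantile_representation(norm(0, 1), 200)
y = quantile_representation(norm(1, 2), 200)
z = (x[:, None] + y[None, :]).ravel()
```

Note the quadratic blow-up in support size under a single arithmetic operation; keeping representations compact and accurate after repeated operations, especially in high dimensions, is exactly the kind of question the project addresses.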
Exploring deep learning embeddings for chemical bioactivity prediction
| Project Title | Exploring deep learning embeddings for chemical bioactivity prediction |
| Keywords | Deep learning, Embeddings, Representation learning, Autoencoders, Toxicology |
| Project Listed | 26 January 2026 |
| Project Status | Open |
| Application Deadline | 27 February 2026 |
| Project Supervisor | Patrik Engi and Hugh Barlow |
| Contact Name | Patrik Engi |
| Contact Email | patrik.engi@unilever.com |
| Company/Lab/Department | Unilever SERS |
| Address | Colworth Science Park, Sharnbrook, Bedford, MK44 1LQ |
| Project Duration | 8-12 weeks |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information |
In a fast-moving consumer goods environment, it is vital that safety assessments are conducted to ensure products are safe for humans and the environment. Historically, these assessments have required in vivo animal testing, so there is a pressing ethical and scientific need to develop non-animal methods to support product safety risk assessment. For more than 20 years, Unilever’s Safety, Environmental and Regulatory Science (SERS) Group has been developing novel in silico and in vitro methods, which leverage recent advances in biology, genetics, computing, mathematics and statistics, to conduct safety assessments without the use of animal testing [1, 2]. The current evolution in the risk assessment paradigm presents new opportunities for applying deep learning and AI-based approaches. A key part of risk assessment is to characterise the potential effects that a chemical may have on different cell types, which would typically involve using high throughput transcriptomics (HTTr) to measure the transcriptional response of cells to different concentrations of a test chemical. Such data can be expensive to generate, particularly if it is needed for multiple chemicals and cell types. Therefore, data-driven modelling which maximises the utility of all the available data is a high priority for cost-effective implementation of non-animal approaches. Many approaches to the risk assessment of an unknown compound are based on determining its similarity to a known compound. Recent advances in deep learning have introduced powerful methods for generating embeddings: numerical representations that capture complex relationships between entities. By combining transcriptional and chemical information (such as structural representations like molecular fingerprints), these embeddings may provide valuable insights beyond traditional similarity metrics, and may even be able to predict responses for unseen chemicals [3, 4]. |
| Project Description |
Embeddings map complex (high-dimensional) data into a simplified (lower-dimensional) latent space while preserving semantic relationships, thereby enabling the discovery of relationships between datasets previously masked by the high dimensionality. Internal studies of these methods for biochemical in vitro responses have yielded promising results, which we seek to apply further to relevant datasets. For this project, the student(s) should start by familiarising themselves with the literature surrounding this topic, upskilling in the use of representation learning/embedding models, transcriptomics and toxicology, with support from SERS experts. With this foundation, we recommend that the student progress this work by choosing one or more of the following paths:
|
| References | [1] J. Reynolds, S. Malcomber and A. White, “A Bayesian approach for inferring global points of departure from transcriptomics data,” Computational Toxicology, vol. 16, p. 100138, November 2020. [2] T. E. Moxon, H. Li, M.-Y. Lee, P. Piechota, B. Nicol, J. Pickles, R. Pendlington, I. Sorrell and M. T. Baltazar, “Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products,” Toxicology in Vitro, vol. 63, p. 104746, March 2020. [3] Kang, B., Fan, R., Yi, M., Cui, C. and Cui, Q., 2025. A large-scale foundation model for bulk transcriptomes. bioRxiv. [4] Donner, Y., Kazmierczak, S. and Fortney, K., 2018. Drug repurposing using deep embeddings of gene expression profiles. Molecular Pharmaceutics, 15(10), 4314-4325. |
| Work Environment | The student will be following on from the work of two supervisors who will be available to support throughout the project. The student(s) should gain experience in deep learning, applied scientific computing and bioinformatics, while also being able to meet and collaborate with experts from a variety of both mathematical and other backgrounds. As the project is hosted at a site near Bedford, we expect the student will mostly be working remotely. However, we encourage attendance in-person where travel permits. |
| Prerequisite Skills | Predictive Modelling, Data Visualisation, Deep learning |
| Other skills used in the Project | Statistics, Probability / Markov Chains, Mathematical Physics, Mathematical analysis, Algebra / Number theory |
| Acceptable Programming Languages | Python, R |
| Additional Requirements | We are seeking a proactive student who brings curiosity, clear communication, and a genuine drive to grow. Experience with training and evaluating models in PyTorch/TensorFlow is preferable, though not essential. |
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
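To give a flavour of what an embedding does, here is a deliberately simple sketch on synthetic data, with a linear SVD projection standing in for a learned deep model: chemicals with similar transcriptional responses end up close together in the latent space, as measured by cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 6 chemicals x 50 genes. Chemicals 0-2 share one transcriptional
# response programme, chemicals 3-5 another; add measurement noise.
prog_a = rng.standard_normal(50)
prog_b = rng.standard_normal(50)
X = np.vstack([prog_a] * 3 + [prog_b] * 3) + 0.1 * rng.standard_normal((6, 50))

# A linear "embedding": project onto the top-2 right singular vectors.
# (A trained autoencoder or foundation model would replace this in practice.)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
emb = Xc @ Vt[:2].T            # 6 chemicals x 2 latent dimensions

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Chemicals sharing a response programme embed close together.
sim_within = cosine(emb[0], emb[1])   # same programme
sim_across = cosine(emb[0], emb[3])   # different programmes
```

The project replaces this linear projection with learned, non-linear embeddings that can also fold in chemical structure information such as molecular fingerprints.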
Foundation models for cancer biology
| Project Title | Foundation models for cancer biology |
| Keywords | neural networks, foundation models, life sciences, single cell transcriptomics |
| Project Listed | 26 January 2026 |
| Project Status | Open |
| Application Deadline | 6 March 2026 |
| Project Supervisor | Nicola Richmond and Sebastian Burgstaller-Muehlbacher |
| Contact Name | Sebastian Burgstaller-Muehlbacher |
| Contact Email | sebastian.burgstaller-muehlbacher@boehringer-ingelheim.com |
| Company/Lab/Department | Boehringer Ingelheim Limited |
| Address | 1 Pancras Sq, London N1C 4AG |
| Project Duration | 8 weeks, full-time. |
| Project Open to | Masters students (Part III), Third year undergraduates (Part II) |
| Background Information | The Virtual Cell is an emerging concept which uses (mostly) foundation models (typically transformers) to model cell types and chemical or gene perturbations. This is enabled by the large-scale single cell transcriptomics datasets (recording which genes are expressed in a cell) now available to train such models. After training, a foundation model can be fine-tuned to execute specific tasks, e.g. cell type identification. However, cells do not live alone; they exist in a tissue and organ context. It is therefore of high scientific interest to understand what the neighboring cell types are (the cell niche) and how a given cell communicates with its neighbors (cell-cell communication). This is particularly important when trying to understand the tumor microenvironment in cancer. |
| Project Description |
Aims: Key Tasks:
What summer students will learn:
|
| References |
Single cell foundation model references: Niche detection: |
| Work Environment | The students will be supervised by technical and domain subject-matter experts who have PhD-level education and postdoctoral research experience. The roles will be full-time and office-based, 3-5 days a week, depending on student preferences. We are located in a vibrant part of London with easy access to public transportation to and from Cambridge. |
| Prerequisite Skills | Statistics, Image processing, Geometry / Topology, Data Visualisation, Database queries |
| Other skills used in the Project | Statistics, Image processing, Geometry / Topology, Data Visualisation, Database queries, App Building |
| Acceptable Programming Languages | Python, Pytorch, JAX |
| Additional Requirements |
Required/useful skills:
|
| Application Instructions | Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit. |
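As a flavour of the niche-detection idea described in the background, the toy sketch below computes each cell's niche profile as the cell-type composition of its k nearest spatial neighbours. The coordinates and cell types are synthetic; a real analysis would use spatial or single cell transcriptomics data and learned representations rather than raw labels.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy tissue: 200 cells in 2D, two well-separated regions with different
# cell-type mixtures (types 0/1 on the left, types 1/2 on the right).
left = rng.uniform([0, 0], [1, 1], size=(100, 2))
right = rng.uniform([3, 0], [4, 1], size=(100, 2))
coords = np.vstack([left, right])
types = np.concatenate([rng.choice([0, 1], 100), rng.choice([1, 2], 100)])

def niche_profile(coords, types, k=10, n_types=3):
    """For each cell, the cell-type composition of its k nearest neighbours."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude the cell itself
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    profiles = np.zeros((len(coords), n_types))
    for t in range(n_types):
        profiles[:, t] = (types[nn] == t).mean(axis=1)
    return profiles

profiles = niche_profile(coords, types)
# Left-region cells see no type-2 neighbours; right-region cells see no
# type-0 neighbours, so the two niches are cleanly distinguishable.
```

Clustering these profiles recovers the two niches; the project's foundation-model setting replaces hand-built profiles with learned cell representations.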