Summer Research Programmes

 

2026 Industrial CMP Projects

Below you will find the list of industrial CMP projects hosted by external companies. A separate list covers academic projects hosted by other departments and labs within the university.

New projects may be added so check back regularly!

 

How to Apply

Unless alternative instructions are given in the project listing, to apply for a project you should send your CV to the contact provided along with a covering email which explains why you are interested in the project and why you think you would be a good fit.  

Need help preparing a CV or advice on how to write a good covering email? 

The Careers Service are there to help!  Their CV and applications guides are packed full of top tips and example CVs.  

Looking for advice on applying for CMP projects specifically?  Check out this advice from CMP Co-Founder and Cambridge Maths Alumnus James Bridgwater.  

Remember: it’s better to put the work into making fewer but stronger applications tailored to a specific project than firing off a very generic application for all projects – you won’t stand out with the latter approach!  

Please note that to participate in the CMP programme you must be a student in Part IB, Part II, or Part III of the Mathematical Tripos at Cambridge.  

 

Want to know more about a project before you apply? 

Come along to the CMP Lunchtime Seminar Series in February 2026 to hear the hosts give a short presentation about their project.  There will be an opportunity afterwards for you to chat informally with hosts about their projects. 

Alternatively (or as well!), you can reach out to the contact given in the project listing to ask questions. 

 


Industrial CMP Project Proposals for Summer 2026

 

Exploring Gene Embeddings for Biological Analysis

Project Title Exploring Gene Embeddings for Biological Analysis
Keywords Gene embeddings, networks, large language models, perturbation assays, gene interactions
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Marie Lisandra Zepeda Mendoza
Contact Name Marie Lisandra Zepeda Mendoza
Contact Email vmnz@novonordisk.com
Company/Lab/Department Novo Nordisk Research Centre Oxford
Address Old Road Campus, Roosevelt Drive, Oxford, OX3 7FZ
Project Duration 8 weeks, full time
Project Open to Masters students (Part III)
Background Information

Genes are the basic units of heredity and encode the information for the synthesis of proteins and other molecules that perform various functions in living organisms. Understanding the relationships between genes and their functions is a fundamental challenge in biology and medicine. One way to approach this challenge is to represent genes as numerical vectors, also known as embeddings, that capture some aspects of their biological properties and interactions. Embeddings can be derived from various sources of data, such as gene sequences, gene expression, gene ontology, protein-protein interactions, and literature. Embeddings can then be used for various tasks, such as gene clustering, gene function prediction, gene-disease association, and gene pathway analysis.
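
As a minimal illustration of the idea of genes as numerical vectors, the sketch below ranks genes by cosine similarity. The gene names and 4-dimensional vectors are hypothetical placeholders, not real embeddings; in practice the vectors would be learned from one of the data sources listed above.

```python
import math

# Hypothetical toy embeddings (assumption: real ones would be learned from
# sequence, expression, ontology, interaction, or literature data).
embeddings = {
    "GENE_A": [0.9, 0.1, 0.0, 0.2],
    "GENE_B": [0.8, 0.2, 0.1, 0.3],
    "GENE_C": [0.0, 0.9, 0.8, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_genes(query, table, k=2):
    """Rank all other genes by cosine similarity to the query gene."""
    scores = {
        name: cosine_similarity(table[query], vec)
        for name, vec in table.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


ranking = nearest_genes("GENE_A", embeddings)
print(ranking)
```

Cosine similarity is only one possible choice; it ignores vector length, which may or may not be appropriate for a given embedding method.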

The project is part of a broader and strategically critical goal for AI applied to computational biology in R&D at Novo Nordisk, which relates to the use, validation and control of gene embeddings for novel target and biomarker discovery and functional contextualization.

Project Description

Aim
The aim of this project is to explore how to define embeddings for genes and how they relate to each other, and to evaluate which embeddings are most useful and which data sources should be included. The specific objectives are:

  • To review the existing methods and tools for generating gene embeddings from different data sources
  • To compare and contrast different types of gene embeddings, such as sequence-based, expression-based, ontology-based, interaction-based, and literature-based. Of particular interest will be recent work on augmenting large language models with domain-specific tools such as database utilities for more precise access to specialized knowledge (e.g. GeneGPT).
  • To apply and test different gene embeddings on various biological analysis tasks, such as gene function prediction, gene-disease association, gene pathway analysis and drug target prediction.

Methodology
The methodology of this project consists of the following steps:

  • To query existing public and internal databases for the general characterization of a gene, including:
    • Which tissue/cell type is the gene expressed in?
    • Which pathways is it part of, and which other genes are in those pathways?
    • Which diseases is it known to be associated with?
    • What is the interaction network of this gene in a particular tissue?
    • Has the gene been previously explored?
    • Is there patent data and/or human clinical data for the gene?
    • What assay, cell-type and conditions should be used for validation? 
  • To select and implement appropriate methods and tools for generating gene embeddings from the different data sources (e.g. word2vec, doc2vec, autoencoders, graph neural networks), with particular emphasis on transformers / large language models to represent literature information.
  • To evaluate and compare the quality and performance of different gene embeddings on various biological analysis tasks, such as gene function prediction, gene-disease association, gene pathway analysis and drug target predictions, using appropriate metrics and benchmarks.
  • Of particular interest is the downstream comparison of the embeddings based on a priori information with the gene embeddings derived from the cellular genetic perturbation in vitro imaging screening assays that Novo Nordisk has in-house.
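
One simple way to compare two embedding spaces of different dimension, such as a prior-knowledge space and an assay-derived space, is to correlate the pairwise gene similarities each space induces. The sketch below does this with toy data; the function names, gene labels and vectors are all hypothetical, and this is only one of several possible comparison methods, not the approach prescribed by the project.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def space_agreement(space_a, space_b):
    """Correlate pairwise gene similarities computed in two embedding spaces."""
    genes = sorted(set(space_a) & set(space_b))  # genes present in both spaces
    pairs = list(combinations(genes, 2))
    sims_a = [cosine(space_a[g], space_a[h]) for g, h in pairs]
    sims_b = [cosine(space_b[g], space_b[h]) for g, h in pairs]
    return pearson(sims_a, sims_b)


# Hypothetical toy embeddings: the two spaces need not share a dimension.
prior_space = {"G1": [1.0, 0.0], "G2": [0.9, 0.1], "G3": [0.0, 1.0], "G4": [0.1, 0.9]}
assay_space = {
    "G1": [0.8, 0.2, 0.0],
    "G2": [0.7, 0.3, 0.1],
    "G3": [0.1, 0.2, 0.9],
    "G4": [0.0, 0.3, 0.8],
}

agreement = space_agreement(prior_space, assay_space)
print(round(agreement, 3))
```

A value near 1 suggests the two spaces group genes similarly; a rank-based correlation could be substituted if the similarity scales differ strongly.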

Expected Outcomes 

  • A comprehensive review of the existing methods and tools for generating gene embeddings from different data sources.
  • A comparative analysis of the different types of gene embeddings and their applications on various biological analysis tasks.
  • A critical evaluation of the strengths and limitations of different gene embeddings and data sources, and suggestions for possible improvements and extensions.
  • A comparison of the a priori embeddings with the embeddings Novo Nordisk has in-house from its perturbation assays, which would be extremely valuable.

The implications of this project are:

  • To provide a better understanding of the relationships between genes and their functions, and to facilitate the discovery of new biological insights and hypotheses.
  • To contribute to the advancement of the field of gene embeddings and their applications in biology and medicine.
  • To demonstrate the potential and challenges of applying natural language processing and machine learning techniques to biological data.
References Soman, Karthik, et al. "Biomedical knowledge graph-enhanced prompt generation for large language models." arXiv preprint arXiv:2311.17330 (2023).
Chen YT, Zou J. GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv [Preprint]. 2023, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614824/
Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. https://www.nature.com/articles/s41467-022-33026-0
In silico biological discovery with large perturbation models https://www.nature.com/articles/s43588-025-00870-1
Work Environment The student will work closely with the supervisor and will be able to interact with other colleagues in the bioAI Department at both the Oxford and London sites. We are a fully computational team, but the Oxford site also hosts various in vitro expert teams, to which the student can be exposed. The supervisor is happy for the student to work in a hybrid mode if they wish.
Prerequisite Skills Statistics, Image processing, Geometry / Topology, Mathematical analysis, Simulation, Predictive Modelling, Database queries, Data Visualisation, Probability / Markov Chains
Other skills used in the Project Statistics, Probability / Markov Chains, Image processing, Mathematical analysis, Geometry / Topology, Simulation, Predictive Modelling, Database queries, Data Visualisation
Acceptable Programming Languages Python, R
Additional Requirements Enthusiasm for biological applications of maths and a strong willingness to learn. Good communication and presentation skills are also desirable.
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Virtual Cells with Large Language Models

Project Title Virtual Cells with Large Language Models
Keywords Virtual Cell, LLMs, Causality, In-context learning
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Marc Boubnovski Martell and Josefa Stoisser
Contact Name Josefa Stoisser
Contact Email ofsr@novonordisk.com
Company/Lab/Department Novo Nordisk, BioAI team
Address Novo Nordisk R&D Digital Hub, Pancras Rd, London, N1C 4AG, UK
Project Duration 8-10 weeks, full-time
Project Open to Masters students (Part III)
Background Information

A virtual cell is an in-silico model (a kind of “digital twin”) that lets us predict how a living cell will respond to interventions (e.g. adding a drug). This sits in a fast-moving area of BioAI, with community benchmarks such as the Virtual Cell Challenge [1] pushing models toward realistic generalisation settings.

At the core of the virtual cell is biological perturbation prediction: given a baseline cell state, predict how the cell changes after an intervention (e.g., knocking out a gene). Conceptually, this is a causal effect problem, made hard by biological confounding, incomplete measurements, and distribution shift (new cell types, new perturbations, new experimental settings).

Our recent work (“LangPert”, ICLR 2025 workshop spotlight [2]) suggests a practical path forward: use LLMs to retrieve and synthesise mechanistic biological context (gene function, pathways, interactions, etc.) and condition predictive models on that context. The key benefit is zero-shot or low-data generalisation to perturbations the model has not seen during training, while also producing explanations that are at least partially aligned with known biology. In parallel, LLM-powered causal analysis motivates a causality-first approach to virtual cells [3, 4].

Project Description

The project will explore LLM-informed causal modelling for perturbation prediction. The high-level aim is to use LLM knowledge as contextual guidance (not as a replacement for data), so that models can better infer gene–gene relationships and predict outcomes of interventions, especially when faced with novel perturbations or shifted experimental conditions.

  1. Context building with LLMs: Retrieve and summarise biologically relevant information for a given perturbation. Explore different ways of representing the context.
  2. Causal / intervention-aware prediction: Combine LLM-derived context with state-of-the-art models for intervention prediction and causal effect estimation (e.g., Do-PFN [5]). Investigate how LLM context should enter the model (e.g., as an uncertainty-aware DAG prior), and whether identification-consistent in-context exemplars (baseline and singles) are sufficient for generalising to unseen combinations.
  3. Generalisation and robustness: Evaluate performance on held-out perturbations and under distribution shift (e.g., new cell types). Run ablations and compare outcomes to causal identification expectations to measure when context helps vs. hurts.
  4. Interpretability (if time): Assess whether model rationales and retrieved context align with known biology and identify failure modes (e.g. hallucinated context).
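
Evaluating on held-out perturbations means that no perturbation in the test set may appear in training. The sketch below shows one way such a split could be built; the record layout, field names and perturbation labels are hypothetical and do not describe the team's actual pipeline.

```python
import random


def split_by_perturbation(records, held_out_fraction=0.25, seed=0):
    """Hold out whole perturbations so test perturbations are unseen in training."""
    perturbations = sorted({r["perturbation"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(perturbations)
    n_test = max(1, int(len(perturbations) * held_out_fraction))
    held_out = set(perturbations[:n_test])
    train = [r for r in records if r["perturbation"] not in held_out]
    test = [r for r in records if r["perturbation"] in held_out]
    return train, test


# Hypothetical records: one row per (perturbation, cell type) measurement.
records = [
    {"perturbation": p, "cell_type": c}
    for p in ["KO_G1", "KO_G2", "KO_G3", "KO_G4"]
    for c in ["type_x", "type_y"]
]

train, test = split_by_perturbation(records)
train_perts = {r["perturbation"] for r in train}
test_perts = {r["perturbation"] for r in test}
print(sorted(train_perts), sorted(test_perts))
```

Splitting on the `cell_type` field instead would give the analogous check for distribution shift to new cell types.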

Because the AI landscape changes quickly, the specific LLM, retrieval approach, and causal estimator will be updated at the project start to reflect the best available options.

Successful outcome:

  • A reproducible experimental pipeline (datasets, baselines, evaluation splits, and ablations).
  • A short technical report summarising findings and recommendations.
  • If results are strong and pass internal review, the work may be included in an AI workshop paper submission.
References [1] Virtual Cell Challenge. https://virtualcellchallenge.org/.
[2] Märtens, K., Boubnovski Martell, M., Prada-Medina, C. A., & Donovan-Maiye, R. (2025). LangPert: LLM-Driven Contextual Synthesis for Unseen Perturbation Prediction. MLGenX Workshop at ICLR 2025 (Oral). https://openreview.net/forum?id=Tmx4o3Jg55.
[3] Wang, X., Zhou, K., Wu, W., Singh, H. S., Nan, F., Jin, S., Philip, A., Patnaik, S., Zhu, H., Singh, S., Prashant, P., Shen, Q., & Huang, B. (2025). Causal-Copilot: An Autonomous Causal Analysis Agent. arXiv:2504.13263 [cs.AI]. https://doi.org/10.48550/arXiv.2504.13263.
[4] Kıcıman, E., Ness, R. O., Sharma, A., & Tan, C. (2024). Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. Transactions on Machine Learning Research (TMLR). https://doi.org/10.48550/arXiv.2305.00050
[5] Robertson, J., Reuter, A., Guo, S., Hollmann, N., Hutter, F., & Schölkopf, B. (2025). Do-PFN: In-Context Learning for Causal Effect Estimation. NeurIPS 2025. https://doi.org/10.48550/arXiv.2506.06039.
Work Environment The student will join the BioAI team at Novo Nordisk, supervised by Josefa Stoisser and Marc Boubnovski Martell, with co-supervision from Jialin Yu (University of Oxford). The BioAI team develops AI/LLM/agentic systems for drug discovery and has a publication track record in top-tier AI venues (NeurIPS, ACL, ICML, ICLR). The office is in King’s Cross, London. Remote work is possible, but 2–3 days per week on-site is preferred.
Prerequisite Skills Statistics, Mathematical analysis
Other skills used in the Project LLMs, Causal Inference
Acceptable Programming Languages Python
Additional Requirements -
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Fragmented Order-book Content Assessment and Liquidity-weighting (FOCAL)

Project Title Fragmented Order-book Content Assessment and Liquidity-weighting (FOCAL)
Keywords FX Markets, limit order book, microprice
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Jan Novotny
Contact Name Jan Novotny
Contact Email jan.novotny@nomura.com
Company/Lab/Department Nomura International Plc
Address 1 Angel Ln, London EC4R 3AB
Project Duration 8-10 weeks full time
Project Open to Masters students (Part III)
Background Information Foreign exchange markets present a particularly compelling use case for this research, as they represent the world's largest and most liquid financial market with daily trading volumes exceeding $7 trillion. Unlike centralized exchange-traded assets, FX markets operate as a decentralized, over-the-counter network where liquidity is highly fragmented across numerous market makers, electronic communication networks (ECNs), and trading platforms. This fragmentation creates significant challenges for price discovery, as there is no single consolidated order book or official exchange rate at any given moment. The decentralized nature of FX trading means that different liquidity providers may quote varying prices simultaneously, making the aggregation and analysis of order book information both more complex and more valuable for understanding true market conditions. Given the enormous scale and fragmented structure of FX markets, developing robust methods to classify information content across multiple liquidity pools could yield substantial improvements in price discovery, execution quality, and market efficiency.
Project Description

Primary Objectives:

  • Develop a robust methodology for aggregating order books across multiple liquidity pools
  • Create an information content classification system that quantifies the predictive value of different order book levels
  • Design metrics to identify which price levels contain the most relevant information for mid-price determination
  • Build a real-time nowcasting model for current market prices based on aggregated order book data
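
To make the aggregation objective concrete, the sketch below merges two toy venue books by summing size at each price and then computes a size-weighted mid from the top of the merged book, a common simple microprice-style estimate. The venue data, function names and the choice of estimator are assumptions for illustration, not the methodology the project will develop.

```python
from collections import defaultdict


def aggregate_books(books):
    """Merge per-venue books into one book by summing size at each price."""
    bids, asks = defaultdict(float), defaultdict(float)
    for book in books:
        for price, size in book["bids"]:
            bids[price] += size
        for price, size in book["asks"]:
            asks[price] += size
    return {
        "bids": sorted(bids.items(), reverse=True),  # best (highest) bid first
        "asks": sorted(asks.items()),                # best (lowest) ask first
    }


def weighted_mid(book):
    """Top-of-book mid weighted by the size on the opposite side."""
    (bid, bid_size), (ask, ask_size) = book["bids"][0], book["asks"][0]
    return (bid * ask_size + ask * bid_size) / (bid_size + ask_size)


# Hypothetical quotes as (price, size). Real code would likely key on integer
# ticks rather than floats, and handle merged books that come out crossed.
venue_a = {"bids": [(1.1000, 2.0), (1.0999, 5.0)], "asks": [(1.1002, 1.0), (1.1003, 4.0)]}
venue_b = {"bids": [(1.1000, 1.0), (1.0998, 3.0)], "asks": [(1.1002, 1.0), (1.1004, 2.0)]}

merged = aggregate_books([venue_a, venue_b])
estimate = weighted_mid(merged)
print(merged["bids"][0], merged["asks"][0], estimate)
```

With more size resting on the bid than on the ask, the estimate sits above the plain mid, reflecting buying pressure at the top of the book.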

Secondary Objectives:

  • Analyze the relative importance of different liquidity pools in price discovery 
  • Investigate how information content varies across different market conditions (high/low volatility, different trading sessions)
  • Assess the temporal stability of information content metrics

Information Content Metrics:

  • Apply information theory measures (entropy, mutual information) to quantify price level importance
  • Implement machine learning techniques to identify patterns in order book informativeness
  • Develop weighted scoring systems based on historical price impact and predictive accuracy
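
As a small example of the information-theoretic measures mentioned above, the sketch below estimates entropy and mutual information from empirical frequencies of discretised variables. The labels (an order book imbalance state and the direction of the next mid-price move) and the toy samples are hypothetical; plug-in estimates like these are known to be biased on small samples.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy in bits of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from empirical frequencies."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical discretised observations.
imbalance = ["bid_heavy", "bid_heavy", "ask_heavy", "ask_heavy"]
informative_move = ["up", "up", "down", "down"]    # fully determined by imbalance
uninformative_move = ["up", "down", "up", "down"]  # independent of imbalance

mi_informative = mutual_information(imbalance, informative_move)
mi_uninformative = mutual_information(imbalance, uninformative_move)
print(mi_informative, mi_uninformative)
```

Computing the same quantity per order book level would give one possible ranking of which levels carry the most information about subsequent price moves.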

Validation Framework:

  • Backtest the classification system against historical price movements
  • Compare nowcasting accuracy against benchmark models
  • Conduct out-of-sample testing to ensure robustness

Applications

  • Algorithmic trading strategy optimization
  • Risk management and position sizing
  • Market making and liquidity provision strategies
References n/a
Work Environment The internship will be in person in the office (a hybrid model is possible), and the candidate will work closely with the team.
Prerequisite Skills Statistics
Other skills used in the Project Predictive Modelling, Simulation
Acceptable Programming Languages Python, kdb+/q
Additional Requirements Enthusiasm to learn from a real case study
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Option pricing with quantum information

Project Title Option pricing with quantum information
Keywords Option pricing, quantum information
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Paul McCloud
Contact Name Paul McCloud
Contact Email paul.mccloud@nomura.com
Company/Lab/Department Nomura
Address 1 Angel Lane, London EC4R 3AB
Project Duration 8 weeks
Project Open to Masters students (Part III), Third year undergraduates (Part II)
Background Information Nomura is a global financial services group with an integrated network spanning over 30 countries. The Quantitative Research team supports Global Markets businesses by developing mathematical models for the pricing and risk management of derivative trades, in close partnership with the trading desks. The role requires an exceptional level of technical quantitative skills, ideally backed up by mathematical research experience (not necessarily related to finance).
Project Description Option pricing is the most elementary challenge of derivative modelling and is the foundation for many of the solutions needed by a Global Markets structured products business. Traditional methods employ classical stochastic calculus, but this approach can struggle when applied with complex boundary conditions, which potentially limits the product offering of the business. This project explores numerical methods for option pricing established on noncommutative information, to see if the novel degrees of freedom this introduces can facilitate more efficient schemes or generate better convergence and fitting to options markets. Abstracted as a pure mathematical challenge, the project considers the application of results from noncommutative algebra to well-posed problems whose solutions can be mapped onto option pricing.
References [1] McCloud, P. “Quantum bounds for option pricing” (2018) arxiv.org/abs/1712.01385
[2] McCloud, P. “Information and arbitrage: applications of quantum groups in mathematical finance” (2024) arxiv.org/abs/1711.07279
[3] McCloud, P. “The relative entropy of expectation and price” (2025) arxiv.org/abs/2502.08613
Work Environment You will research the project remotely, supported by a supervisor at Nomura and with occasional visits to the Nomura London office for progress updates.
Prerequisite Skills Mathematical Physics, Algebra / Number theory, Mathematical analysis
Other skills used in the Project Numerical Analysis, Partial Differential Equations, Probability / Markov Chains
Acceptable Programming Languages Python, MATLAB
Additional Requirements Curiosity and a willingness to apply ideas in novel contexts
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Optimising the testing and selection process of cut flowers using historic performance and quality data

Project Title Optimising the testing and selection process of cut flowers using historic performance and quality data
Keywords Horticulture, varietal development, supply chain
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisors Lauren Hibbert and Richard Boyle
Contact Name Lauren Hibbert
Contact Email lauren.hibbert@apexhorticulture.com
Company/Lab/Department APEX Horticulture
Address Pierson Road, The Enterprise Campus, Alconbury Weald, PE28 4YA
Project Duration 8 weeks
Project Open to Masters students (Part III), Third year undergraduates (Part II)
Background Information

APEX Horticulture Ltd. is a professional research and development business, offering bespoke testing services for cut flowers and plants. APEX has three purpose-built testing centres in the UK and US. APEX is a division of the wider MM group, where the primary business, MM Flowers, is one of the UK’s leading cut flower importing and processing companies, with a vertically integrated ownership model and innovative practices. More recently, the MM group has diversified its activities, including supplying plants, bulbs and other gifting products to retailers in the UK and Europe. MM is owned by the AM Fresh Group, a leading breeder, grower and distributor of citrus and grapes; Vegpro, East Africa’s largest flower and vegetable producer; and Elite, based in South America, the leading flower grower globally.

APEX sits at an optimal position in the supply chain: it can deliver high-quality, independent research close to market, combined with insight into the true performance of flowers and plants subjected to actual supply chain conditions. The infrastructure and specialised personnel of APEX aim to deliver robust, standardised and consistent research every week of the year, together with the ability to undertake large-scale projects to match all client requirements, influencing all elements of the cut flower supply chain.

APEX undertakes many different research projects covering the entire supply chain, from development of new flower types through to the manufacturing requirements for the final bouquets. Each of these projects generates a significant amount of data and insight, which is used to provide recommendations to the various stakeholders of each project.

Project Description APEX tests over 50k cut flower samples annually, with around 30-60 data points generated per sample. Whilst this is often focussed on certain crop types, such as roses and lilies, many more types of flowers are tested across many different projects. The data generated ranges from agronomic and freight data through to performance data associated with sample longevity (‘vase’/‘shelf’ life) and aesthetic appeal. Several of the projects undertaken by APEX are long term with key strategic stakeholders, which allows for an assessment of flower performance and quality over many months and years. Each sample often has significant background information, including the type of flower, the growing location and agronomic practices, and the freight mode. Many factors can influence the results, such as weather conditions, freight delays and handling through the supply chain, which can often result in variability across a testing programme. Whilst APEX designs projects to try to account for this potential variation, there is a desire to use existing data to improve the efficiency and accuracy of the testing process. Selecting flower types and cultivars that do not meet the required standards can result in significant waste, consumer dissatisfaction and potentially brand damage, so having the best possible insight reduces this risk. This clearly has implications across the supply chain, from the breeder/grower through to the suppliers and retailers. The question is therefore: can existing datasets be used to determine an appropriate model for assessing the viability of cut flowers (such as a new flower type, cultivar or treatment) more effectively and efficiently than the current process?
References
Work Environment The student will be part of a wider team, but will lead the project. The working pattern can be hybrid (and largely remote).
Prerequisite Skills Statistics, Mathematical analysis
Other skills used in the Project Statistics
Acceptable Programming Languages No preference
Additional Information Desire to operate in a commercial business, and provide insights that can inform real world decisions.
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Agentic AI for Formalized Math

Project Title Agentic AI for Formalized Math
Keywords AI, Lean 4, Agents, LLM
Project Listed 9 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Nehal Patel and Charles Martinez
Contact Name Nehal Patel
Contact Email nehal.patel@gresearch.co.uk
Company/Lab/Department G-Research
Address 1 Soho Pl, London W1D 3BG
Project Duration Flexible, 8-12 weeks, summer of 2026
Project Open to Masters students (Part III), Third year undergraduates (Part II), Second year undergraduates (Part IB)
Background Information AI agents and interactive theorem provers have the potential to forever change the way mathematics is done. This project provides students with a hands-on opportunity to learn and apply these tools in their area of research.
Project Description Students will formalize, in Lean 4, a topic of their choosing using agentic AI techniques. The initial toolset for AI theorem proving will be provided and students will have the opportunity to help shape the improvement of these tools. Depending on the student's interest, work may either focus primarily on formalization or may include working on the agentic theorem proving framework. Caveats: Not all branches of math are easy to model in Lean 4. Prior experience with Lean 4 is advisable. Prior experience with AI and LLMs is not required, but helpful. Prior experience with programming and a hacker ethos are also highly desirable.
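
For readers who have not seen Lean 4, the fragment below gives a flavour of what a formalized statement and its proof look like. It is a deliberately tiny example chosen for illustration (the theorem name `add_comm_example` is made up here) and is unrelated to the project's toolset; it relies only on the core library lemma `Nat.add_comm`.

```lean
-- Commutativity of addition on the natural numbers, proved by citing
-- the core library lemma `Nat.add_comm` directly as a term.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The same statement proved in tactic mode: rewriting with the lemma
-- turns the goal into `b + a = b + a`, which closes by reflexivity.
example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```

Formalizing research-level mathematics involves the same two ingredients at scale: precise statements and proofs that the Lean kernel can check.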
References

Introductions to Lean:
https://alexkontorovich.github.io/2025F311H/
https://adam.math.hhu.de/

Agentic Theorem Prover (One of Many):
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
https://arxiv.org/pdf/2509.22819

Work Environment Work will be directed primarily by G-Research staff based in Boston. The student will work mostly independently and remotely, coordinating work via GitHub and video meetings (with a meeting cadence that will adapt as the project progresses). Twice during the summer, Boston staff will be present in England and will arrange 1-3 day intensive sessions with the student for joint collaboration.
Prerequisite Skills Formal Math in Lean 4
Other skills used in the Project App Building
Acceptable Programming Languages Python, Lean 4
Additional Requirements Candidates should be prepared to propose some mathematics that they would like to formalize using AI tools in Lean. This could draw from their current research focus or their general interests. Topics from recreational math or applied topics are acceptable. Students are encouraged to investigate to what extent necessary background theories have already been formalized.
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.

 

Correlation Between Forecast Accuracy, Stock Dwell Time and Retailer Waste on Customer Complaints: A Study of Yellow and White 40cm & 50cm Roses at MM Flowers

Project Title Correlation Between Forecast Accuracy, Stock Dwell Time and Retailer Waste on Customer Complaints: A Study of Yellow and White 40cm & 50cm Roses at MM Flowers
Keywords Forecast Accuracy; Stock Dwell Time; Retailer Waste; Customer Complaints; Statistical Analysis
Project Listed 16 January 2026
Project Status Open
Application Deadline 27 February 2026
Project Supervisor Ellanette van Zyl
Contact Name Ellanette van Zyl
Contact Email Ellanette.vanzyl@mm-flowers.com
Company/Lab/Department MM Flowers
Address Pierson Road, The Enterprise Campus, Alconbury Weald, Huntingdon, PE28 4YA
Project Duration 8 weeks, full-time (40 hours/week)
Project Open to Masters students (Part III), Third year undergraduates (Part II)
Background Information

MM Flowers operates in a highly time-sensitive fresh flower supply chain, where product quality, availability and freshness are critical drivers of customer satisfaction. Even small inaccuracies in forecasting or delays in product movement can result in extended stock dwell time, increased waste at retailer level and ultimately higher customer complaint volumes. Roses, in particular, represent a high-volume and high-visibility product category where performance variability can have a significant commercial and reputational impact.

Forecast accuracy directly influences ordering decisions, inbound volumes and stock allocation. When actual arrivals deviate from forecasted volumes, this can lead to either stock shortages, impacting service levels, or excess stock levels, increasing dwell time and the risk of quality deterioration. Longer dwell times at MM Flowers or retail stores can accelerate deterioration, contribute to retailer waste and negatively affect the end consumer experience.

This project focuses specifically on yellow and white 40cm and 50cm roses, which are core SKUs within the MM Flowers portfolio and are particularly sensitive to demand variability and shelf-life constraints. By analysing the relationships between forecast accuracy, stock dwell time, retailer waste and customer complaints for these products, the project aims to identify whether measurable correlations exist across the supply chain.

The project is interesting and valuable because it connects operational planning decisions with downstream quality outcomes and customer feedback. Understanding these relationships will support more data-driven forecasting, stock management and waste-reduction strategies, while also providing insight into how supply chain performance ultimately affects customer satisfaction. The findings have the potential to inform targeted improvements for key rose lines and contribute to broader continuous improvement initiatives within MM Flowers.

Project Description

This project will investigate whether measurable correlations exist between forecast accuracy, stock dwell time, retailer waste and customer complaints for yellow and white 40cm and 50cm roses supplied by MM Flowers. The project is primarily data-driven and quantitative, making it well suited to a student with strengths in mathematics, statistics or data analysis.

The project will begin with a data familiarisation and definition phase, during which the student will work with historical supply chain data provided by MM Flowers. This will include forecast volumes, actual arrival quantities, stock dwell time metrics, retailer waste data and customer complaint records. The student will be responsible for defining and calculating appropriate performance measures, such as forecast accuracy metrics (e.g. absolute error or percentage error), dwell time distributions and waste rates.
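
The forecast accuracy measures mentioned above could be computed along the following lines. This is a minimal sketch with made-up weekly volumes; the function names are hypothetical, and measuring percentage error relative to the actual volume is an assumption, since conventions differ on the denominator.

```python
def absolute_error(forecast, actual):
    """Absolute difference between forecast and actual volume."""
    return abs(actual - forecast)


def percentage_error(forecast, actual):
    """Signed error as a percentage of the actual volume (assumed convention)."""
    if actual == 0:
        raise ValueError("percentage error is undefined when actual volume is zero")
    return (actual - forecast) / actual * 100.0


def mean_absolute_error(forecasts, actuals):
    """Average absolute error across periods."""
    errors = [absolute_error(f, a) for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


def mean_absolute_percentage_error(forecasts, actuals):
    """Average of the absolute percentage errors across periods."""
    errors = [abs(percentage_error(f, a)) for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


# Made-up weekly stem volumes, for illustration only.
forecasts = [100, 200, 150]
actuals = [110, 180, 150]

mae = mean_absolute_error(forecasts, actuals)
mape = mean_absolute_percentage_error(forecasts, actuals)
print(mae, round(mape, 2))
```

Keeping the signed percentage error alongside the absolute version helps distinguish systematic over-forecasting (excess stock) from under-forecasting (shortages).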

In the next phase, the student will apply statistical and mathematical techniques to explore relationships between variables. This may include:

  • Descriptive statistics to summarise trends and variability;
  • Correlation analysis to quantify the strength and direction of relationships between forecast accuracy, dwell time, waste and complaints;
  • Regression analysis to assess the relative impact of each variable on customer complaints;
  • Time-series or lag analysis to investigate delayed effects, such as whether longer dwell times or excess stock in one period lead to increased complaints in subsequent periods.
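
The correlation and lag analysis listed above could be prototyped along the following lines. This is a minimal sketch using only the Python standard library; the helper names and the two series are hypothetical, and the complaint series is synthetic, constructed so that it follows dwell time with a one-period delay.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(driver, response)
    # Drop the last `lag` driver values and the first `lag` responses
    # so that each driver value is paired with the response `lag` later.
    return pearson(driver[:-lag], response[lag:])


# Synthetic example: average dwell time (days) per week, and complaints
# built as 2 * (previous week's dwell) + 1, with an arbitrary first value.
dwell_days = [1, 3, 2, 5, 4, 6, 3, 2]
complaints = [4, 3, 7, 5, 11, 9, 13, 7]

for lag in range(3):
    r = lagged_correlation(dwell_days, complaints, lag)
    print(f"lag {lag}: r = {r:.3f}")
```

With real data the lagged correlations would be far noisier, and a strong coefficient at some lag would indicate association rather than causation, so it would be a prompt for regression or further investigation rather than a conclusion.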

The project is open-ended in nature, allowing findings from the initial analysis to guide deeper investigation. For example, if strong correlations are identified for certain rose lengths or colours, the student may focus further analysis on those segments or explore threshold effects where performance begins to deteriorate significantly.
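
One simple way to look for the threshold effects mentioned above is to group observations into dwell-time bands and compare complaint rates between bands. The sketch below assumes a hypothetical record layout of (dwell days, units sold, complaints) and arbitrary band edges; both would need to be replaced with definitions agreed with MM Flowers.

```python
from bisect import bisect_right


def complaint_rate_by_dwell_band(records, band_edges):
    """Aggregate complaints per unit sold within dwell-time bands.

    records: iterable of (dwell_days, units_sold, complaints) tuples.
    band_edges: ascending list of edges. Band 0 is below the first edge,
    band i covers [band_edges[i - 1], band_edges[i]), and the last band
    is at or above the final edge.
    Returns one rate per band (None for bands with no units sold).
    """
    n_bands = len(band_edges) + 1
    units = [0] * n_bands
    complaints = [0] * n_bands
    for dwell_days, units_sold, n_complaints in records:
        band = bisect_right(band_edges, dwell_days)
        units[band] += units_sold
        complaints[band] += n_complaints
    return [c / u if u else None for c, u in zip(complaints, units)]


# Hypothetical records (illustrative only, not MM Flowers data).
records = [
    (1.0, 1000, 2),
    (1.5, 800, 1),
    (3.0, 1200, 4),
    (5.0, 900, 9),
    (7.0, 500, 12),
]
band_edges = [2, 4, 6]  # bands: under 2, 2 to 4, 4 to 6, 6 or more days

rates = complaint_rate_by_dwell_band(records, band_edges)
for band, rate in enumerate(rates):
    label = "no data" if rate is None else f"{rate:.4f} complaints per unit"
    print(f"band {band}: {label}")
```

A sharp jump in the rate between adjacent bands would suggest where a threshold might lie, which could then be tested more formally, for instance with a piecewise regression.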

A successful outcome would be:

  • A clear, evidence-based assessment of whether and how forecast accuracy and dwell time influence retailer waste and customer complaints;
  • Identification of key drivers or risk indicators that are most strongly associated with complaints;
  • Practical, data-backed insights that MM Flowers could use to improve forecasting, reduce waste and enhance customer satisfaction for core rose lines.

The project is interesting and useful because it links mathematical analysis directly to real-world operational and commercial outcomes. Students will gain experience applying statistical methods to complex, imperfect industry data, while MM Flowers will benefit from improved understanding of how planning and stock decisions affect product quality and customer experience across the supply chain.

References https://mm-flowers.com
Work Environment

The student will work independently on the core analytical aspects of the project, with regular guidance and supervision from the project lead at MM Flowers. The project will be based within a business and operational environment rather than a laboratory, giving the student exposure to real-world supply chain data and decision-making contexts.

In addition to the primary supervisor, the student will have opportunities to engage with forecasting, supply chain planning and quality teams at MM Flowers, allowing them to discuss data definitions, operational processes and practical implications of their findings. While there is no formal academic research group on site, the student will be supported through regular check-ins and access to subject matter experts across the business.

Working hours will be flexible within standard office hours and can be adjusted to accommodate academic commitments. The project can be conducted in a hybrid format, combining remote analytical work with occasional on-site days at MM Flowers when beneficial for data access, collaboration and project reviews.

Day-to-day work will primarily involve data analysis, modelling and interpretation, with time allocated for meetings, progress reviews and refinement of the analytical approach. The student will be encouraged to manage their own time, structure their analysis and propose next steps, mirroring the autonomy expected in both industry and academic research roles.

This working environment offers a balance of independent mathematical problem-solving and practical business engagement, providing a supportive setting for a mathematics student to apply theoretical skills to a real operational challenge.

Prerequisite Skills Statistics, Predictive Modelling, Data Visualisation, Database queries, regression analysis, exploratory data analysis, and translating real-world problems into quantitative models.
Other skills used in the Project Predictive Modelling, Statistics, Data Visualisation, Database queries, Simulation, Applied statistical analysis, handling real-world operational data, basic programming skills (e.g. Python or R), critical interpretation of quantitative results, and ability to communicate findings clearly to non-technical stakeholders.
Acceptable Programming Languages Python, R, SQL (no preference)
Additional Requirements We are looking for a student who is curious, analytical and motivated to apply mathematical skills to real-world problems. The ideal candidate will demonstrate enthusiasm for data-driven analysis and a willingness to engage with complex, imperfect datasets typical of an operational business environment. Strong problem-solving ability, attention to detail and critical thinking are important, along with a willingness to question assumptions and explore findings independently. The student should be comfortable working autonomously while also being open to feedback and discussion. Good communication skills are essential, as the project will require explaining quantitative findings clearly to non-technical stakeholders and translating mathematical results into practical business insights. An interest in supply chain, forecasting or data analytics in an applied setting would be advantageous, but not essential. Above all, we value a positive attitude, intellectual curiosity and a willingness to learn, as the project offers scope for the student to shape the direction of the analysis based on their findings.
Application Instructions Send your CV to the contact provided above along with a covering email which explains why you are interested in the project and why you think you would be a good fit.