
2025 Industrial CMP projects
Below you will find the list of industrial CMP projects hosted by external companies. A separate list covers the academic projects hosted by other departments and labs within the university.
New projects will be added throughout Lent Term so check back regularly!
How to Apply
Unless alternative instructions are given in the project listing, to apply for a project you should send your CV to the contact provided along with a covering email which explains why you are interested in the project and why you think you would be a good fit.
Need help preparing a CV or advice on how to write a good covering email?
The Careers Service are there to help! Their CV and applications guides are packed full of top tips and example CVs.
Looking for advice on applying for CMP projects specifically? Check out this advice from CMP Co-Founder and Cambridge Maths Alumnus James Bridgwater.
Remember: it’s better to put the work into making fewer but stronger applications tailored to a specific project than firing off a very generic application for all projects – you won’t stand out with the latter approach!
Please note that to participate in the CMP programme you must be a student in Part IB, Part II, or Part III of the Mathematical Tripos at Cambridge.
Want to know more about a project before you apply?
Come along to the CMP Lunchtime Seminar Series in February 2025 to hear the hosts give a short presentation about their project. There will be an opportunity afterwards for you to chat informally with hosts about their projects.
Alternatively (or as well!), you can reach out to the contact given in the project listing to ask questions.
Industrial CMP Project Proposals for Summer 2025
- Amazon Lab126 - Enabling Large Models for Edge AI with Disentanglement and Compositionality
  Keywords: Disentanglement, Compositionality, Edge AI, Model Optimization, Resource Efficiency
- Vanellus Technologies Ltd - Accelerate CFD convergence with improved field initialisation and mixed precision solves
  Keywords: Simulation, Software Engineering, Numerical Analysis, Fluid Dynamics, Physics
- APEX Horticulture - Utilising existing cut flower performance and quality data to inform and accelerate decisions for future developments and planting decisions
  Keywords: Horticulture, Predictive modelling
- Silvaco TCAD - Filtering of the result of Monte Carlo simulation
  Keywords: Statistics, Mathematical physics, Numerical Analysis, Monte Carlo simulation, Filtering
- Amazon Lab126 - Prisoners Dilemma, LLMs as agents
  Keywords: Game theory, LLM agents, Knowledge graphs, Stochastic modelling
- AstraZeneca PLC - Hallmarks of cancer regression
  Keywords: predictive biomarkers, multimodal data, hierarchical regression, Hallmarks of cancer
- Silvaco, Process Engineering Team - Investigation of dopant activation and diffusion in SiC
  Keywords: tcad, modeling, activation, diffusion, SiC
- Signaloid Ltd - Discrete Representations of Continuous Probability Distributions
  Keywords: Distributions, Probability, Representations, Statistics
- Silvaco Europe, TCAD - Finite Difference Approximation of Multiphase Stokes Flow with Free Interfaces on Staggered Cartesian Grids
  Keywords: Multiphase Stokes Flow, Finite-Difference Methods, PDE, Applied Linear Algebra
- Unilever SERS - Exploring the use of Generative Adversarial Networks for synthetic data generation
  Keywords: Generative adversarial networks (GANs), Synthetic data, Toxicology, Neural networks, Applied scientific computing
- Pharo Management - Bucketed interest rate risk
  Keywords: Financial Mathematics, Interest Rates, Risk Management
- LifeArc - Machine learning on multimodal and unstructured data for healthcare applications
  Keywords: AI, machine learning, healthcare, multimodal data
Enabling Large Models for Edge AI with Disentanglement and Compositionality
Project Title | Enabling Large Models for Edge AI with Disentanglement and Compositionality |
Keywords | Disentanglement, Compositionality, Edge AI, Model Optimization, Resource Efficiency |
Project Listed | 8 January 2025 |
Project Status | Open |
Contact Name | Orange Gao |
Contact Email | orangez@amazon.com |
Company/Lab/Department | Amazon Lab126 |
Address | One Station Square, Cambridge, CB1 2GA |
Project Duration | 8 weeks; full-time |
Project Open to | Master's (Part III) students |
Background Information |
Deploying large AI models on edge devices is a significant challenge due to their limited computational, memory, and energy resources. These constraints often necessitate trade-offs between model performance and efficiency, making it difficult to use cutting-edge AI technologies in applications like IoT, wearables, and real-time systems. This project explores the intersection of disentanglement and compositionality, two promising concepts in AI research, to address these challenges: Disentanglement focuses on isolating meaningful, task-specific features from complex data representations. By enhancing interpretability and generalization, disentanglement makes it possible to optimize models while preserving essential functionality. Compositionality allows models to break down tasks into smaller, reusable components that can be recombined to address a variety of tasks. This modular approach facilitates scalability and adaptability, especially in resource-constrained environments. By leveraging these principles, the project aims to make large AI models lightweight and efficient while retaining strong performance. This approach offers the potential to unlock new applications for AI on edge devices, where real-time performance, adaptability, and energy efficiency are critical. |
Project Description |
This project involves exploring and developing methods to enable large AI models to operate efficiently on edge devices by leveraging disentanglement and compositionality. The work is open-ended, allowing flexibility to adapt the later stages based on findings from initial experiments. The student will undertake the following key activities:
- Feature Disentanglement
- Model Pruning and Quantization
- Knowledge Distillation
- Compositional Representation Learning
- Edge Deployment Optimization
Successful Outcome A successful outcome would include:
How It’s Interesting/Useful This project combines cutting-edge AI techniques with real-world application in edge computing. The outcomes can be impactful for industries like IoT, wearables, and personalized AI, where resource efficiency is critical. The modular approach ensures the work is extensible, allowing integration into diverse AI tasks.
Use of Mathematical Skills Students will actively use mathematical skills in areas such as:
By the end of the project, the student will gain experience applying theoretical mathematical concepts to practical problems in AI and edge computing, contributing to a rapidly evolving field. |
Work Environment |
The student will work independently on this project, with myself serving as the industrial supervisor. I will provide regular guidance and mentorship, helping the student define goals, troubleshoot challenges, and refine their approach throughout the project. Although the student will primarily work on their own, I will be readily available for discussions and feedback through scheduled meetings and as needed via email or video calls. The student will have the flexibility to work remotely, allowing them to structure their schedule to maximize productivity. There are no fixed office or lab hours, but the student is encouraged to maintain consistent progress and attend periodic check-ins to review milestones and ensure alignment with the project goals. Day-to-day, the student will engage in tasks such as implementing and testing machine learning models, analyzing results, and documenting findings. They will have access to tools, datasets, and resources necessary for the project, along with my guidance to navigate technical or conceptual challenges. This setup offers the student a hands-on, immersive experience while fostering independence and problem-solving skills. |
References | [1] beta-VAE: Learning basic visual concepts with a constrained variational framework. ICLR 2017. [2] InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NeurIPS 2016. [3] Wu, Cindy, et al. "What Mechanisms Does Knowledge Distillation Distill?" Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models. PMLR, 2024. [4] Chen, H., Zhang, Y., Wang, X., Duan, X., Zhou, Y., & Zhu, W. (2023). DisenBooth: Disentangled parameter-efficient tuning for subject-driven text-to-image generation. ICLR 2024. [5] Challenging common assumptions in the unsupervised learning of disentangled representations. ICML 2019. [6] Jin, Zeng et al. "Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback." ECCV 2024. |
Prerequisite Skills | Statistics, Probability/Markov Chains, Image processing, Mathematical Analysis |
Other Skills Used in the Project | Numerical Analysis, Mathematical Analysis, Simulation, Predictive Modelling |
Acceptable Programming Languages | Python |
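For readers unfamiliar with the pruning and quantization activities named in this project, the following toy example may help. It is a minimal sketch in plain Python, not the project's method: the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) and the sample weights are hypothetical, and magnitude pruning with symmetric 8-bit quantization is only one common choice among many.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (hypothetical helper)."""
    keep = max(1, round(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Illustrative weights only; real layers hold thousands to millions of values.
weights = [0.5, -1.27, 0.01, 0.3]
pruned = prune_by_magnitude(weights, keep_fraction=0.5)
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(pruned)
print(codes, scale)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Each restored weight differs from the original by at most half a quantization step, which is the kind of accuracy-versus-memory trade-off the project description refers to.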
Accelerate CFD convergence with improved field initialisation and mixed precision solves
Project Title | Accelerate CFD convergence with improved field initialisation and mixed precision solves |
Keywords | Simulation, Software Engineering, Numerical Analysis, Fluid Dynamics, Physics |
Project Listed | 8 January 2025 |
Project Status | Open |
Contact Name | Laurence Cullen |
Contact Email | laurence@vanellus.tech |
Company/Lab/Department | Vanellus Technologies Ltd |
Address | Unit 6, The Courtyard, Sturton Street, Cambridge, CB1 2SN |
Project Duration | 8 weeks; full-time |
Project Open to | Master's (Part III) students |
Background Information |
In engineering, computational fluid dynamics (CFD) and thermodynamic simulations are an increasingly critical tool for designing performance-optimised systems. However, increasing design complexity and higher performance requirements mean more pressure is put on current simulation tools. For many applications, current simulation tools are too slow, inaccurate or hard to use for effective design optimisation. Vanellus is developing a new GPU-based multiphysics simulation and optimisation engine in order to remove current bottlenecks on simulation usage. At its core, CFD involves solving complex non-linear PDEs using numerical algorithms. Numerically solving non-linear PDEs almost always involves an iterative process, where an initial guess of the solution is gradually improved upon. A high-value research area is coming up with ways to improve the initial guess so that it is closer to the true solution and therefore requires fewer iterations to reach convergence. The challenge here is finding the balance between the quality of the initial guess and the amount of computing resources it takes to find it. |
Project Description |
As a small startup, we have a range of mathematical tasks to tackle with a flexible R&D roadmap, so it’s worth noting that depending on your research interests and our technical progress, we are open to adapting the project to suit our mutual needs. What we would like you to do:
Some ideas for research directions we have found:
Successful outcome: Improved non-linear PDE initialisation is a large open field, so we do not expect the problem to be fully solved during the time of the placement. Success for us will be if, at the project conclusion, we can have some preliminary results that either show the potential or disprove the usefulness of a particular numerical method.
How would it be interesting/useful? This project will allow you to use your mathematical expertise in the context of an R&D-focused software engineering startup. In addition to improving your knowledge and skills in numerical analysis, we expect you to learn the basics of software engineering in a team, including using version control and software engineering principles such as unit testing. Our hope is you would leave this placement in a great position either to continue with academic research or to pursue a career demanding software skills. For those who are interested, we predominantly program in Python, specifically using the JAX accelerated numerical computing library. We also sometimes make use of lower-level languages such as CUDA and Rust. |
Work Environment |
You will work in the office as part of our team including the company founders. We are based in Cambridge (10 min walk from the train station). We typically do 9-5:30 working hours. We are a heavily collaborative team, so we’re sharing ideas and knowledge throughout the day. We start every day with a 15-minute meeting where we all share what we’ll be working on for that day and if we need any help. As well as your own project, we would love to get your insight on our day-to-day R&D mathematical problems at the whiteboard. We have a strong emphasis on peer learning, and all of our code goes through review by the team, where we share ideas on how to improve code quality and structure. |
References |
Fluid Mechanics 101 YouTube Channel: https://www.youtube.com/playlist?list=PLnJ8lIgfDbkoZ33CHr-p6z2CBkp9OTcWj
Notes on CFD from the developers of OpenFOAM: https://doc.cfd.direct/notes/cfd-general-principles/
Numerical Linear Algebra by Trefethen & Bau: https://www.stat.uchicago.edu/~lekheng/courses/309/books/Trefethen-Bau.pdf
CFDNet: A deep learning-based accelerator for fluid simulations, Obiols-Sales et al.: https://arxiv.org/pdf/2005.04485
On floating point precision in computational fluid dynamics using OpenFOAM, Brogi et al.: https://www.sciencedirect.com/science/article/pii/S0167739X23003813 |
Prerequisite Skills | Fluids, Numerical Analysis, PDEs, Simulation |
Other Skills Used in the Project | Statistics, Mathematical physics, Predictive Modelling, Data Visualization, App Building |
Acceptable Programming Languages | Python, MATLAB, C++, Rust, CUDA, C |
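The effect of the initial guess on iteration count, described in the background for this project, can be seen even in a toy problem. The sketch below is an illustration under stated assumptions, not Vanellus's method: it uses a linear 1D Poisson problem as a stand-in for a non-linear CFD solve, Gauss-Seidel as the iterative solver, and a coarse-grid solution as the improved guess. All function names (`gauss_seidel_poisson`, `source_term`, `refine`) are hypothetical.

```python
import math


def gauss_seidel_poisson(f, u0, h, tol=1e-8, max_iters=100_000):
    """Solve -u'' = f with u = 0 at both ends, starting from the guess u0.

    f and u0 hold interior grid values only. Returns (solution, iterations).
    """
    u = list(u0)
    n = len(u)
    for iteration in range(1, max_iters + 1):
        biggest_change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            new_value = 0.5 * (left + right + h * h * f[i])
            biggest_change = max(biggest_change, abs(new_value - u[i]))
            u[i] = new_value
        if biggest_change < tol:
            return u, iteration
    return u, max_iters


def source_term(n):
    """Grid spacing and f(x) = sin(pi x) at n interior points of [0, 1]."""
    h = 1.0 / (n + 1)
    return h, [math.sin(math.pi * (i + 1) * h) for i in range(n)]


def refine(coarse):
    """Linearly interpolate n interior values onto 2n + 1 interior points."""
    padded = [0.0] + list(coarse) + [0.0]  # add the zero boundary values
    fine = []
    for k in range(len(padded) - 1):
        fine.append(0.5 * (padded[k] + padded[k + 1]))  # new midpoint
        fine.append(padded[k + 1])                      # existing node
    return fine[:-1]  # drop the right boundary value


h_coarse, f_coarse = source_term(15)
h_fine, f_fine = source_term(31)

# Cold start: all-zero initial guess on the fine grid.
u_cold, cold_iters = gauss_seidel_poisson(f_fine, [0.0] * 31, h_fine)

# Warm start: solve a cheaper coarse problem first, then interpolate it.
u_coarse, coarse_iters = gauss_seidel_poisson(f_coarse, [0.0] * 15, h_coarse)
u_warm, warm_iters = gauss_seidel_poisson(f_fine, refine(u_coarse), h_fine)

print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters, "(plus", coarse_iters, "coarse)")
```

The warm start reaches the same tolerance in fewer fine-grid iterations, but the coarse solve is not free. Whether the saving outweighs the cost of producing the guess is exactly the balance the project background describes.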
Utilising existing cut flower performance and quality data to inform and accelerate decisions for future developments and planting decisions
Project Title | Utilising existing cut flower performance and quality data to inform and accelerate decisions for future developments and planting decisions |
Keywords | Horticulture, Predictive modelling |
Project Listed | 8 January 2025 |
Project Status | Filled |
Contact Name | Richard Boyle |
Contact Email | richard.boyle@apexhorticulture.com |
Company/Lab/Department | APEX Horticulture |
Address | Pierson Road, The Enterprise Campus, Alconbury Weald, PE28 4YA |
Project Duration | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | APEX Horticulture Ltd. is a professional research and development business, offering bespoke services for cut flowers and plants. APEX is based in a purpose-built testing centre, situated in Alconbury, Cambridgeshire (UK). APEX is part of the wider MM Flowers group, where MM is one of the UK’s leading cut flower importer/processing companies, with a unique ownership model and innovative practices. MM Flowers is owned by the AM Fresh Group, a leading breeder, grower and distributor of citrus and grapes; Vegpro, East Africa’s largest flower and vegetable producer; and Elite, the leading flower grower and breeder in South America. APEX is at the optimal position in the chain, able to deliver high quality, independent research and close-to-market proximity matched with the invaluable insight into the true performance of flowers and plants subjected to actual supply chain conditions. The infrastructure and specialised personnel of APEX aims to deliver robust, standardised and consistent research every week of the year, together with the ability to undertake large scale projects to match all client requirements, influencing all elements of the cut flower supply chain. APEX undertakes many different research projects covering the entire supply chain, from development of new flower types through to the manufacturing requirements for the final bouquets. Each of these projects generates a significant amount of data and insight, which is used to provide recommendations to the various stakeholders of each project. |
Project Description |
APEX tests up to 50k cut flower samples annually, with around 30-60 data points generated per sample. This data ranges from agronomic and freight data through to performance data associated with flower longevity and aesthetic appeal. Many of the samples tested are part of long-term programmes focussed on understanding the performance of different cultivars and farms across seasons and years. Alongside this, prospective new cultivars are tested to understand whether there are alternative and ‘better’ options available to the current selection. However, the process to develop and introduce new cultivars is inefficient, taking anywhere up to 10 years. It is heavily reliant on the intuition of breeders, and it can be a challenge to successfully introduce new cultivars that meet rapidly changing requirements. For example, whilst many of the cut flowers grown on the equator are transported to Europe by air freight, the entire industry is currently evaluating the possibility of transitioning much of this to sea freight. Whilst this presents many benefits, including for the environment and for availability, it substantially increases the freight time, which many existing flower types and cultivars are not able to withstand. During the development process, the breeders and growers are presented with a dilemma: there is a desire to be informed and led by data (such as from APEX), but this is a slow process due to the limited number of samples available initially. Accelerated data generation would require significantly more plants and thus samples, which requires various resources (time, space and inputs) and carries greater risk if the cultivars prove to be unviable. However, an abundance of data is available where new cultivars have been introduced, with varying levels of success. This has obvious implications for the breeder/grower, but also for those along the supply chain, including suppliers and retailers.
Flower types and cultivars selected that do not meet the required standards can result in significant waste, consumer dissatisfaction and potentially brand damage. As such, there is a desire to improve the efficiency and speed of the flower development process whilst minimising, or at least understanding, the associated risks. Given the above, there are different areas that APEX, MM Flowers and the wider group would like to explore, including –
|
Work Environment | Student led project, supported by wider team. Hybrid working. |
Filtering of the result of Monte Carlo simulation
Project Title | Filtering of the result of Monte Carlo simulation |
Keywords | Statistics, Mathematical physics, Numerical Analysis, Monte Carlo simulation, Filtering |
Project Listed | 15 January 2025 |
Project Status | Open |
Contact Name | Artem Babayan |
Contact Email | artem.babayan@silvaco.com |
Company/Lab/Department | Silvaco TCAD |
Address | Silvaco Europe Ltd. 5, Compass Point, St Ives, PE27 5JL |
Project Duration | 8-12 weeks full time |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | Mathematical modelling of real-life physical problems |
Project Description |
Silvaco is a software engineering company developing tools to assist in the manufacturing of semiconductor devices. In the UK office we mostly work on the 'process simulation' side: mathematical modelling of the processes used in manufacturing. One such process is implantation: the bombardment of a piece of (typically) Si with ions (dopants) to change the electrical properties of the target in specific areas. To predict the final ion distribution we use Monte Carlo simulation, following the paths of a large number of ions as they fly through the structure. The final results show artefacts typical of Monte Carlo simulation, e.g. single stray particles or missed areas ('hot' and 'cold' spots respectively). We need to apply a filter to these 'raw' results to improve the overall quality. Your task would be to review the literature, then suggest and implement the required algorithms. |
Work Environment | The project assumes a high degree of independence. The development part is expected to be done in the office (in St Ives, near Cambridge). |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Statistics, Mathematical physics, Numerical Analysis, Simulation |
Acceptable Programming Languages | Python, MATLAB, C++ |
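To make the 'hot' and 'cold' spot problem in this project concrete, here is a minimal sketch of one standard candidate from the image-processing literature: a median filter on a 2D grid of particle counts. It is an illustration only and not Silvaco's algorithm. The function name `median_filter_2d`, the window radius and the sample grid are all hypothetical.

```python
from statistics import median


def median_filter_2d(field, radius=1):
    """Replace each cell with the median of its (2*radius+1)^2 neighbourhood.

    Windows are clipped at the grid edges rather than padded.
    """
    rows, cols = len(field), len(field[0])
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                field[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            filtered[r][c] = median(window)
    return filtered


# Hypothetical raw result: a uniform dose with one hot and one cold spot.
raw = [[10.0] * 5 for _ in range(5)]
raw[2][2] = 1000.0  # 'hot' spot: a single stray particle cluster
raw[4][0] = 0.0     # 'cold' spot: a cell the sampled ions missed

smoothed = median_filter_2d(raw)
for row in smoothed:
    print(row)
```

A median filter removes isolated outliers without averaging them into their neighbours, but it also flattens genuine sharp features, which is one reason a literature review of alternatives is part of the task.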
Prisoners Dilemma, LLMs as agents
Project Title | Prisoners Dilemma, LLMs as agents |
Keywords | Game theory, LLM agents, Knowledge graphs, Stochastic modelling |
Project Listed | 15 January 2025 |
Project Status | Open |
Contact Name | Uday Kiran |
Contact Email | kirannu@amazon.com |
Company/Lab/Department | Amazon Lab126 |
Address | One Station Square, Cambridge, CB1 2GA |
Project Duration | 8 weeks; full-time |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | The prisoner's dilemma (PD) is a game theory paradox that illustrates how two rational individuals acting in their own self-interest can reach an outcome that is suboptimal for the group. It's a thought experiment where each individual can choose to cooperate with their partner for mutual benefit or betray them for personal gain. The dilemma arises because, while it's rational for each individual to defect, cooperation would result in a higher payoff for both. This project tries to model rational individuals with pre-biased LLM agents to make the problem more realistic. |
Project Description |
The multiplayer prisoner's dilemma
The multiplayer prisoner's dilemma, also known as the n-person prisoner's dilemma (NPD), is a game theory scenario where multiple players must choose between cooperating or defecting.
The outcome for each player depends on their choice and the choices of all other players. If everyone chooses to defect, the outcome is worse for everyone than if they had cooperated. The NPD became popular in the 1970s among economists and social theorists. It can be used to model many real-world social, political, and economic problems. For example, the tragedy of the commons is a multiplayer generalization of the prisoner's dilemma. In this scenario, villagers must choose between personal gain and restraint. If they all choose to defect, the commons are destroyed.
PD problem for LLMs and knowledge graphs
This section outlines an approach to redesigning the Prisoner's Dilemma (PD) problem using LLM agents with personalized characteristics. Here's a high-level outline of how you could implement this simulation:
1. LLM-based Agents:
2. Knowledge Graph Representation:
3. Personality Trait Assignment:
4. Prisoner's Dilemma Simulation:
5. Multi-group Simulation:
Key Considerations:
Problems to solve for successful completion of the project:
Competitive pricing in the marketplace:
1. Modeling the Marketplace Dynamics:
2. Modeling Seller Behavior:
3. Modeling Buyer Behavior:
4. Incorporating the Tragedy of the Commons:
5. Leveraging Multi-LLM Agents:
By incorporating these advanced techniques, students can gain a deeper understanding of the complex dynamics and decision-making processes involved in competitive pricing within a marketplace. This exercise can help them develop skills in multi-agent modeling, stochastic optimization, and the application of knowledge graphs and LLMs to complex real-world problems. |
Work Environment |
The student will work independently on this project, with myself serving as the industrial supervisor. I will provide regular guidance and mentorship, helping the student define goals, troubleshoot challenges, and refine their approach throughout the project. Although the student will primarily work on their own, I will be readily available for discussions and feedback through scheduled meetings and as needed via email or video calls. The student will have the flexibility to work remotely, allowing them to structure their schedule to maximize productivity. There are no fixed office or lab hours, but the student is encouraged to maintain consistent progress and attend periodic check-ins to review milestones and ensure alignment with the project goals. Day-to-day, the student will engage in tasks such as implementing and testing models, analyzing results, and documenting findings. They will have access to tools, datasets, and resources necessary for the project, along with my guidance to navigate technical or conceptual challenges. This setup offers the student a hands-on, immersive experience while fostering independence and problem-solving skills. |
References | |
Prerequisite Skills | Statistics, Probability/Markov Chains, Mathematical Analysis |
Other Skills Used in the Project | Predictive Modelling, Database Queries, Data Visualization |
Acceptable Programming Languages | Python, R |
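The two-player dilemma described in the background for this project can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: the payoff values (5, 3, 1, 0) are the conventional textbook choice rather than anything specified in the listing, and the strategies are plain Python functions standing in for the LLM agents the project proposes. All names (`PAYOFF`, `best_response`, `play`, `always_defect`, `tit_for_tat`) are hypothetical.

```python
# Payoffs as (player A, player B); "C" = cooperate, "D" = defect.
# Assumed textbook values: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_response(opponent_move):
    """Move that maximises a player's own payoff against a fixed opponent move."""
    return max("CD", key=lambda move: PAYOFF[(move, opponent_move)][0])


def always_defect(opponent_history):
    return "D"


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play(strategy_a, strategy_b, rounds=10):
    """Run an iterated game and return the total scores (A, B)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each strategy sees the other's past moves
        move_b = strategy_b(history_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(best_response("C"), best_response("D"))  # defecting is best either way
print(play(tit_for_tat, tit_for_tat))          # sustained cooperation
print(play(always_defect, always_defect))      # mutual defection
print(play(tit_for_tat, always_defect))
```

Defection is the best response to either move, yet two cooperating players each score three times what two defectors do over ten rounds. In the project, a strategy function of this shape could be replaced by a call to an agent with an assigned personality.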
Hallmarks of cancer regression
Project Title | Hallmarks of cancer regression |
Keywords | predictive biomarkers, multimodal data, hierarchical regression, Hallmarks of cancer |
Project Listed | 15 January 2025 |
Project Status | Filled |
Contact Name | Fabio Rigat |
Contact Email | fabio.rigat@astrazeneca.com |
Company/Lab/Department | AstraZeneca PLC |
Address | 36 Hills Road, Cambridge, CB2 8PA |
Project Duration | 8 weeks full time, ideally between June 2nd and July 31st |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | In oncology, molecular features of prognostic or predictive value are key to matching patients with effective investigational treatment strategies. These features range from a small number of well understood markers, including expression of drug targets and molecular characterisations of the tumour microenvironment, up to high dimensional multi-modal data including genetic variants, gene expression and protein expression. When high dimensional molecular disease features are used, it is challenging to derive robust features providing accurate prediction of response to therapy in validation samples. |
Project Description | This project will focus on the assessment of a novel supervised learning methodology that estimates low dimensional predictive markers by combining high dimensional disease molecular characteristics with established gene annotation systems based on the Hallmarks of Cancer. This assessment will include running computer simulations estimating true and false positive outcome probabilities under selected scenarios, applying the method to re-analysis of published datasets, and applying the method to exploratory analyses of internal unpublished data. The outcome of this project will be integrated with ongoing work to provide material towards a publication. |
Work Environment | The student will be working within the AstraZeneca Biometrics environment, supported specifically by members of the Statistical Innovations organisation. |
References | 1. Douglas Hanahan and Robert A. Weinberg (2011) Hallmarks of Cancer: The Next Generation, Cell, DOI 10.1016/j.cell.2011.02.013 2. Ádám Nagy, Gyöngyi Munkácsy, Balázs Győrffy (2021) Pancancer survival analysis of cancer hallmark genes, https://doi.org/10.1038/s41598-021-84787-5 3. Otília Menyhart, William Jayasekara Kothalawala, Balázs Győrffy (2024) A gene set enrichment analysis for the cancer hallmarks, https://doi.org/10.1016/j.jpha.2024.101065 4. Francesco C Stingo, Yian A Chen, Mahlet G Tadesse, Marina Vannucci (2011) Incorporating biological information into linear models: a bayesian approach to the selection of pathways and genes, https://doi.org/10.1214/11-AOAS463 |
Prerequisite Skills | Statistics, Simulation, Predictive Modelling, Data Visualization, Effective collaboration skills |
Other Skills Used in the Project | Interest in oncology |
Acceptable Programming Languages | Python, MATLAB, R |
Investigation of dopant activation and diffusion in SiC
Project Title | Investigation of dopant activation and diffusion in SiC |
Keywords | tcad, modeling, activation, diffusion, SiC |
Project Listed | 20 January 2025 |
Project Status | Open |
Contact Name | Alexandros Kyrtsos |
Contact Email | alexandros.kyrtsos@silvaco.com |
Company/Lab/Department | Silvaco, Process Engineering Team |
Address | Silvaco Europe Ltd. 5, Compass Point, St Ives, PE27 5JL |
Project Duration | 8 weeks, full time |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | Silvaco is a global leader in electronic design automation (EDA) software and technology computer-aided design (TCAD) solutions. Our cutting-edge tools empower semiconductor companies to design, simulate, and optimize next-generation devices and processes. |
Project Description |
As a TCAD intern focusing on process simulation, you’ll work alongside experts to investigate dopant activation and diffusion in SiC-4H, developing and enhancing models for these critical semiconductor processes. This is an opportunity to gain hands-on experience, contribute to advanced research, and be part of the innovation that drives the future of semiconductor technology. The project involves a literature search and research into the activation and diffusion of various dopants in SiC-4H. It also involves the development and validation of models to simulate these processes. The student will have the opportunity to develop skills such as data analysis and visualization, development of physical models, simulation techniques, and programming. |
Work Environment | Hybrid (mix of on-site and remote work). High degree of independent work is required. |
References | https://www.iue.tuwien.ac.at/phd/simonka/index.html, chapter 3 |
Prerequisite Skills | Mathematical physics, Simulation, Data Visualization |
Other Skills Used in the Project | Simulation, Predictive Modelling, Data Visualization |
Acceptable Programming Languages | Python, MATLAB, C++ |
Discrete Representations of Continuous Probability Distributions
Project Title | Discrete Representations of Continuous Probability Distributions |
Keywords | Distributions, Probability, Representations, Statistics |
Project Listed | 20 January 2025 |
Project Status | Open |
Contact Name | Laurence Weir |
Contact Email | careers@signaloid.com |
Company/Lab/Department | Signaloid Ltd |
Address | 4 Station Square, Cambridge, CB1 2GE |
Project Duration | 8 weeks, full time |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | Probability distributions provide a mathematical framework for understanding and modelling uncertainty, allowing us to quantify the likelihood of different outcomes in random processes. By characterising how data is distributed, they enable informed decision-making and are foundational to fields like statistics, machine learning, and risk assessment. Many of these distributions, such as the famous normal distribution (bell curve), are defined continuously, but in reality we need to represent these distributions with a finite number of discrete points so that we may perform statistical tasks quickly and efficiently on a computer. |
Project Description | In this project you will be working on new discrete representations of probability distributions to try and uncover better ways to capture the shape and form of many theoretical and real world distributions. First you will learn about distributions as a rigorous mathematical object and how you can perform arithmetic on them. You will also learn how we quantify the "closeness" of distributions using distance metrics and criteria. Then after researching existing methods to represent distributions discretely, you will get to try and conceive of new and improved methods. Finally, you will test and verify these methods both analytically and numerically through simulations (in Python or a similar language). |
Work Environment | Join a remote team of industry mathematicians discussing probability theory and real world statistical problems. You will have the chance to talk with your supervisor multiple times per week and have them guide you through the project and oversee your progress. |
References | https://link.springer.com/article/10.1007/s00362-022-01356-2 |
Prerequisite Skills | Statistics, Probability/Markov Chains, Simulation |
Other Skills Used in the Project | Mathematical Analysis, Data Visualization, Metric Spaces |
Acceptable Programming Languages | Python |
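As a rough illustration of what a discrete representation and a closeness measure can look like, the sketch below replaces a standard normal distribution with equally weighted points at its quantile midpoints and measures the Kolmogorov (sup-norm CDF) distance to the original. This is only one textbook baseline chosen for illustration, not Signaloid's method; the function names and the choice of 16 points are assumptions.

```python
from statistics import NormalDist


def quantile_points(dist, n):
    """Return n equally weighted support points at the quantile midpoints."""
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def kolmogorov_distance(dist, points):
    """Sup-norm gap between dist's CDF and the step CDF of equally weighted points."""
    n = len(points)
    gap = 0.0
    for i, x in enumerate(sorted(points)):
        f = dist.cdf(x)
        # The step CDF jumps from i/n to (i+1)/n at x, so check both sides of the jump.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap


normal = NormalDist(mu=0.0, sigma=1.0)
points = quantile_points(normal, 16)  # 16 points is an arbitrary illustrative choice

mean = sum(points) / len(points)
variance = sum((x - mean) ** 2 for x in points) / len(points)
ks = kolmogorov_distance(normal, points)

print(f"mean={mean:.6f} variance={variance:.6f} kolmogorov={ks:.6f}")
```

With this construction the CDF gap is 1/(2n) by design, while the discrete variance falls short of the true value of 1 because the tails are truncated. Trade-offs of this kind between different closeness criteria are what a better representation would need to balance.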
Finite Difference Approximation of Multiphase Stokes Flow with Free Interfaces on Staggered Cartesian Grids
Project Title | Finite Difference Approximation of Multiphase Stokes Flow with Free Interfaces on Staggered Cartesian Grids |
Keywords | Multiphase Stokes Flow, Finite-Difference Methods, PDE, Applied Linear Algebra |
Project Listed | 24 January 2025 |
Project Status | Open |
Contact Name | Vasily Suvorov |
Contact Email | vasily.suvorov@silvaco.com |
Company/Lab/Department | Silvaco Europe, TCAD |
Address | Silvaco Technology Centre, Compass Point, St Ives, Cambridgeshire, PE27 5JL, United Kingdom |
Project Duration | 8 weeks, 40 hours/week |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information | Modern semiconductor technology involves processes in which materials with free interfaces undergo large, slow deformations. Such deformations can often be modelled as incompressible Stokes flow. The project will analyse the company’s working numerical approach to modelling such flow, with the aim of improving its accuracy, stability and convergence. |
Project Description |
Silvaco uses finite-difference schemes on structured 2D and 3D Cartesian grids to simulate multiphase Stokes flow with free interfaces. A particular challenge in applying such schemes lies in accurately approximating the boundary conditions at the interface between two different viscous liquids, and in approximating the momentum equations across that interface. The student will assist in analysing and improving the approximation and stability of the current numerical schemes, with the possibility of proposing better alternatives. Special attention will be given to developing numerical schemes that are well suited to iterative methods such as BiCGSTAB. The resulting matrices will be analysed using SVD, QR factorization, and other appropriate techniques from numerical linear algebra. |
Work Environment | The student will work independently, with support and guidance from the supervisor. |
References | |
Prerequisite Skills | Numerical Analysis, PDEs, Algebra/Number Theory |
Other Skills Used in the Project | |
Acceptable Programming Languages | Python, MATLAB |
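To make the interface difficulty concrete, here is a deliberately simplified 1D sketch: steady shear flow of two fluid layers between plates, governed by d/dy(mu du/dy) = 0. Velocities are stored at grid nodes and viscosities at the cell faces between them (a staggered arrangement), so the discrete shear stress is continuous across the interface by construction. This is not Silvaco's scheme; the function name, the grid, and the viscosity values are assumptions made for illustration.

```python
def solve_two_layer_shear(n_cells, interface_cell, mu_lower, mu_upper, u_top=1.0):
    """Solve d/dy(mu du/dy) = 0 on [0, 1] with u(0) = 0 and u(1) = u_top.

    Velocities live on the n_cells + 1 grid nodes; viscosities live on the
    cell faces between nodes. Cells with index below interface_cell hold the
    lower fluid, so the interface sits exactly on node interface_cell.
    """
    mu_face = [mu_lower if k < interface_cell else mu_upper for k in range(n_cells)]
    n_unknowns = n_cells - 1  # interior nodes 1 .. n_cells - 1
    lower = [0.0] * n_unknowns
    diag = [0.0] * n_unknowns
    upper = [0.0] * n_unknowns
    rhs = [0.0] * n_unknowns
    for row in range(n_unknowns):
        node = row + 1
        mu_below, mu_above = mu_face[node - 1], mu_face[node]
        # Flux balance: mu_above*(u[node+1]-u[node]) - mu_below*(u[node]-u[node-1]) = 0.
        # The uniform grid spacing cancels from this homogeneous equation.
        lower[row] = mu_below
        diag[row] = -(mu_below + mu_above)
        upper[row] = mu_above
    # Move the known top-wall velocity to the right-hand side (bottom wall is zero).
    rhs[-1] -= upper[-1] * u_top
    # Thomas algorithm: forward elimination, then back substitution.
    for row in range(1, n_unknowns):
        factor = lower[row] / diag[row - 1]
        diag[row] -= factor * upper[row - 1]
        rhs[row] -= factor * rhs[row - 1]
    interior = [0.0] * n_unknowns
    interior[-1] = rhs[-1] / diag[-1]
    for row in range(n_unknowns - 2, -1, -1):
        interior[row] = (rhs[row] - upper[row] * interior[row + 1]) / diag[row]
    return [0.0] + interior + [u_top]


mu_lower, mu_upper = 1.0, 10.0  # illustrative viscosities
u = solve_two_layer_shear(n_cells=8, interface_cell=4, mu_lower=mu_lower, mu_upper=mu_upper)

# Analytical interface velocity from continuity of shear stress, layers of thickness 0.5.
exact_interface = (0.5 / mu_lower) / (0.5 / mu_lower + 0.5 / mu_upper)
print(u[4], exact_interface)
```

When the interface coincides with a grid node, as here, the scheme reproduces the piecewise-linear exact solution. The harder cases the project targets arise when the interface cuts through cells in 2D or 3D and the face viscosities must be approximated.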
Exploring the use of Generative Adversarial Networks for synthetic data generation
Project Title | Exploring the use of Generative Adversarial Networks for synthetic data generation |
Keywords | Generative adversarial networks (GANs), Synthetic data, Toxicology, Neural networks, Applied scientific computing |
Project Listed | 27 January 2025 |
Project Status | Open |
Contact Name | Patrik Engi |
Contact Email | patrik.engi@unilever.com |
Company/Lab/Department | Unilever SERS |
Address | Colworth Science Park, Sharnbrook, Bedford MK44 1LQ |
Project Duration | 8-12 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information |
In fast-moving consumer goods, it is vital that safety risk assessments are conducted to ensure products are safe for humans and the environment. Historically, these risk assessments have relied on the use of in vivo animal testing to identify detrimental impacts of chemicals on organisms. However, from a scientific, ethical, and legislative perspective, more recently developed non-animal methods are the preferred approach. For more than 20 years, Unilever’s Safety, Environmental and Regulatory Science (SERS) has been developing novel in silico and in vitro methods, which leverage recent advances in biology, genetics, computing, mathematics and statistics, to conduct safety assessments without the use of animal testing [1, 2]. This evolution in the risk assessment paradigm presents new opportunities for applying deep learning and AI-based approaches. A key risk assessment step is to characterise the potential effects that a chemical may have on different cell types. This typically involves using high throughput transcriptomics (HTTr) to measure the genetic response of cells to different concentrations of a test chemical. Such data can be expensive to generate, particularly if they need to be generated for multiple chemicals and cell types. Furthermore, it is common to encounter situations where not all the data necessary for a risk assessment are readily available. Therefore, approaches that maximise the utility of the available data are a high priority. Recent advances in deep learning and AI may provide a way to generate so-called synthetic data. These could be used either to fill data gaps within a risk assessment or to predict the effects a chemical might cause at a gene transcriptional level. This project will focus on exploring the utility of Generative Adversarial Networks in this application. |
Project Description |
GANs generate synthetic data through two competing neural networks, a generator and discriminator, engaging in a zero-sum game. This project will therefore provide the opportunity to apply and expand existing knowledge in various areas, such as statistics, probabilistic machine learning, and game theory, while also building skills and experience in applied scientific computing. We suggest that the student(s) approach the topic as an open-ended research project, focusing on recent developments using GANs in in vitro Toxicology [3]. We would like the student to demonstrate and develop from the existing science by:
This phase of the project would involve the student familiarizing themselves with the current state-of-the-art regarding the application of GANs in toxicology, guided by the available literature as in Refs. [3-5]. Once achieved, we would like the student to advance this field of application by:
Throughout the project, the student will have the opportunity to meet with experts from various mathematical backgrounds, as well as to collaborate with other disciplines such as toxicology, human biology, and risk assessment. |
Work Environment | The student will mostly work independently, but will have the full support of a wider team and supervisors for questions, guidance and advice. We expect the student to work remotely, but visits to the site (Sharnbrook, near Bedford) are encouraged if travel permits. |
References | [1] J. Reynolds, S. Malcomber and A. White, “A Bayesian approach for inferring global points of departure from transcriptomics data,” Computational Toxicology, vol. 16, p. 100138, November 2020. [2] T. E. Moxon, H. Li, M.-Y. Lee, P. Piechota, B. Nicol, J. Pickles, R. Pendlington, I. Sorrell and M. T. Baltazar, “Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products,” Toxicology in Vitro, vol. 63, p. 104746, March 2020. [3] Chen X, Roberts R, Tong W, Liu Z. Tox-GAN: An Artificial Intelligence Approach Alternative to Animal Studies-A Case Study With Toxicogenomics. Toxicol Sci. 2022 Mar 28;186(2):242-259. doi: 10.1093/toxsci/kfab157. PMID: 34971401. [4] Chen, X., Roberts, R., Liu, Z. et al. A generative adversarial network model alternative to animal studies for clinical pathology assessment. Nat Commun 14, 7141 (2023). https://doi.org/10.1038/s41467-023-42933-9 [5] Lee, M. Recent Advances in Generative Adversarial Networks for Gene Expression Data: A Comprehensive Review. Mathematics 2023, 11, 3055. https://doi.org/10.3390/math11143055 |
Prerequisite Skills | Statistics |
Other Skills Used in the Project | Predictive Modelling, Data Visualization, Deep learning |
Acceptable Programming Languages | No Preference |
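For reference, the zero-sum game between generator and discriminator mentioned in the Project Description is usually written as the minimax objective of the original GAN formulation. The notation is not from the listing and is assumed here: G is the generator, D the discriminator, p_data the distribution of real samples (for example, measured transcriptomic profiles), and p_z the noise prior fed to the generator.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Conditional and Wasserstein variants modify this objective. The listing does not say which variant the project would use, so this should be read as the generic starting point only.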
Bucketed interest rate risk
Project Title | Bucketed interest rate risk |
Keywords | Financial Mathematics, Interest Rates, Risk Management |
Project Listed | 30 January 2025 |
Project Status | Closed |
Contact Name | Chris Hunter, Jennifer Shaeffer |
Contact Email | jshaeffer@pharo.com |
Company/Lab/Department | Pharo Management |
Address | 8 Lancelot Place, London, SW7 1DR |
Project Duration | 8 weeks, full time |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information |
Pharo Management is a leading global macro hedge fund manager with a focus on emerging markets. Founded in 2000, the firm has offices in London, New York and Hong Kong and currently manages approximately $7 billion in assets across four funds. Pharo trades foreign exchange, sovereign and corporate credit, local market interest rates, commodities, and their derivatives. We trade in over 70 countries across Asia, Central and Eastern Europe, the Middle East and Africa, Latin America as well as developed markets. Our investment approach combines macroeconomic fundamental research and quantitative analysis. Pharo employs a diverse, dynamic team of over 125 professionals representing over 20 nationalities and 30 languages. We have a strong corporate culture anchored in core values such as collaborative spirit, creativity, and respect. We are passionate about what we do and are committed to attracting the best and brightest talent. |
Project Description |
Expected Outcomes
By the end of the internship, the intern will have developed a clear understanding of risk transformations in interest rate modeling, implemented practical computation methods for Jacobian-based transformations, and potentially explored advanced techniques using algorithmic differentiation. The project will contribute to more efficient and accurate risk management methodologies in fixed-income markets.
Project Overview
This internship project focuses on the transformation of bucketed interest rate risk using Jacobian matrices and, if time permits, the computation of bucketed risk using algorithmic differentiation (AD). The goal is to enhance methodologies for understanding and managing interest rate sensitivities in financial models.
Interest rate risk is commonly analyzed by measuring sensitivities to shifts in specific maturity buckets (e.g., 1Y, 5Y, 10Y). However, for risk aggregation, stress testing, and hedging, these bucketed sensitivities must often be transformed into different risk bases, such as principal component decompositions or forward-rate perturbations (for example, 1Y1Y, 2Y3Y and 5Y5Y). This transformation is mathematically represented as a Jacobian matrix operation, which maps one set of risk factors to another while preserving sensitivity structure.
Algorithmic Differentiation (AD) is a computational technique used to efficiently compute derivatives of functions expressed as computer programs. Unlike symbolic differentiation, which can lead to expression swell, or numerical differentiation, which suffers from truncation and rounding errors, AD systematically applies the chain rule of differentiation at the elementary operation level, allowing for highly accurate and efficient gradient computations. AD is particularly useful in financial applications such as risk management and derivatives pricing, where sensitivity analysis and risk calculations must be performed quickly and accurately.
Key Objectives
1. Jacobian Risk Transformations
2. Algorithmic Differentiation for Risk Computation
Skills & Technologies
|
Work Environment | Ideally the student will work at the Pharo office (central London), supported by myself and other members of the Quantitative Analytics team. Remote working would be considered. |
References |
For an introduction to risk transformations using Jacobian matrices, refer to: For an introduction to algorithmic differentiation, refer to: |
Prerequisite Skills | Python, Linear Algebra |
Acceptable Programming Languages | Python |
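The two ideas in the Project Overview can be sketched together on a toy problem. The code below first obtains bucketed zero-rate sensitivities of a fixed-cashflow present value by forward-mode AD using dual numbers, then maps them to forward-rate buckets with the Jacobian dz/df. Everything here is an illustrative assumption rather than Pharo's methodology: continuous compounding, piecewise-constant forward rates, the made-up rates and cashflows, and all class and function names.

```python
import math


class Dual:
    """value + deriv * eps with eps**2 == 0: the building block of forward-mode AD."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule applied at the level of one elementary operation.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __radd__ = __add__
    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.value, -self.deriv)


def dual_exp(x):
    """exp with its derivative propagated by the chain rule."""
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)


def present_value(zero_rates, maturities, cashflows):
    """PV of fixed cashflows under continuously compounded zero rates."""
    total = Dual(0.0)
    for z, t, c in zip(zero_rates, maturities, cashflows):
        total = total + c * dual_exp(-(Dual.lift(z) * t))
    return total


def bucketed_zero_deltas(zero_rates, maturities, cashflows):
    """One forward-mode pass per bucket: seed that bucket's derivative with 1."""
    deltas = []
    for k in range(len(zero_rates)):
        seeded = [Dual(z, 1.0 if i == k else 0.0) for i, z in enumerate(zero_rates)]
        deltas.append(present_value(seeded, maturities, cashflows).deriv)
    return deltas


def zero_to_forward_jacobian(maturities):
    """J[i][j] = dz_i/df_j, from z_i * T_i = sum_{j<=i} f_j * (T_j - T_{j-1}), T_0 = 0."""
    starts = [0.0] + list(maturities[:-1])
    n = len(maturities)
    return [[(maturities[j] - starts[j]) / maturities[i] if j <= i else 0.0
             for j in range(n)] for i in range(n)]


def transform_risk(jacobian, zero_deltas):
    """Chain rule: dV/df_j = sum_i dV/dz_i * dz_i/df_j (J transposed times the deltas)."""
    n = len(zero_deltas)
    return [sum(jacobian[i][j] * zero_deltas[i] for i in range(n)) for j in range(n)]


# Hypothetical inputs: buckets 1Y, 2Y, 5Y, 10Y, i.e. forwards 0Y1Y, 1Y1Y, 2Y3Y, 5Y5Y.
maturities = [1.0, 2.0, 5.0, 10.0]
zero_rates = [0.030, 0.032, 0.035, 0.040]
cashflows = [5.0, 5.0, 5.0, 105.0]

zero_deltas = bucketed_zero_deltas(zero_rates, maturities, cashflows)
jacobian = zero_to_forward_jacobian(maturities)
forward_deltas = transform_risk(jacobian, zero_deltas)
print(zero_deltas)
print(forward_deltas)
```

Each row of this Jacobian sums to one, so a parallel shift of all forward rates is a parallel shift of all zero rates, and the total sensitivity is the same in both bases. Forward mode needs one pass per bucket; reverse-mode AD would return all bucketed sensitivities in a single backward pass, which matters once the number of buckets grows.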
Machine learning on multimodal and unstructured data for healthcare applications
Project Title | Machine learning on multimodal and unstructured data for healthcare applications |
Keywords | AI, machine learning, healthcare, multimodal data |
Project Listed | 4 February 2025 |
Project Status | Open |
Contact Name | Sam Genway |
Contact Email | sam.genway@lifearc.org |
Company/Lab/Department | LifeArc |
Address | Accelerator Building, Open Innovation Campus, Stevenage, SG1 2FX |
Project Duration | 9 weeks, full-time - starting 30 June 2025 |
Project Open to | Undergraduates, Master's (Part III) students |
Background Information |
At LifeArc, our ambition is to make life science life changing. We do this by advancing scientific discoveries beyond the lab, faster, so that they can shape the next generation of diagnostics, treatments, and cures. Working at the cutting edge of translational science and as the early-stage translation specialists, we progress scientific discoveries on their journey to becoming a medicine, diagnostic or intervention that can improve patients’ lives. Our work begins by seeking out innovative science, then helping to develop this to a point where there is a clinical and commercial pathway for others to invest the time and money to take it further forward. Data Sciences is an integral part of LifeArc’s Science organisation. We work with our laboratory-based projects to analyse, visualise and interpret data in order to design future experiments; we build computational models to make predictions, often using the latest AI and machine learning methods; we develop computational workflows and write software; we work closely with LifeArc colleagues and with external collaborators in multiple project teams. Our methods are applied to tackle problems in chemistry, biology and medical/clinical science. What we can offer you: Join us, and you’ll have the scope to be creative and take measured risks. You’ll be rewarded for your curiosity, for working as one team, and for learning fast. And you’ll have everything you need to be your best every day. We all have potential. At LifeArc, you’ll discover what you can really do with it. |
Project Description |
Job Title: Summer Placement Student (Data Sciences)
At LifeArc, we want to hear from people who are as passionate as we are about making life science life changing and improving patients’ lives.
A bit about the role: This is an opportunity to get involved in exciting work within the Data Sciences team at our state-of-the-art facilities in Stevenage. LifeArc is a self-funded not-for-profit organisation with a mission to address unmet patient needs. Artificial Intelligence (AI) brings new paths to patient impact through the development and translation of healthcare AI applications. However, several challenges arise when applying machine learning methods to real-world problems, such as predicting which patients are at risk of disease, or informing a diagnosis or prognosis. One particular challenge is using the multimodal data available for patient cohorts during training to build models that remain useful when some modalities are unavailable at inference time. Another is the formulation of machine learning methods for time-to-event predictions that leverage unstructured datasets. In each case, there are multiple approaches which could be valuable, and the aim of this project is to compare them and identify those with real-world utility.
Project: The project will focus on one or two of the following challenges:
About you: Education & experience required:
Skills and strengths we are looking for during the recruitment and selection process:
You are not expected to have a deep background in life sciences or healthcare. We want to hear from students who are passionate about applying machine learning methods for real-world impact in healthcare, with experience in predictive modelling and hands-on programming experience in Python. Candidates must have the right to work in the UK. Application Process: Applications are open from 5 February 2025. As part of the application process, please send the following via email to Sam Genway (Scientific Director, Machine Learning and AI) at Sam.Genway@lifearc.org:
Application Closing: 28 February 2025. If your application is successful, you will be invited to a final stage virtual interview. Full instructions and guidance on how to approach and prepare for the assessment will be provided. We are also proud to be using Rare Recruitment's Contextual Recruitment System (CRS), which allows us to consider your achievements in the context in which they were gained. We understand that not all candidates’ achievements look the same on paper, and we want to recruit the best people from every background. We would therefore encourage you to submit your contextualised data using the Rare Contextual Recruitment System as part of your application using this link: https://lifearc.app.contextualrecruitment.com/apply/cf4cc979-16f6-435b-922d-ca52259fb839 |
Work Environment | The student will work at our Stevenage site on their own project, but with regular supervision and with other members of the data sciences group available to talk to about the project. |
References | |
Prerequisite Skills | Statistics, Predictive Modelling |
Other Skills Used in the Project | Image processing |
Acceptable Programming Languages | Python |