List of all projects with keywords (click link for full listing)
- Spectral estimation for irregularly-sampled complex-valued time series
Keywords: Spectral estimation, irregular sampling, complex-valued, time series
- Capturing information from operating theatres
Keywords: medical, data, encryption, QR-codes, implants
- Optimizing deep neural networks for speech processing application: a parametric approach
Keywords: Deep learning, optimization, speech processing, parametric, sparsity
- Advanced image analytics for drug discovery
Keywords: Image processing, Computer vision, Machine Learning, Bioimaging, Pharmaceutical industry
- Meta-Analysis of Transcriptomics data at GSK
Keywords: transcriptomics, genomics, RNA-seq, meta-analysis, statistics
- Inhalation Dosimetry Modelling
Keywords: CFD, Statistics, Modelling, Toxicology, Risk
- Verification of stress simulation model/software
Keywords: Stress simulation, Mathematical modelling, Model verification, Numerical analysis
- Projects in Quantitative/Systematic investing
Keywords: Finance, Data analysis, Python, Scientific approach
- Low-rank matrix approximations within Kernel Methods
Keywords: machine learning, linear algebra, mathematical statistics
- Prize pool and odds forecast
Keywords: combinatorics, probability, markov chain, monte carlo
- Card Gaming AI
Keywords: combinatorics, probability, neural network, simulations
- Using mathematical techniques to assist in the continuous improvement process of a cut flower manufacturing operation
Keywords: Fresh Produce, Cut flowers, Technical, Quality, Data analysis, Statistics
- Quantum computing internship
Keywords: quantum, algorithms, software
- Pattern finding in industrial data
Keywords: Pattern recognition, Shape description, Data Grouping, Algorithm, Comparison
- Analytical solutions for use of varistors in superconducting magnet quench protection
Keywords: Varistor, Superconducting magnet, Protection circuit, Quench, Analytical solution
- Modelling and Numerical Simulation of Stress Dependent Oxidation of Silicon
Keywords: Oxidation, TCAD, Mathematical modelling, Numerical Algorithms, C++ coding
- Deep representation learning for health records: identifying patients with similar interactions with health services
Keywords: Multimorbidity, deep learning, neural networks, artificial intelligence, healthcare data, health data science
- Analytical Solution for Multi-Barrier Release, Mechanically Link Diffusion to In-vitro Release (project withdrawn)
Keywords: Fick's law of diffusion, Differential equations, Analytical and numerical solutions
- Multi-scale modeling to enable data-driven biomarker and target discovery
Keywords: Drug Discovery, Data Science, Machine Learning, Bioinformatics, Precision Medicine
- TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
Keywords: Graph modeling, Data integration, Data Science, Clinical Trials, Machine Learning
- Network reconstruction from single cell transcriptomic data
Keywords: Network inference, single cell transcriptomics, computational biology, statistics, R
- Algorithm development and modelling for security applications
Keywords: Security, machine learning, algorithms
- Developing an approach for biotherapeutic purity quantitation from analytical instrument signals
Keywords: Modelling, Visualization, Signals, Scripting, Pharmaceuticals
- Is Quantum Machine Learning mature for clinical applications?
Keywords: Quantum Computing, Quantum Machine Learning, Pharma, Clinical, AI
- Aggregating embeddings in deep unsupervised graph learning
Keywords: graphs, unsupervised learning, deep learning, AI, pathology
- Predicting the pick-up weight of chocolate from real-time factory data
Keywords: Ice Cream, Chocolate, Modelling, Machine-Learning, Python
- Early Stage Investing: Model Development for The Identification of Investable Technologies and Industries
Keywords: Statistics, Predictive Modelling, Database Queries, Machine Learning
- Modelling optionality in inflation linked securities
Keywords: inflation, derivatives, options, bonds
- Modelling inflation expectations in financial markets (project withdrawn)
Keywords: inflation, modelling, machine learning
- State of the art in Covariance matrix estimation and filtering for Risk assessment
Keywords: covariance estimation, risk, algorithms
- Fuzzy matching algorithm for live trade populations
Keywords: algorithms, fuzzy matching, trade reconciliation
- Solvers for Integer Quadratic Program ("IQP") problems related to allocating trades
Keywords: integer quadratic programming, algorithms, trade allocation
- Neural Network Model Calibration
Keywords: financial engineering, machine learning
- Segmenting duodenal biopsy images
Keywords: Deep learning, neural networks, image analysis, digital pathology, coeliac disease
Spectral estimation for irregularly-sampled complex-valued time series
Project Title | Spectral estimation for irregularly-sampled complex-valued time series |
Contact Name | Keith Briggs |
Contact Email | keith.briggs@bt.com |
Company/Lab/Department | BT Labs Wireless Research |
Address | Adastral Park, Martlesham Heath, Ipswich IP5 3RE |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | In the modelling of 5G radio channels, we take measurements of complex-valued time series (the channel matrix elements), but the measurement process unavoidably takes these samples at irregular (but known) times. We wish to explore methods for estimating the power spectral density (PSD) of such data and to understand better their sampling properties. The overall application area is better estimation of channel matrices, in order to improve the performance of 5G radio systems. The channels are described by matrices because the systems are MIMO (multiple-input, multiple-output), effectively using a vector channel. Also of interest is estimation of the autocorrelation of these matrices, and PSD estimation has been viewed as a step towards this. The whole topic fits well under the harmonic analysis heading in the CMI mission statement. |
Brief Description of the Project |
To estimate the power spectral density of irregularly-sampled complex data, we are currently using a kind of generalized Lomb-Scargle periodogram (LSP). However, the theory and sampling properties of this estimator are not well understood. Appropriate theoretical background is available in Percival & Walden, Spectral Analysis for Univariate Time Series (CUP, 2020), p. 528ff. This project could tackle one or more of these items: 1. The LSP can be viewed as a generalized discrete Fourier transform (DFT), in other words a matrix-vector product in which each row-column dot product is the projection of the data onto a basis vector of the model. In the LSP the matrix elements do not have as many nice properties as those of the DFT matrix. We can speak of the Lomb-Scargle Transform (LST), of which the LSP is simply the modulus squared. |
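As a rough illustration of the matrix-vector (generalized-DFT) view described above, the sketch below projects irregularly-sampled complex data onto complex exponentials and takes the modulus squared. It is a plain non-uniform DFT periodogram on an assumed toy signal, not the specific generalized LSP normalization used at BT.

```python
import numpy as np

def nudft_periodogram(t, x, freqs):
    """Project irregularly-sampled complex data x(t_n) onto the basis
    exp(-2j*pi*f*t_n) (a non-uniform DFT) and return |transform|^2
    as a crude PSD estimate."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=complex)
    E = np.exp(-2j * np.pi * np.outer(freqs, t))  # design matrix, shape (F, N)
    lst = E @ x / len(t)                          # 'Lomb-Scargle-transform-like' vector
    return np.abs(lst) ** 2

# toy data: a single complex tone at 0.3 Hz sampled at irregular times
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))
x = np.exp(2j * np.pi * 0.3 * t) + 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
freqs = np.linspace(0.01, 1.0, 500)
psd = nudft_periodogram(t, x, freqs)
print(freqs[np.argmax(psd)])  # expected to be close to 0.3
```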
Keywords | Spectral estimation, irregular sampling, complex-valued, time series |
References | Percival & Walden, Spectral Analysis for Univariate Time Series (CUP, 2020), p. 528ff. |
Prerequisite Skills | Statistics, Probability/Markov Chains, Simulation |
Other Skills Used in the Project | Statistics, Probability/Markov Chains, Simulation |
Programming Languages | Python, C++ |
Work Environment | Mostly working with me, with a wider team available if needed. Flexible hours, on-site preferred but remote possible. |
Capturing information from operating theatres
Project Title | Capturing information from operating theatres |
Contact Name | Rosemarie Gant / Kim Whittlestone |
Contact Email | admin@healthdatainsight.org.uk |
Company/Lab/Department | Health Data Insight CIC |
Address | CPC4 Capital Park, Fulbourn, Cambridge CB21 5XE |
Period of the Project | 8-12 weeks starting 28th June 2021 |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 19th February 2021 |
Background Information | A recent review by Baroness Cumberlege (https://www.immdsreview.org.uk/Report.html) into the complications that can occur following the implantation of medical devices has recommended that data on all implanted medical devices should be collected at the time of the operation. Device manufacturers are required to put unique bar or QR codes on every device, and it will soon become a requirement for hospitals to record this information for all devices implanted into patients so that patients can be followed up months or years later. The National Theatre Dataset is currently being developed to hold this information, so that we know about all procedures carried out in UK hospital theatres. |
Brief Description of the Project |
This year's project aims to develop a simple hardware and software solution to capture information from operating theatres about implanted medical devices, operations and procedures. The intention is to create a low-cost solution that could be used by the NHS to help gather data into the new National Theatre Dataset. The project will be run by Health Data Insight CIC and we will be able to draw on close links with expert colleagues working in the NHS and NHS Digital. We are offering up to six intern places on this group project in 2021. Interns will work together as a multi-disciplinary team, bringing a diverse range of skills and experience to develop the project, from specification to final completion. This project has a number of components. |
Keywords | medical, data, encryption, QR-codes, implants |
References | If you would like to see what our team got up to last year: https://healthdatainsight.org.uk/running-an-intern-scheme-in-a-global-pa... and https://healthdatainsight.org.uk/project/syndasera/ |
Prerequisite Skills | Enthusiasm and eagerness to work as part of a small team of like-minded individuals are the main attributes we are after. |
Other Skills Used in the Project | |
Programming Languages | No Preference |
Work Environment | As well as being a team project, this internship is a chance to join a thriving and enthusiastic community of bright individuals (see https://healthdatainsight.org.uk/category/team/). The team will be supported by an Intern Team Lead, with specialist input from developers, project managers, analysts, science communicators and many other professionals. This internship is about developing specialist skills and also a chance to enhance your communication, collaboration, organisational and team-working skills. The normal working week is 37.5 hours; we offer a salary of £1,500 per month, flexible working and 2.5 days leave per month. Interns will meet regularly to discuss their progress on the project and the Intern Team Lead will always be available either in person or online for queries and support. If permitted by COVID rules, the interns will work in the HDI offices in Capital Park, Fulbourn, Cambridge although travel to other sites may be necessary as part of the internship. If remote working is necessary, we have the setup required to do this. |
Optimizing deep neural networks for speech processing application: a parametric approach
Project Title | Optimizing deep neural networks for speech processing application: a parametric approach |
Contact Name | Cong-Thanh Do |
Contact Email | cong-thanh.do@crl.toshiba.co.uk |
Company/Lab/Department | Toshiba Europe Limited |
Address | Toshiba Europe Limited, 208 Cambridge Science Park, Milton Road, Cambridge CB4 0GZ |
Period of the Project | 8-12 weeks between late June and September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
Nowadays deep neural networks (DNNs) are widely used in speech processing applications, from hearing aids to personal assistants on mobile phones. Conventional wisdom holds that the deeper and wider the neural network models are, the higher the performance the system can achieve, and this is generally true. However, a DNN's complex architecture can be optimized to improve performance as well as to reduce the number of parameters. Indeed, larger models cannot be implemented in hardware with limited memory and computational power. Therefore, optimizing the structure of DNNs is an active research direction. Various methods have been proposed to optimize DNN architectures and reduce model size, such as pruning redundant parameters or exploiting the sparsity of the rectified linear unit (ReLU) activation function to reduce the computational load of convolution [1]. The ReLU activation function enables a network to easily obtain sparse representations [2]. |
Brief Description of the Project |
Achieving sparsity in neural networks is one of the conditions under which models can be reduced in size while maintaining the same level of performance, by eliminating the zero weights after training the DNNs [1]. Sparsity in DNNs can be achieved by various approaches, for instance by using the sparse evolutionary training (SET) algorithm [3]. In this project, we will investigate how to achieve sparsity by studying activation functions for DNNs. More specifically, we study the use of splines in the activation functions of deep neural networks. Spline-related parametric models have been studied as a way to optimize the shape of neural activation units [4, 5, 6]. The use of parametric splines makes it possible to establish a direct connection between DNN training, the activation functions, and the resulting sparsity. In [6], the author showed that the optimal network configuration can be achieved with activation functions that are nonuniform linear splines with adaptive knots. The significance of that study is that the action of each neuron is encoded by a spline whose parameters are optimized during the training procedure. The proposal resulted in a computational structure that is compatible with deep-ReLU, parametric ReLU [7], and MaxOut structures. In our work, we will focus on using sparsity as one of the constraints for training DNNs with parametric activation functions, in particular using the spline model proposed in [6]. Achieving sparsity could result in improved performance and sparse weights, which is useful for model size reduction. |
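As a loose illustration of the kind of parametric, sparsity-friendly activation discussed above, here is a minimal sketch of a piecewise-linear activation with learnable coefficients and an L1 penalty hook. It is not the spline parameterization of [6]; the knot placement, initialization and penalty are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class LinearSplineActivation(nn.Module):
    """Sketch: a scalar piecewise-linear activation shared across units,
        f(x) = sum_k a_k * relu(x - b_k),
    with learnable coefficients a_k and fixed knots b_k. An L1 penalty on
    the a_k can be added to the task loss to encourage sparse coefficients."""
    def __init__(self, num_knots=5, lo=-2.0, hi=2.0):
        super().__init__()
        self.register_buffer("knots", torch.linspace(lo, hi, num_knots))
        self.coeffs = nn.Parameter(torch.zeros(num_knots))
        with torch.no_grad():
            self.coeffs[num_knots // 2] = 1.0   # start close to a plain ReLU

    def forward(self, x):
        # broadcast (..., 1) against (num_knots,) and sum the shifted ReLUs
        z = torch.relu(x.unsqueeze(-1) - self.knots)
        return (z * self.coeffs).sum(dim=-1)

    def l1_penalty(self):
        return self.coeffs.abs().sum()

# usage sketch: add lambda * act.l1_penalty() to the training loss
act = LinearSplineActivation()
y = act(torch.randn(8, 16))
print(y.shape)   # torch.Size([8, 16])
```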
Keywords | Deep learning, optimization, speech processing, parametric, sparsity |
References | [1] A. Yaguchi, T. Suzuki, W. Asano, S. Nitta, Y. Sakata, A. Tanizawa, "Adam induces implicit weights sparsity in rectifier neural networks", in Proc. 17th IEEE International Conference on Machine Learning and Applications, pp. 318-325, Dec. 2018. [2] X. Glorot, A. Bordes, Y. Bengio, "Deep sparse rectifier neural networks", in Proc. 14th Conference on Artificial Intelligence and Statistics (AISTATS), pp. 315-323, Apr. 2011. [3] D. C. Mocanu, E. Mocanu, P. Stone, P. H. Nguyen, M. Gibescu, A. Liotta, "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science", Nature Communications, 2018. [4] F. Agostinelli, M. Hoffman, P. Sadowski, P. Baldi, "Learning activation functions to improve deep neural networks", in Proc. International Conference on Learning Representations (ICLR), 2015. [5] L. Vecci, F. Piazza, A. Uncini, "Learning and approximation capabilities of adaptive spline activation function neural networks", Neural Networks, vol. 11, no. 2, pp. 259-270, 1998. [6] M. Unser, "A representer theorem for deep neural networks", Journal of Machine Learning Research, vol. 20, pp. 1-30, 2019. [7] K. He, X. Zhang, S. Ren, J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification", in Proc. International Conference on Computer Vision (ICCV), pp. 1026-1034, Dec. 2015. |
Prerequisite Skills | Statistics, Probability/Markov Chains, Simulation |
Other Skills Used in the Project | Numerical Analysis, Image processing |
Programming Languages | Python, MATLAB, C++, No Preference |
Work Environment |
The student will work in a team. Besides the main supervisor (Dr. Cong-Thanh Do), there will be a co-supervisor (Dr. Rama Doddipatla) to whom the student can talk about the project. The working hours of the lab are 9am-5pm. Given the current situation regarding the coronavirus, working remotely is acceptable. Access to the office could be considered if necessary and according to the situation. |
Advanced image analytics for drug discovery
Project Title | Advanced image analytics for drug discovery |
Contact Name | Sara Schmidt |
Contact Email | sara.x.schmidt@gsk.com |
Company/Lab/Department | GSK |
Address | Gunnels Wood Road, Stevenage SG1 2NY |
Period of the Project | 8-10 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
GSK is a FTSE100, science-led, global healthcare business currently ranked as the leading pharmaceutical company in the UK. Our research is focused on immunology efforts including small molecules, biologics and vaccines. Never has it been more important for us to reduce the time it takes from identification of a potential therapeutic to a marketed medicine; that is the main focus of this project. At GSK we have created a world-leading data and computational environment to enable large scale scientific experiments that exploit GSK's unique access to data. Our focus is on bringing data, analytics, and science together into solutions for our scientists to develop medicines for patients. A key enabler of this effort is the ability to extract knowledge from imaging data. A specific challenge in the early phase of drug discovery programmes is to separate potential drug molecules into those with the desired effect on the drug target and those with an undesirable effect, e.g. toxicity. One way to achieve this goal is by developing advanced image analytics algorithms, where image sets of cells in the presence of molecules with known undesirable mechanisms are used to define image signatures, the so-called "ground truth". Thereafter the algorithm is applied to unknown compounds to allow us to focus on compounds that are free from potential liabilities, thereby reducing drug failure rates and speeding up the often lengthy and costly drug discovery process. |
Brief Description of the Project | We are looking for a student with a keen interest in image processing and computer vision who can use our in-house generated image stacks of cells from early drug discovery programmes and the associated training sets to develop image analytics algorithms that enable compound mechanism classification. The project will involve both improving existing, and developing new, image analytics algorithms in open-source packages (e.g. Python, CellProfiler and ilastik). Once a suitable algorithm has been chosen and validated, the student should develop a robust pipeline that can be used by scientists to analyse their own data at scale, in a way that minimises data integrity risks. |
Keywords | Image processing, Computer vision, Machine Learning, Bioimaging, Pharmaceutical industry |
References | Bray MA et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc. 2016 Sep;11(9):1757-74. doi: 10.1038/nprot.2016.105. Epub 2016 Aug 25. PMID: 27560178; PMCID: PMC5223290. |
Prerequisite Skills | Image processing |
Other Skills Used in the Project | Statistics, Data Visualization |
Programming Languages | Python, R |
Work Environment | Fully embedded into a scientific department and part of a wider team interacting with data scientists and imaging experts. |
Meta-Analysis of Transcriptomics data at GSK
Project Title | Meta-Analysis of Transcriptomics data at GSK |
Contact Name | Giovanni Dall'Olio |
Contact Email | giovanni.m.dallolio@gsk.com |
Company/Lab/Department | GSK |
Address | Gunnels Wood Road, Stevenage SG1 2NY |
Period of the Project | 8 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
In recent years GSK has invested in the creation of a Data Lake, an infrastructure where all the data generated in the company is stored and made available. This has many advantages, as the data is not scattered across data silos, and it is generated using a standardized and controlled process. One component of this Data Lake infrastructure is the pipeline for the sequencing of genomics and transcriptomics data (RNA-Seq and other technologies). We have built a process to generate and curate this data using standard tools and parameters, generating a high-quality dataset from experiments executed by different departments in the company. The scope of the research project will be to develop methods for meta-analysis of the genomics and transcriptomics data in this dataset, comparing experiments generated by different units and collaborators. |
Brief Description of the Project |
The student will explore methods for meta-analysis of sequencing data from different experiments. The desired outcome of the project will be a computational notebook or a script documenting recommendations for meta-analysis methods, taking into consideration the existing literature and using example data from our dataset. This project is relatively open-ended and the student will have space to explore different solutions, as well as working with a curated dataset. Knowledge of NGS is not required, although some preliminary understanding may be useful. Preferred programming languages would be R and Python. |
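For orientation only, here is a minimal sketch of one classical meta-analysis idea, combining per-study p-values for a single gene with Fisher's method; the methods actually recommended would come out of the literature review, and the numbers below are made up.

```python
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Combine independent per-study p-values with Fisher's method:
    -2*sum(log p) follows a chi-squared distribution with 2k degrees
    of freedom under the global null."""
    pvals = np.asarray(pvals, dtype=float)
    stat = -2.0 * np.log(pvals).sum()
    return stats.chi2.sf(stat, df=2 * len(pvals))

# toy p-values for one gene across three (hypothetical) experiments
print(fisher_combine([0.04, 0.03, 0.20]))
```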
Keywords | transcriptomics, genomics, RNA-seq, meta-analysis, statistics |
References |
- Leek et al., Nat Rev Genet 2010. Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data
Prerequisite Skills | Statistics |
Other Skills Used in the Project | Database Queries, Data Visualization |
Programming Languages | Python, R |
Work Environment | Work remotely, in a team. |
Inhalation Dosimetry Modelling
Project Title | Inhalation Dosimetry Modelling |
Contact Name | George Fitton |
Contact Email | george.fitton@unilever.com |
Company/Lab/Department | Unilever SEAC Computational Science |
Address | SEAC, Unilever, Colworth Science Park, Sharnbrook, Bedford MK44 1LQ |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
In daily life we use consumer products: household cleaning products, anti-perspirants, hairsprays, etc., that produce unintentional exposure to chemicals. Unilever assures consumer safety by assessing the toxicity risk of every ingredient in every product. Current risk assessments use historical rodent studies [1]. Next Generation Risk Assessments (NGRAs) use New-Approach Methodologies (NAMs) that leverage advances in computing, genetics, and statistics – novel in silico and in vitro approaches that assess risk without testing on animals. Over the past few decades, Unilever’s Safety and Environmental Assessment Centre (SEAC) has worked closely with Industry, Academia, and Regulatory Bodies to develop a wide range of non-animal approaches using mathematical modelling [2], [3], cell culture-based experiments [4], [5], and omics for systemic and local toxicity [6]. The aim of this short project is to help advance current inhalation risk assessment methods. |
Brief Description of the Project |
The overall goal of any Next Generation Risk Assessment is to get from an in vitro Point of Departure (PoD) to a relevant in vivo exposure level – getting from exposures (or concentrations) of chemicals with no toxicological response in cell models to exposures with no toxicological response in consumers and vice versa. Obtaining an in vitro PoD from an inhaled in vivo exposure requires an Inhalation Dosimetry Model. Inhalation Dosimetry Models calculate the fraction of the total number of inhaled particles deposited in the lung – a mass per volume metric. The in vitro PoD is obtained by distributing the number of deposited particles over the surface of the lung [7]. Current industry standards use the free but closed-source Multiple-Path Particle Dosimetry (MPPD) Model [8]–[10] to calculate the deposition fraction. The MPPD Model is user friendly and well-tested. But its dynamics are based on laminar fluid flow in a pipe – the lungs are modelled as a series of branching connected tubes. This means a lot of complex phenomena: wall impacts, convection, turbulent mixing, etc. are absent from the model. The problem statement follows – Investigate and quantify in vitro PoD uncertainty due to in vivo modelling simplifications / approximations. Advances in computational power have meant that advanced fluid dynamics simulations can be performed on most computers. Consequently, the modelling approaches of previous generations need to be revised. A significant amount of effort has been invested in advanced aerosol modelling for pharmaceutical companies, with the results published in the public domain and the software released as open-source [11]. Given the current state of the art (SoA) we see 3 potential directions of investigation; the student is free to tackle the problem as they see fit. 1) Data-driven Approach: review the results from the current SoA in aerosol deposition simulations and build a data model to estimate the uncertainty in deposition fraction estimates. Currently available data may differ in particle size, composition (dust vs droplet), breathing patterns (smoking versus unintentional inhalation), etc. Given the available data, the student is free to use a variety of statistical techniques to achieve the required goal. 2) Computational Approach: AeroSolved is an open-source computational fluid dynamics model based on the open-source OpenFOAM code base [12]. Its features include simulations of mass, momentum (Navier-Stokes equations), and energy conservation equations; a multispecies formulation for gas (vapor), liquid (droplet) phases and solid particles; and advanced aerosol physics models. Benchmarking the MPPD Model against the Eulerian AeroSolved aerosol deposition model will provide a numerical estimate of the deposition fraction uncertainty. 3) Analytical Approach: The equations governing the laminar-flow transport of aerosols in the MPPD Model are known and can be compared to the more complex Navier-Stokes transport model. Using the characteristic scales of the problem, a scale analysis of the (non-linear) term difference and their corresponding transport equations will yield an upper bound on the uncertainty of the deposition fraction. More generally, derive a scaling equation for the deposition fraction for laminar and turbulent fluid flow. The aim is to determine a numerical estimate of the uncertainty in deposition fraction calculations in laminar-flow deposition models, if possible by benchmarking against state-of-the-art deposition models. |
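For a flavour of the scale-analysis route (option 3), the back-of-envelope check below uses assumed, order-of-magnitude airway values (not project data) to test the laminar-pipe-flow assumption behind the MPPD Model.

```python
import math

# Assumed, order-of-magnitude values for the trachea (illustrative only):
rho = 1.2       # air density, kg/m^3
mu = 1.8e-5     # dynamic viscosity of air, Pa*s
D = 0.018       # airway diameter, m
Q = 0.5e-3      # inspiratory flow rate, m^3/s (about 30 L/min)

U = Q / (math.pi * (D / 2.0) ** 2)   # mean velocity in the airway, m/s
Re = rho * U * D / mu                # Reynolds number
print(f"U = {U:.2f} m/s, Re = {Re:.0f}")
# A Reynolds number near or above the ~2300 pipe-flow transition suggests the
# laminar assumption is already marginal in the upper airways.
```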
Keywords | CFD, Statistics, Modelling, Toxicology, Risk |
References | [1] W. Steiling et al., "Principle considerations for the risk assessment of sprayed consumer products," Toxicol. Lett., vol. 227, no. 1, pp. 41-49, 2014. [2] J. Reynolds, S. Malcomber, and A. White, "A Bayesian approach for inferring global points of departure from transcriptomics data," Comput. Toxicol., p. 100138, 2020. [3] J. Reynolds et al., "Probabilistic prediction of human skin sensitiser potency for use in next generation risk assessment," Comput. Toxicol., vol. 9, p. 100138, 2020. [4] M. T. Baltazar et al., "A next generation risk assessment case study for coumarin in cosmetic products," Toxicol. Sci., 2020. [5] S. Hatherell et al., "Identifying and Characterizing Stress Pathways of Concern for Consumer Safety in Next-Generation Risk Assessment," Toxicol. Sci., 2020. [6] T. E. Moxon et al., "Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products," Toxicol. Vitr., vol. 63, p. 104746, 2020. [7] S. Gangwal et al., "Informing selection of nanomaterial concentrations for ToxCast in vitro testing based on occupational exposure potential," Environ. Health Perspect., vol. 119, no. 11, pp. 1539-1546, 2011. [8] S. Anjilvel and B. Asgharian, "A multiple-path model of particle deposition in the rat lung," Fundam. Appl. Toxicol., vol. 28, no. 1, pp. 41-50, 1995. [9] O. T. Price, B. Asgharian, F. J. Miller, F. R. Cassee, and R. de Winter-Sorkina, "Multiple Path Particle Dosimetry model (MPPD v1. 0): A model for human and rat airway particle dosimetry," RIVM Rapp. 650010030, 2002. [10] F. J. Miller, B. Asgharian, J. D. Schroeter, and O. Price, "Improvements and additions to the multiple path particle dosimetry model," J. Aerosol Sci., vol. 99, pp. 14-26, 2016. [11] P. Koullapis et al., "Regional aerosol deposition in the human airways: The SimInhale benchmark case and a critical assessment of in silico methods," Eur. J. Pharm. Sci., vol. 113, pp. 77-94, 2018. [12] E. M. A. Frederix, Eulerian modeling of aerosol dynamics. University of Twente, 2016 |
Prerequisite Skills | Statistics, Simulation, Data Visualization |
Other Skills Used in the Project | Mathematical physics, Fluids |
Programming Languages | Python |
Work Environment | Part of the Computational Science team in SEAC |
Verification of stress simulation model/software
Project Title | Verification of stress simulation model/software |
Contact Name | Artem Babayan |
Contact Email | artem.babayan@silvaco.com |
Company/Lab/Department | Silvaco Europe |
Address | Silvaco Europe Ltd, Compass Point, St Ives PE27 3FJ |
Period of the Project | 8 weeks, any time |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | |
Brief Description of the Project |
Silvaco develops Electronic Design Automation (EDA) and Technology CAD (TCAD) software. One of its modules is a tool for simulating stresses inside electronic devices (caused by bending, heating, internal stresses etc. during device manufacture). Your task would be:
Keywords | Stress simulation, Mathematical modelling, Model verification, Numerical analysis |
References | |
Prerequisite Skills | Mathematical physics, Numerical Analysis, PDEs |
Other Skills Used in the Project | |
Programming Languages | Python, MATLAB, C++ |
Work Environment | The student will be placed in the Silvaco building in St Ives. There are ~15 people in the office. The student is expected to work on their own, with advice available from the team. Communication with our office in the US may also be required. |
Projects in Quantitative/Systematic investing
Project Title | Projects in Quantitative/Systematic investing |
Contact Name | Beth Duncan |
Contact Email | bduncan@bluecove.com |
Company/Lab/Department | BlueCove |
Address | 10 New Burlington Street, London W1S 3BE |
Period of the Project | 12 weeks from 14th June to the 10th September 2021, flexible |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
We're looking for talented individuals to join our Research team as Summer Research Analysts to help us achieve our mission. You will contribute to the development of innovative research. We don't expect you to have detailed knowledge of the asset management space, though experience in Finance or Economics is useful. What we are looking for is an intellectual curiosity that will drive you to learn as much as you can whilst you're here, together with a background in quantitative analysis and strong programming skills. To give you a little background, BlueCove is a scientific asset management firm founded in 2018. Here, we believe that scientific active investing, as an alternative and complement to both passive and traditional active investing, is set to be the next defining development for the fixed income industry. As one of our Research Analysts, you will join us for 12 weeks from 14th June to the 10th September 2021 (but we can be flexible on dates and the length of the programme). As well as learning about scientific fixed-income products and the asset management industry overall, you will spend your time working on an important research project. |
Brief Description of the Project |
The BlueCove Research team have a number of interesting projects to undertake. The research projects our Summer Research Analysts undertake are likely to be in the following areas: |
Keywords | Finance, Data analysis, Python, Scientific approach |
References | To find out more about our firm, please take a look at our website - https://www.bluecove.com/ Other references will be supplied to selected candidates |
Prerequisite Skills | Statistics, Data Visualization, Data analysis/data science; strong coding skills; disciplined/scientific approach |
Other Skills Used in the Project | Quantitative finance |
Programming Languages | Python, MATLAB, R |
Work Environment | You will work with our team of 8 Researchers, as well as your fellow Summer Research Analysts and the broader Investment Team. But our firm is new and collaborative, so you'll gain exposure to all our departments and work with many people.
We are flexible on whether you work remotely, in the office, or a hybrid of both. We compensate our interns at a market-level salary for their valuable work and you will be eligible for benefits including private medical health insurance & a virtual GP. As one of our Summer Research Analysts, you will also benefit from: |
Low-rank matrix approximations within Kernel Methods
Project Title | Low-rank matrix approximations within Kernel Methods |
Contact Name | Zdravko Zhelev |
Contact Email | application@dreams-ai.com |
Company/Lab/Department | DreamsAI |
Address | 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG |
Period of the Project | 8 weeks, flexible |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | In machine learning, we often employ kernel methods to learn more general relations in datasets without computing explicit feature projections, thereby avoiding high computational cost. |
Brief Description of the Project | The kernel trick often involves matrix inversion or eigenvalue decomposition, and the cost becomes cubic in the number of training data points. Due to large storage and computational costs, this is impractical in large-scale learning problems. One approach to dealing with this problem is low-rank matrix approximation; the most popular examples are the Nyström method and random features. We would like the student to test the feasibility of these approximations on real data. |
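To make the idea concrete, here is a minimal sketch of the Nyström approximation for an RBF kernel; the kernel choice, bandwidth, landmark count and uniform sampling are illustrative assumptions rather than the project's prescription.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m=50, gamma=0.5, seed=None):
    """Nyström method: approximate the full n x n kernel matrix K by
    C @ pinv(W) @ C.T, using m uniformly-sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # landmark indices
    C = rbf_kernel(X, X[idx], gamma)                  # n x m block
    W = C[idx, :]                                     # m x m landmark block
    return C, np.linalg.pinv(W)

X = np.random.default_rng(0).standard_normal((500, 10))
C, W_pinv = nystrom(X, m=50)
K_approx = C @ W_pinv @ C.T
K_exact = rbf_kernel(X, X)
print(np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))  # relative error
```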
Keywords | machine learning, linear algebra, mathematical statistics |
References | https://papers.nips.cc/paper/1866-using-the-nystrom-method-to-speed-up-k... https://people.eecs.berkeley.edu/~brecht/paper/07.rah.rec.nips.pdf https://stanford.edu/~jduchi/projects/SinhaDu16.pdf |
Prerequisite Skills | Statistics, Numerical Analysis, Mathematical Analysis, Geometry/Topology, Predictive Modelling |
Other Skills Used in the Project | Data Visualization |
Programming Languages | Python, C++ |
Work Environment | The project supervisor will provide 5 hours of supervision out of the 30 hours of working time at the office in Cambridge. A strong student will also be offered a free trip to Hong Kong to take on more maths projects. |
Prize pool and odds forecast
Project Title | Prize pool and odds forecast |
Contact Name | Zdravko Zhelev |
Contact Email | application@dreams-ai.com |
Company/Lab/Department | DreamsAI |
Address | 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG |
Period of the Project | 8 weeks, flexible |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | |
Brief Description of the Project | In a prize-pool-based betting game, the final odds returned on a bet are simply a function of the total amount bet by everybody divided by the total amount bet on the correct outcome. Therefore, every time someone places a bet, the odds for every bet type change for everybody. Only after the deadline for placing bets can the odds be finalized. In theory, if you know the sizes of all the prize pools you can determine all the odds exactly, and vice versa. The challenge here is to consider the cases where we only know a subset of the odds/pool sizes: how much uncertainty is introduced, and can we leverage the relationships between bet types to improve our predictions? |
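A toy illustration of the pari-mutuel relationship described above, with made-up stakes (real pools usually also deduct a commission):

```python
# Total stake on each outcome (illustrative numbers only)
pools = {"A": 6000.0, "B": 3000.0, "C": 1000.0}
total = sum(pools.values())

# Payout per unit stake on each outcome: total pool / stake on that outcome.
# Every new bet changes the total and hence shifts all of the odds.
odds = {k: total / stake for k, stake in pools.items()}
print(odds)   # roughly {'A': 1.67, 'B': 3.33, 'C': 10.0}
```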
Keywords | combinatorics, probability, markov chain, monte carlo |
References |
https://papers.nips.cc/paper/1866-using-the-nystrom-method-to-speed-up-k... |
Prerequisite Skills | Statistics, Probability/Markov Chains, Numerical Analysis, Simulation, Predictive Modelling |
Other Skills Used in the Project | Database Queries |
Programming Languages | Python, C++ |
Work Environment | There will be about 30 hours of work expected at our Cambridge office, 5 of which will be supervised. A strong candidate will be offered free trips to Hong Kong to potentially pick up another project to do during an internship or part-time. |
Card Gaming AI
Project Title | Card Gaming AI |
Contact Name | Zdravko Zhelev |
Contact Email | application@dreams-ai.com |
Company/Lab/Department | DreamsAI |
Address | 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG |
Period of the Project | 8 weeks, flexible |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Building an AI that can compete with humans at a popular Chinese card game. |
Brief Description of the Project |
A popular Chinese card game requires 4 players, each holding 13 of the 52 cards. The goal of the game is to arrange the 13 cards into 3 sets of 3, 5 and 5. Each set is then compared with the corresponding sets belonging to the other players, and the best set in each group wins. In this project, we want the student to investigate one or more of the following questions: |
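For a sense of scale, the short sketch below counts the candidate arrangements per dealt hand, assuming the 3/5/5 sets occupy distinct positions (as in typical Chinese poker variants):

```python
from math import comb

# Choose 3 of 13 cards for the first set, then 5 of the remaining 10 for the
# second; the last 5 form the third set.
arrangements = comb(13, 3) * comb(10, 5)
print(arrangements)   # 286 * 252 = 72072 candidate arrangements per hand
```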
Keywords | combinatorics, probability, neural network, simulations |
References | |
Prerequisite Skills | Probability/Markov Chains, Simulation |
Other Skills Used in the Project | Statistics, Data Visualization |
Programming Languages | Python, C++, Rust |
Work Environment | The project supervisor will provide 5 hours of supervision out of the 30 hours of working time at the office in Cambridge. A strong student will also be offered a free trip to Hong Kong to take on more maths projects. |
Using mathematical techniques to assist in the continuous improvement process of a cut flower manufacturing operation
Project Title | Using mathematical techniques to assist in the continuous improvement process of a cut flower manufacturing operation |
Contact Name | David Booth |
Contact Email | david.booth@mm-flowers.com |
Company/Lab/Department | MM Flowers |
Address | Pierson Road, The Enterprise Campus, Alconbury Weald PE28 4YA |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | MM Flowers, established 14 years ago, is the UK's leading integrated cut flower supplier, with a unique ownership model and innovative practices. MM Flowers is owned by the AM FRESH Group, a leading international breeder, grower and distributor of citrus and grapes; VP, East Africa's largest cut rose and vegetable producer, also incorporating an in-house breeding arm; Nature's Rights; together with The Elite Flower Ltd, the world's largest cut flower business, including breeding, propagation and growing operations throughout South America and Kenya, alongside extensive processing and distribution businesses throughout North America. MM Flowers imports, processes and distributes cut flowers to many of the major high street retail brands in the UK, including Marks & Spencer and Tesco, either to retail stores or directly to consumers' homes. The UK cut flower market is extremely challenging: consumers expect high-quality flowers at very competitive prices. The vast majority of species utilised are perishable, short-life products, transported from many different regions around the world every day. Pre- and post-harvest management, logistics and environmental control are all factors positively or negatively impacting ultimate flower quality and therefore consumer satisfaction. MM Flowers receives circa 500 million stems of cut flowers annually, across at least 60 different species and hundreds of varieties. There is a dramatic increase in output during periods such as Christmas, Valentine's Day and Mother's Day, when the business processes millions of stems over extended storage and packing periods. |
Brief Description of the Project |
MM has grown steadily over the last 14 years into a multi-million-pound business. As with any rapidly growing business, it is essential that consistency of quality and service level is maintained as sales increase, and increasing scale requires progressive thinking and novel approaches to succeed. Cut flowers, whilst superficially a non-essential product, are an emotionally driven purchase and therefore the highest standards must be maintained. Historically the fresh produce industry, including the flower industry, has been slow and inefficient in using the vast amount of data that is generated to inform decisions within their businesses. MM recognised this need several years ago, and as its growth has continued, so the need to generate and use data to inform sound decision making has intensified. The COVID-19 pandemic has intensified this need even further. During 2020 the demand for online purchasing skyrocketed as consumers changed their shopping behaviours to avoid contracting the virus. This presented numerous challenges and opportunities for MM to shape and adapt the next stage of growth. Data has been and will be at the heart of successfully adapting to the constant changes that all of us are currently subject to. Insight from data collection enables continuous improvement, basing both strategic and tactical approaches on quantifiable data and robust analysis, which should ultimately allow MM to continue its growth trajectory successfully. One example of how MM ensures product quality is the operational department meeting required standards for bouquet production. A dedicated quality team undertake daily inspections of the flowers from the point of receipt and throughout the manufacturing process to support the operational delivery. MM also utilises its own dedicated R&D business, APEX Horticulture, to complement the operational efforts and help provide solutions to maintain or enhance flower quality. Whilst APEX and the intake quality function have long-term, established data sets, dedicated data collection during bouquet manufacturing is a relatively new venture, requiring insight and improvement from a talented student with fresh ideas and an aptitude for data analysis. As such, there is the possibility to develop processes to allow future data to be incorporated and analysed more efficiently across the Technical function (and indeed the wider business), allowing for quicker and more accurate decision making. Whilst this placement will focus on production, the scope of the project is wide, with opportunities for the successful student to improve processes for data collection and analysis across both quality control and quality assurance. During this project, the prospective student will have access to extensive data, including quality assessments on receipt of the flowers, operational quality assurance, and retailer consumer metrics, for example. These datasets present an opportunity to undertake more detailed analysis of long-term trends and how various factors influence the end consumer quality and performance. Previous CMP placements within the business have highlighted several trends, added real value to the business and changed our working practices; this is therefore an opportunity to make a real difference and see your work applied in industry. The successful student will spend their placement within the technical department, which focuses on quality and customer relations, but will experience all aspects of a fast-paced, fresh produce business.
This placement will also involve liaising with different departments, project management, communication, and working towards the needs of the business. Skills Required: Strong computer skills, Experience with statistics and modelling, Clear communicator, Self-motivated, Demonstrates initiative, Project management, Problem solving, Industry focussed. |
Keywords | Fresh Produce, Cut flowers, Technical, Quality, Data analysis, Statistics |
References | |
Prerequisite Skills | Statistics, Numerical Analysis, Mathematical Analysis, Data Visualization, Knowledge of fresh produce will be considered an advantage |
Other Skills Used in the Project | Statistics, Numerical Analysis, Mathematical Analysis, Data Visualization |
Programming Languages | No Preference |
Work Environment | The student will be working as part of the Technical department, and will be supported by myself and another Postdoc. |
Quantum computing internship
Project Title | Quantum computing internship |
Contact Name | Ophelia Crawford |
Contact Email | ophelia.crawford@riverlane.com |
Company/Lab/Department | Riverlane |
Address | First Floor, St Andrew's House, 59 St Andrew's Street, Cambridge CB2 3BZ |
Period of the Project | 10-12 weeks between late June and late September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 19th February 2021 |
Background Information | Riverlane builds ground-breaking software to unleash the power of quantum computers. Backed by leading venture-capital funds and the University of Cambridge, we develop software that transforms quantum computers from experimental technology into commercial products. |
Brief Description of the Project |
Please visit our website (https://www.riverlane.com/vacancy/quantum-computing-summer-internship-sc...) for details of what you will do, the requirements, and how to apply.
Keywords | quantum, algorithms, software |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | |
Programming Languages | |
Work Environment |
Our full-time summer internships are designed to enable current students in a quantitative field to translate their skills and expertise into an industrial setting. You will join us at our office in Cambridge, UK, for 10 to 12 weeks, where you will have the opportunity to work alongside our team of software developers, mathematicians, quantum information theorists, computational chemists and physicists - all experts in their fields. Every intern will have a dedicated supervisor and will work on a project designed to make the best use of their background and skills whilst developing their knowledge of quantum computing. We will support all interns to try and produce a concrete output by the end of the internship e.g. a paper, product, or software tool. We will consider remote internships depending on the Covid-19 situation. |
Pattern finding in industrial data
Project Title | Pattern finding in industrial data |
Contact Name | Geoff Walker |
Contact Email | geoff.walker@faradaypredictive.com |
Company/Lab/Department | Faraday Predictive Ltd |
Address | St John's Innovation Centre, Cowley Road, Cambridge CB4 0WS |
Period of the Project | 8 weeks between late June and September |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
Faraday Predictive is a Cambridge-based business, founded in 2017, with a world-leading technology for remote monitoring and diagnostics of industrial machinery that not only improves customers' business performance but also contributes to reducing climate change impacts. This technology is based on a whole suite of mathematical techniques, many of which have been developed in close collaboration with the Maths Department, directly involving students undertaking CMP summer projects. The students' contributions have had a significant impact on the success of the business. We continue to develop further improvements in the technology and its capability - and hope that this year's student(s) will see their projects have a similarly significant impact and benefit. |
Brief Description of the Project |
This project is all about identifying patterns in machine behaviour, to allow deviations from normal to be detected. The patterns of interest are in the form of spectral shapes, which our system creates for each machine every time it takes a reading - which can be several times per minute. So we have a great many spectra as a basis from which to work - sometimes hundreds from one machine, which may or may not vary through time, and different spectra for each different machine that is monitored (and there are many machines). Each machine has a "natural" spectral shape when it is in good condition. If a fault starts to develop in the machine, the spectral shape changes in some way, and we use this change through time to trigger warnings, and to diagnose the nature of the fault, allowing us to provide specific advice on recommended corrective action, and how soon it should be executed. But if, the first time we ever take a reading on a machine, there is already a fault present, we want to be able to identify this as a pre-existing fault, and not simply accept it as normal for this machine. At present we do this by comparing the shape for this particular machine against a spectrum for a "typical" machine - which is actually a simple averaged-out spectrum from a wide range of machines of different types. Because the natural spectrum shape of each different type of machine is different, this "typical" spectrum is not a very good basis against which to decide whether the new machine that we are seeing for the first ever time has any faults or not. Instead, we would like to create more specific "normal" spectra, for each type of machine, or group of types of machine, allowing us to select the one most appropriate to the machine in question. So the tasks that we envisage being involved in this project are: A successful outcome of this project will be identification of patterns and sensible comparisons allowing us to provide more precise diagnostics and a more precise indication of "normal" vs "abnormal", particularly for the first time we ever take a reading on a new machine, but also for application during subsequent changes through time. |
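As a rough sketch of the "compare against a type-specific normal spectrum" idea, the snippet below scores a new reading against per-group templates using cosine distance; the metric, templates and machine groups are illustrative assumptions, not Faraday Predictive's method.

```python
import numpy as np

def spectral_distance(s, template):
    """Cosine distance between a measured spectrum and a 'normal' template
    for that machine type (one of many possible shape-comparison metrics)."""
    s, template = np.asarray(s, dtype=float), np.asarray(template, dtype=float)
    cos = s @ template / (np.linalg.norm(s) * np.linalg.norm(template))
    return 1.0 - cos

# Toy templates, one per hypothetical machine group, e.g. built by averaging
# spectra from machines already known to be healthy.
templates = {"pump": np.array([1.0, 0.2, 0.05, 0.01]),
             "fan":  np.array([0.8, 0.5, 0.10, 0.02])}
new_reading = np.array([1.0, 0.25, 0.30, 0.02])    # elevated third band
scores = {k: spectral_distance(new_reading, t) for k, t in templates.items()}
print(scores)  # a large distance to its own group's template flags a possible fault
```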
Keywords | Pattern recognition, Shape description, Data Grouping, Algorithm, Comparison |
References | |
Prerequisite Skills | Statistics, Mathematical physics, Mathematical Analysis, Geometry/Topology, Database Queries, App Building, Coding, eg Python (maybe). SQL queries (maybe - and we can teach) |
Other Skills Used in the Project | Predictive Modelling, Database Queries, App Building |
Programming Languages | Python, MATLAB, R, C++ |
Work Environment | Remote working assumed. Project is basically one person, rather than a team, with a supervisor working as closely as is required. We expect frequent Zoom meetings to review, guide, assist. Data provision from our database - we'll arrange remote access. Normal office working hours assumed, but when working from home this is of course flexible. |
Analytical solutions for use of varistors in superconducting magnet quench protection
Project Title | Analytical solutions for use of varistors in superconducting magnet quench protection |
Contact Name | Dr Andrew Varney, Consultant Magnet Engineer |
Contact Email | andrew.varney@oxinst.com |
Company/Lab/Department | Oxford Instruments NanoScience |
Address | Tubney Woods, Abingdon, Oxon OX13 5QX |
Period of the Project | Up to 8 weeks between June and September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
Typical high field superconducting magnets can have a magnetic stored energy of around 10 MJ or more, which is 5 times the kinetic energy of a 4 tonne HGV travelling at 70mph. In the superconducting circuit, as there is no resistance, the current flows without energy loss. However, an extremely small disturbance, say 10 μJ, can lead to the superconducting magnet windings becoming resistive locally. This leads to a chain reaction, with the whole magnet becoming resistive and the stored energy dissipating as heat in the magnet windings over a few seconds' time scale. This is known as a magnet quench. There are various schemes to protect a superconducting magnet against the effects of the rapid stored energy dissipation in a quench. The main goal is to prevent too much of the energy from being dumped locally, which can produce a hotspot within the magnet windings. Typically, a protection circuit will consist of a potential divider network with resistors and diodes to manage currents and voltages within individual parts of the magnet. It will often also include secondary heaters to spread the quench across other parts of the magnet faster than the passive quench propagation would proceed. Oxford Instruments has recently proposed a novel quench protection scheme involving varistors which is the subject of a patent application (not yet published). A varistor is an electrical device which exhibits a non-linear voltage vs current relationship. Specifically, at low voltage a varistor has a relatively high electrical resistance which decreases with increasing voltage. Modelling of the quench behaviour and some experimental work have shown that the use of such components could be useful in a particular configuration to improve the quench protection for high field magnets. Analytical equations can be derived based on the underlying physics using reasonable approximations. However, even for the simplest case of a homogeneous coil divided into two sections and protected using conventional linear resistors, the result describing the propagation of a quench through the magnet coils is a second-order non-linear ODE for which only approximate solutions can be found with some further assumptions. It is not clear how to find solutions of the ODE representing the generalised case with variable resistance in the protection circuits. Although numerical simulations for this system could be developed, analytic descriptions of how varistors would respond in a quench protection circuit will be invaluable in providing insight into the behaviour of the system over a wide range of parameters. This will also enable additional functionality in existing in-house software without requiring a great deal of computational resource. |
Brief Description of the Project |
The primary goal of the project is to find approximate analytical solutions to the equation representing the magnet quench propagation in the simplest case of the protection circuit subdividing the magnet into two sections, but generalised to allow for the use of varistors. The varistor behaviour may be represented as a simplified equation, but it may be possible to extend the treatment to allow for a more accurate representation. The use of numerical solutions to guide and test the search for approximate analytical solutions would be appropriate, and such solutions would still be useful should analytical solutions prove to be too elusive. This project would advance Oxford Instruments' understanding and modelling of varistors for use in protecting superconducting magnets. It is intended that it would thus support development of their practical use in the manner described in our patent application by helping in the selection of materials parameters required for a real magnet. The implementation at Oxford Instruments is likely to be in two ways: via an analytical tool to make initial estimates and by using the equations in our in-house quench modelling code. If the project were particularly successful, an extension goal would be for the student to start working on these tools. An academic outcome would be at least one published paper, possibly in a mathematical physics journal, but more likely a magnet/physics one. |
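In the spirit of using numerical solutions to guide and test analytical work, here is a minimal sketch that integrates a toy circuit equation with a varistor-like element. This is not Oxford Instruments' quench model; the inductance, the assumed growth of the normal-zone resistance and the varistor law V = C*sign(I)*|I|^beta are illustrative placeholders only.

```python
import numpy as np
from scipy.integrate import solve_ivp

L_ind = 10.0              # magnet inductance, H (assumed)
C_var, beta = 2.0, 0.05   # toy varistor parameters (assumed)

def R_quench(t):
    """Assumed linearly growing normal-zone resistance, ohm."""
    return 0.05 * t

def didt(t, y):
    """Toy circuit balance: L dI/dt + R(t)*I + V_varistor(I) = 0."""
    I = y[0]
    V_var = C_var * np.sign(I) * np.abs(I) ** beta
    return [-(R_quench(t) * I + V_var) / L_ind]

sol = solve_ivp(didt, (0.0, 20.0), [100.0], max_step=0.01)
print(f"current after 20 s: {sol.y[0, -1]:.2f} A")
```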
Keywords | Varistor, Superconducting magnet, Protection circuit, Quench, Analytical solution |
References | https://www.oxinst.com/news/a-new-era-in-high-field-superconducting-magn... Martin Wilson, Superconducting Magnets (OUP, 1983), especially chapter 9 |
Prerequisite Skills | Mathematical physics, Numerical Analysis, Mathematical Analysis |
Other Skills Used in the Project | Simulation, Predictive Modelling |
Programming Languages | No Preference, FORTRAN would be ideal |
Work Environment | The student would be part of the R&D / technology development team, which consists mostly of doctoral-qualified physicists, for the duration of the project. An experienced mathematical physicist working in another group will also be available for consultation. The status of remote working depends on the progress of the current pandemic, but there is likely to be at least an element of this. Ideally, the student would be able to work in the office/factory part of the time in order to meet people and to see the products to which the work relates. Oxford Instruments' normal working hours are 37 hours per week (including an early Friday finish). |
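Illustrative sketch for the listing above: the decay of magnet current when the stored energy is dumped into a single external varistor with an assumed power-law characteristic V = V_ref (I/I_ref)^(1/alpha), ignoring the growth of quench resistance inside the windings. All parameter values are assumptions for intuition only, not Oxford Instruments data.

    import numpy as np

    L_henry = 50.0             # magnet inductance [H] (assumed)
    I0 = 200.0                 # operating current [A] (assumed)
    V_ref, I_ref, alpha = 500.0, 200.0, 30.0   # assumed varistor law V = V_ref*(I/I_ref)**(1/alpha)

    def v_varistor(i):
        """Terminal voltage of the varistor at current i (power-law approximation)."""
        return V_ref * (i / I_ref) ** (1.0 / alpha)

    # Forward-Euler integration of L dI/dt = -V(I): the stored energy 0.5*L*I0**2
    # is dissipated in the varistor rather than in the windings.
    dt, t, i = 1e-2, 0.0, I0
    while i > 0.1 * I0:        # run until 99% of the stored energy has been dumped
        i -= dt * v_varistor(i) / L_henry
        t += dt
    print(f"current down to 10% after ~{t:.1f} s, energy dumped ~{0.5 * L_henry * (I0**2 - i**2) / 1e6:.2f} MJ")

Because the assumed varistor voltage stays high as the current falls, the discharge is faster in the tail than an exponential L/R decay with the same initial voltage; capturing this analytically, with the quench resistance included, is the sort of behaviour the project would aim to describe.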
Modelling and Numerical Simulation of Stress Dependent Oxidation of Silicon
Project Title | Modelling and Numerical Simulation of Stress Dependent Oxidation of Silicon |
Contact Name | Vasily Suvorov |
Contact Email | vasily.suvorov@silvaco.com |
Company/Lab/Department | Silvaco Europe, Technology Computer-Aided Design (TCAD) Department |
Address | Compass Point, St Ives, Cambridgeshire, PE27 5JL |
Period of the Project | 8 weeks between July and September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | The fabrication of integrated circuit microelectronic structures and devices vitally depends on the thermal oxidation process of silicon. The project aims to analyse the mathematical models of this process and construct effective numerical algorithms to explore the effects of various modelling assumptions. The successful outcome of the project will become a part of the company's commercial software. |
Brief Description of the Project |
Thermal oxidation of silicon is a way to produce a thin layer of oxide on the surface of a wafer in the fabrication of microelectronic structures and devices. The technique forces oxygen to diffuse into the silicon wafer at high temperature and react with it to form a layer of silicon dioxide: Si + O2 -> SiO2. The oxide layers are used for the formation of gate dielectrics and device isolation regions. With decreasing device dimensions, precise control of oxide thickness becomes increasingly important. In 1965 Bruce Deal and Andrew Grove proposed an analytical model that satisfactorily describes the growth of an oxide layer on the plane surface of a silicon wafer [1] (a minimal sketch of this relation is given at the end of this listing). Despite the successes of the model, it does not explain the retarded oxidation rate of non-planar, curved silicon surfaces. The real cause of the observed retardation behaviour is believed to be the effect of viscous stress on the oxidation rate [2-3]. In this project, we aim to explore the existing mathematical models of stress-dependent oxidation and propose a numerical scheme to obtain the solution. The approach we will use is a combination of analytical and numerical analyses of a system of non-linear ordinary differential equations. The student is expected to implement the numerical algorithms in C++, although no previous experience of C++ coding is required. Silvaco's own software products may also be used as a tool in this project if required. |
Keywords | Oxidation, TCAD, Mathematical modelling, Numerical Algorithms, C++ coding |
References | [1] B.E. Deal, A.S. Grove (1965), General relationship for the thermal oxidation of silicon, Journal of Applied Physics, Vol. 36, N12, 3770-3778. [2] D.B. Kao, J.P. McVittie, W.D. Nix, K.C. Saraswat (1988), Two-dimensional thermal oxidation of Silicon - I. Experiment, IEEE Transactions on Electron Devices, Vol. ED-34, N 5, 1008-1017. [3] D.B. Kao, J.P. McVittie, W.D. Nix, K.C. Saraswat (1988), Two-dimensional thermal oxidation of Silicon - II. Modeling Stress Effects in Wet Oxides, IEEE Transactions on Electron Devices, Vol. ED-35, N 1, 1008-1017. |
Prerequisite Skills | Mathematical physics, Numerical Analysis, Mathematical Analysis, Simulation |
Other Skills Used in the Project | |
Programming Languages | C++ (none required beforehand; an interest in C++ coding is sufficient) |
Work Environment | The student will work on their own, with support and guidance from the supervisor. |
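For reference, the planar Deal-Grove relation mentioned in this listing, x^2 + A*x = B*(t + tau), has the closed-form solution x(t) = (A/2) * (sqrt(1 + 4B(t + tau)/A^2) - 1). A minimal Python sketch follows; the coefficients are purely illustrative, not calibrated TCAD values.

    import numpy as np

    def deal_grove_thickness(t, A=0.2, B=0.01, tau=0.0):
        """Oxide thickness from x**2 + A*x = B*(t + tau).
        Units are consistent but illustrative (e.g. microns, microns**2/hour, hours)."""
        return 0.5 * A * (np.sqrt(1.0 + 4.0 * B * (t + tau) / A**2) - 1.0)

    for t in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"t = {t:4.1f} h  ->  x = {deal_grove_thickness(t):.4f} (illustrative length units)")

The project's interest is precisely in the regimes where this planar relation breaks down (curved surfaces, stress-dependent coefficients), for which the resulting ODE system must be treated numerically.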
Deep representation learning for health records: identifying patients with similar interactions with health services
Project Title | Deep representation learning for health records: identifying patients with similar interactions with health services |
Contact Name | Steve Kiddle |
Contact Email | steven.kiddle@astrazeneca.com |
Company/Lab/Department | AstraZeneca, Biopharmaceuticals R&D, Data Science and AI |
Address | Academy House, 136 Hills Road, Cambridge CB2 8PA |
Period of the Project | 8 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Because people have been living healthier and longer lives, they are often living with more than one health condition, referred to in the scientific research setting as living with "multimorbidity." However, current NHS guidelines provided to doctors and nurses are organised around patients having only a single condition, ignoring the fact that many, especially the elderly, live with multimorbidity. It is important to better understand how to identify and group patients with multimorbidity in a meaningful way, so that doctors and nurses can provide the best possible personalised care. |
Brief Description of the Project |
The aim of the study is to use "deep learning" (a form of artificial intelligence) to determine whether patients that fall within a particular "multimorbidity" subgroup are in greater need of healthcare services in future (e.g., more frequent doctor visits, prescriptions, hospitalisations, etc.). The MSc student will contribute to the creation of a "proof of concept" for the above study question that will be used to help inform future decision making and planning of next steps on the project. The student would split their time between the above project and working on other "live" projects running within the Data Science and AI team, providing an opportunity to work on the wide variety of tasks that a data scientist typically faces during a normal working day. |
Keywords | Multimorbidity, deep learning, neural networks, artificial intelligence, healthcare data, health data science |
References | Landi, I., Glicksberg, B. S., Lee, H. C., Cherng, S., Landi, G., Danieletto, M., Dudley, J. T., Furlanello, C., & Miotto, R. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 96 (2020). |
Prerequisite Skills | Statistics, Mathematical Analysis, Algebra/Number Theory |
Other Skills Used in the Project | Image processing, Predictive Modelling, Database Queries |
Programming Languages | Python, R |
Work Environment | Virtual or face-to-face, depending on the Covid situation |
Analytical Solution for Multi-Barrier Release, Mechanically Link Diffusion to In-vitro Release
Project Title | Analytical Solution for Multi-Barrier Release, Mechanically Link Diffusion to In-vitro Release. |
Contact Name | Weimin Li |
Contact Email | weimin.li1@astrazeneca.com |
Company/Lab/Department | AstraZeneca |
Address | The Pavilion, Granta Park, Great Abington, Cambridge CB21 6GP |
Period of the Project | 8 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Extended release of drug molecules from their carriers is one of the approaches to improving patient compliance, for example by reducing dose frequency. This project focuses on formulations for which diffusion is believed to be the main mechanism of drug release. As formulations become more complex, with multiple layers being designed in, a high level of mathematical ability is required to put together the differential equations arising from Fick's law of diffusion. |
Brief Description of the Project | Weeks 1-2: Introduction to the background and reading of papers. Practice writing simple and executable scripts. Weeks 3-4: Work on the analytical solutions from Elliot J. Carr and Giuseppe Pontrelli that solve release from multi-layer spheres (a baseline single-sphere sketch is given at the end of this listing). Weeks 5-8: Bring empty spheres into the calculation, and fit with existing data to estimate the diffusion coefficient and the impact of the amount of empty spheres. |
Keywords | Fick's law of diffusion, Differential equations, Analytical and numerical solutions |
References | |
Prerequisite Skills | Mathematical physics, Numerical Analysis, Mathematical Analysis, Algebra/Number Theory |
Other Skills Used in the Project | Simulation, Predictive Modelling |
Programming Languages | Python, MATLAB, C++ |
Work Environment | Mostly work from home |
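As a baseline before the multi-layer treatment referenced in this listing, the fractional release from a single homogeneous sphere (uniform initial loading, perfect-sink surface) has the classical series solution M_t/M_inf = 1 - (6/pi^2) * sum_n (1/n^2) exp(-D n^2 pi^2 t / R^2). A short Python sketch with purely illustrative parameter values:

    import numpy as np

    def fractional_release(t, D, R, n_terms=200):
        """M_t / M_inf for a single homogeneous sphere of radius R and diffusivity D."""
        n = np.arange(1, n_terms + 1)
        series = np.sum(np.exp(-D * (n * np.pi / R) ** 2 * t) / n ** 2)
        return 1.0 - 6.0 / np.pi ** 2 * series

    D, R = 1e-14, 50e-6        # m^2/s and m; illustrative values only
    for hours in (1, 6, 24, 72):
        print(f"{hours:3d} h : M_t/M_inf = {fractional_release(hours * 3600.0, D, R):.3f}")

Fitting D against measured release data for this simple case is a useful warm-up before adding multiple layers and empty spheres.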
Multi-scale modeling to enable data-driven biomarker and target discovery
Project Title | Multi-scale modeling to enable data-driven biomarker and target discovery |
Contact Name | Dr Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca, Data Science and Artificial Intelligence |
Address | Academy House, 136 Hills Road, Cambridge CB2 8PA |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Metagenomic sequencing of clinical samples has improved our understanding of how dysbiosis of microbial flora influences various human diseases. Emerging studies have shown that several microbial signatures are specifically altered in the setting of immunological, cardiovascular, or gastrointestinal disorders, among others. Microbiome signatures, identified in the context of a disease and integrated with other types of molecular profiling data (genome, microbiome, transcriptome, metabolome, etc., collectively called multi-omics data), are gaining relevance in drug discovery. Such a data set offers opportunities to understand the specific functional pathways and metabolic reactions mediated by host-pathogen interactions in various diseases. Multi-omics is an emerging theme in drug discovery. It provides an unprecedented view into the molecular players driving conditions and enables a path to discovering new targets and therapies in the near future. |
Brief Description of the Project |
AstraZeneca is investing in this exciting and vital area of drug development to generate unique multi-omics data sets to accelerate the development of novel therapies. Several projects are currently in progress to integrate microbiome with heterogeneous data sets (imaging, multi-omics, clinical, in-vivo disease models, etc.) using quantitative approaches. Collectively, such a system could lead to new targets and unique signatures correlated with human diseases. The collaborative study of altered microbial taxa/species and corresponding clinical phenotype by compiling a large and diverse data set will be an essential step toward understanding microbes' role in disease comorbidities. To achieve this goal, we collaborate with Microbial Sciences across a portfolio of projects that span multiple disease modalities. The student will develop multi-scale models capable of integrating multi-omics data with clinical and imaging data using modern machine intelligence methods. The incoming candidate will be part of the Special Projects and Research Team. The team is currently working on a portfolio of projects with a common goal of accelerating drug or target discovery using machine intelligence methods. We aim to cross-train the incoming student in drug discovery, precision medicine, multi-scale biology, and data science. We expect the student to leverage high-performance computing and biomedical informatics facilities in AZ to develop data-driven methods to analyze large multi-scale, multi-omics data sets. The student will be part of collaborative efforts across microbial science, artificial intelligence, and drug development. This unique collaborative nature of the project will improve hands-on skills in clinical data, biomedical data analytics, and data science. The incoming student will contribute to the design, development, and deployment of predictive models that help organize, analyze and interpret these data. The student can also gain experience by working closely with the Microbial Sciences clinical development team. |
Keywords | Drug Discovery, Data Science, Machine Learning, Bioinformatics, Precision Medicine |
References | https://pubmed.ncbi.nlm.nih.gov/28892060/ https://pubmed.ncbi.nlm.nih.gov/31126891/ |
Prerequisite Skills | Statistics, Probability/Markov Chains, Image processing, Predictive Modelling, Database Queries |
Other Skills Used in the Project | Probability/Markov Chains, Predictive Modelling, Data Visualization, App Building |
Programming Languages | Python, R, No Preference |
Work Environment | 9-5 at AZ campus or remote (depends on COVID restrictions) |
TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
Project Title | TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials |
Contact Name | Dr Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca, Data Science & Artificial Intelligence, Special Projects & Research |
Address | Academy House, 136 Hills Road, Cambridge CB2 8PA |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | One of the major impediments to successful drug development is the complexity, cost and scale of clinical trials, particularly large Phase III trials. Despite a wealth of historical data, clinical trial sponsors typically have a difficult time fully leveraging historical trial data to drive insight into optimal clinical trial design, reducing trial cost and scale. Many barriers exist to leveraging this data, including drift in clinical terminology and procedures over time, differences in trial structure and differences in data sampled. Recent advances in machine learning in areas such as Natural Language Processing (NLP) and graph modeling of complex data have enabled rapid advances in a number of domains. The TrialGraph project seeks to apply these methodologies to clinical trial data, creating a unified graph model to represent clinical trials across phases and therapeutic areas. Such a data modeling approach would enable novel and powerful analytics that create efficiencies in drug development and benefit our patients. |
Brief Description of the Project |
Multiple graph modeling initiatives are running in parallel, and this project will leverage their infrastructure, their graph modeling of external clinical and biomedical data, and their expertise. In collaboration with this wider community, the TrialGraph project will seek to leverage these resources while developing novel graph representations of historical AZ trials, methodologies to analyze these representations in ways that provide meaningful insight, and experiments with other machine learning methodologies that could yield both novel discoveries and operational efficiencies. |
Keywords | Graph modeling, Data integration, Data Science, Clinical Trials, Machine Learning |
References | |
Prerequisite Skills | Statistics, Probability/Markov Chains, Geometry/Topology, Predictive Modelling |
Other Skills Used in the Project | Database Queries, Data Visualization, App Building |
Programming Languages | Python, R |
Work Environment | AstraZeneca Campus/Remote (depending on COVID situation) |
Network reconstruction from single cell transcriptomic data
Project Title | Network reconstruction from single cell transcriptomic data |
Contact Name | Nil Turan |
Contact Email | nil.c.turan-jurdzinski@gsk.com |
Company/Lab/Department | GSK, Human Genetics Computational Biology |
Address | Gunnels Wood Road, Stevenage, SG1 2NY, United Kingdom |
Period of the Project | 8-10 weeks, flexible |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Currently, molecular interaction networks in the field, and those used to support numerous target identification and validation efforts within GSK, present a generalized network comprising interactions that may not exist within a specific cell type. The ability to reconstruct and analyse cell-specific molecular interaction networks has the potential to improve our cell-specific understanding of molecular processes and directly inform on relevant assays or mechanisms driving a disease. Recent advances in single cell RNA-seq technology allow the transcriptome of individual cells to be assessed [1]. This brings a great opportunity to reconstruct cell-specific molecular interaction networks. Several methods have been implemented to build such networks [2-3], but a systematic evaluation of such methods is yet to be conducted. |
Brief Description of the Project | The student will explore available methods to reconstruct networks from single cell RNA-seq data [2-3]. A background in statistics and mathematics is critical for reviewing these methods. They will then evaluate and test the performances of these different methods. Knowledge of single cell data is not required although some preliminary understanding will be useful. Preferred programming language would be R. |
Keywords | Network inference, single cell transcriptomics, computational biology, statistics, R |
References | [1] Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol (2019) 15: e8746. https://doi.org/10.15252/msb.20188746 [2] Simon Cabello-Aguilar, Mélissa Alame, Fabien Kon-Sun-Tack, Caroline Fau, Matthieu Lacroix, Jacques Colinge, SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics, Nucleic Acids Research, Volume 48, Issue 10, 04 June 2020, Page e55, https://doi.org/10.1093/nar/gkaa183 [3] Efremova, M., Vento-Tormo, M., Teichmann, S.A. et al. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat Protoc 15, 1484–1506 (2020). https://doi.org/10.1038/s41596-020-0292-x |
Prerequisite Skills | Statistics |
Other Skills Used in the Project | |
Programming Languages | Python, R |
Work Environment | The student will work closely with Human Genetics Computational Biology, Functional Genomics Computational Biology and the stats group. The student will have the opportunity to interact and discuss with experts in single cell seq technology and also network approaches. |
Algorithm development and modelling for security applications
Project Title | Algorithm development and modelling for security applications. |
Contact Name | Sam Pollock |
Contact Email | careers@iconal.com |
Company/Lab/Department | Iconal Technology |
Address | St John's Innovation Centre, Cowley Road, Cambridge CB4 0WS |
Period of the Project | At least 8 weeks, June start |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | We will start interviewing as soon as we receive applications, but applications up to the end of March will be considered if we haven't already filled the position. |
Background Information | We are a Cambridge based consultancy carrying out research and development in new and emerging technologies for security, offering independent, impartial, science-based advice. This will be our fourth year offering CMP placements, and we are looking for keen, innovative, self-motivated individuals who are interested in the practical application of maths to solve real-world problems. You will be working in a small friendly (we like to think) team of scientists and engineers, and contributing directly to the output of current projects. |
Brief Description of the Project | Right now we do not know exactly what the student project will entail, as we work in a very rapidly evolving field. This year's projects are likely to be focused on one or more of: developing algorithms and machine learning solutions to analyse complex sensor data; building event-based simulations of security processes (including data collection and analysis from field observations); or helping with tests and trials of technology. Previous students have been exposed to all stages of the data pipeline / data science process. Our work is highly varied and interesting and you will likely get stuck in with all aspects of the job! |
Keywords | Security, machine learning, algorithms |
References | http://www.iconal.com |
Prerequisite Skills | Statistics, Numerical Analysis, Image processing, Simulation, Data Visualization |
Other Skills Used in the Project | Predictive Modelling, App Building |
Programming Languages | Python, R, C++, Python preferred (as it's our main one), but can consider other languages if relevant |
Work Environment | We are a small friendly team of 8 people, all working on a range of interesting diverse projects. The student will be based in our main office (or lab for data gathering) working on one or more projects with us, with a mentor on each project to help with queries, reviewing work and assigning tasks. This is of course subject to change should we still be under lockdown! We had a remote summer student in 2020, who worked virtually with the team. |
Developing an approach for biotherapeutic purity quantitation from analytical instrument signals
Project Title | Developing an approach for biotherapeutic purity quantitation from analytical instrument signals. |
Contact Name | David Hilton |
Contact Email | david.w.hilton@gsk.com |
Company/Lab/Department | GSK, Biopharm Process Research Group |
Address | Gunnels Wood Road, Stevenage, SG1 2NY |
Period of the Project | 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | The Biopharm Process Research group is the first step on the route from newly discovered biotherapeutic drugs to a commercial product which can be administered to a patient. It is the group's responsibility to screen candidate molecules for their developability and process fit, and to identify a suitable commercial cell line for their production. A key requirement of the group during process development and candidate molecule screening is to characterize the chemical, physical or biological attributes of the molecule to assess its purity. This is a critical attribute, as the purity of a biopharmaceutical product will influence both the efficacy and the safety of the drug. |
Brief Description of the Project | The analytical techniques used to characterize the purity of a biopharmaceutical drug often output a signal that is a composite of peaks associated with the product of interest and product-related impurities, along with signal noise, baseline deviations and instrument-associated drift. As part of GSK's standard biopharm drug development activities, thousands of these instrument signals are generated within the department each month, and the automated peak identification methods that are currently employed cannot adequately and consistently quantify drug purity. This often necessitates high levels of time-consuming manual data processing. The aim of this project is to develop an optimal procedure for peak identification and purity determination, using techniques ranging from simple deconvolution to CNN and LSTM machine learning methods, with model performance benchmarked against our large departmental datasets (a toy peak-detection sketch is given at the end of this listing). Should a successful strategy be developed, this could be incorporated into a tool for deployment to our data processing pipelines, thereby enabling more rapid and robust development of GSK's biopharm drug portfolio. |
Keywords | Modelling, Visualization, Signals, Scripting, Pharmaceuticals |
References | |
Prerequisite Skills | Statistics, Predictive Modelling, Data Visualization |
Other Skills Used in the Project | Database Queries |
Programming Languages | Python, R |
Work Environment | The student will be supervised during the project and, though working individually, will be involved in all departmental activities. Support from the Statistical Sciences group and Data Science teams will be available should this be required. Standard office hours will apply and remote working opportunities are available. |
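A toy version of the peak-identification step described in this listing, run on a synthetic trace. The peak positions, baseline and noise level are invented for illustration; a real pipeline would need proper baseline fitting and validated peak integration.

    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 30.0, 3000)                         # pseudo retention time [min]
    gauss = lambda c, w, h: h * np.exp(-0.5 * ((t - c) / w) ** 2)
    clean = gauss(12, 0.30, 100) + gauss(14, 0.25, 8) + gauss(17, 0.40, 5)
    trace = clean + 0.02 * t + rng.normal(0.0, 0.3, t.size)  # product + impurities + baseline + noise

    corrected = trace - 0.02 * t                             # assume the baseline has been fitted
    peaks, _ = find_peaks(corrected, prominence=3.0)
    dt = t[1] - t[0]
    areas = np.array([corrected[max(p - 60, 0):p + 60].sum() * dt for p in peaks])  # crude fixed windows
    print(f"{len(peaks)} peaks found, estimated purity = {areas.max() / areas.sum():.3f}")

Here "purity" is simply the main peak area divided by the total detected peak area; choosing integration windows, handling co-eluting peaks and drift is where the real modelling work lies.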
Is Quantum Machine Learning mature for clinical applications?
Project Title | Is Quantum Machine Learning mature for clinical applications? |
Contact Name | Domingo Salazar |
Contact Email | domingo.salazar@astrazeneca.com |
Company/Lab/Department | AstraZeneca |
Address | City House, 130 Hills Road, Cambridge CB2 1RE |
Period of the Project | 8 weeks between late June and September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Quantum Computing (QC), in general, and Quantum Machine Learning (QML), in particular, have made considerable progress in the last few years. It is now possible to formulate typical clinical data science projects, like uncovering associations of adverse effects with medicines or subgroup identification, as QML problems. But what would be the benefits of doing this at this moment in time? Is QML ready to start being used regularly in Pharma? And if so, for which kinds of projects might it provide advantages over classical computing? |
Brief Description of the Project |
We would like to formulate an open-ended project made up of two parts: a literature review and a practical example. The literature review should provide a feeling for the state of the art in this area; in particular, it should point us towards the most promising current applications of QML to Pharma. The practical example should be chosen based on the results of the literature review and will be scoped according to the available time and QC resources. Data sources may include publicly available clinical datasets, text, images and/or genomic sequences, depending on the selected application. There are a number of QC providers in the marketplace at the moment, but for this purpose, as well as for the literature review, it would be very interesting if we could set up a 3-way collaboration between the Cambridge Math Department, the QC group in Cambridge and AstraZeneca. This relationship could then continue beyond this student project. |
Keywords | Quantum Computing, Quantum Machine Learning, Pharma, Clinical, AI |
References | * Quantum Machine Learning, Peter Wittek, Elsevier Insights (book) * Amazon Braket (https://aws.amazon.com/blogs/aws/amazon-braket-get-started-with-quantum-...) * Introduction to Quantum Computing with Python (https://pythonspot.com/an-introduction-to-building-quantum-computing-app...) |
Prerequisite Skills | Statistics, Simulation, Machine Learning |
Other Skills Used in the Project | Image processing, Predictive Modelling, Data Visualization |
Programming Languages | Python, R, Some QC languages such as Q#, if the corresponding Python packages prove to be too limited for our purposes. |
Work Environment | We like to integrate our students within our team so they experience what it means to do Data Science in a Pharma company. The student will be able to talk to a number of data science specialists in our team, as well as clinicians, biologists, bioinformaticians, image analysts, etc. as appropriate. |
Aggregating embeddings in deep unsupervised graph learning
Project Title | Aggregating embeddings in deep unsupervised graph learning |
Contact Name | Khan Baykaner |
Contact Email | khan.baykaner@astrazeneca.com |
Company/Lab/Department | Astrazeneca, Deep Learning, AI Engineering, R&D IT |
Address | Cambridge Road, Melbourn, Royston SG8 6EH |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | The application of AI to digital pathology for drug development is a burgeoning field which promises to radically replace and enhance the existing analysis workflows that lead to biological insight. One area of interest is the analysis of multiplex immunofluorescence (mIF) imaging for oncology; by using multiplexed tissue staining one can acquire a rich set of data for investigating the tumour microenvironment. However, efficient methods for analysing this rich data are still in their infancy. One method of investigation is to build a graph mapped to the cells within the tissue, and then use unsupervised learning techniques on the graph to capture the structure of the information in embeddings. |
Brief Description of the Project | This project will explore how elements of the unsupervised learning technique (such as the corruption function in Deep Graph Infomax) affect the downstream performance of the trained embeddings, as well as techniques for aggregating embeddings in a spatially-aware manner (a toy aggregation sketch is given at the end of this listing). Depending on the area of focus, success would involve alterations to the mIF graph pipeline that allow embeddings to be combined across multiple samples in a consistent, spatially-aware manner without loss of relevant information. This in turn would be expected to dramatically improve the predictive power of downstream patient survival models. |
Keywords | graphs, unsupervised learning, deep learning, AI, pathology |
References | https://arxiv.org/pdf/1809.10341.pdf |
Prerequisite Skills | python, deep learning |
Other Skills Used in the Project | Data Visualization |
Programming Languages | Python |
Work Environment | Will collaborate with a small team of machine learning engineers. Whether work will be remote depends on the situation regarding the pandemic. |
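A toy illustration of what "aggregating embeddings in a spatially-aware manner" could look like, using synthetic cell coordinates and embeddings. This is not the AZ pipeline; it simply mean-pools per-cell embeddings over a coarse spatial grid to obtain a fixed-length, location-aware representation of one tissue sample.

    import numpy as np

    rng = np.random.default_rng(1)
    coords = rng.uniform(0.0, 1000.0, size=(5000, 2))   # synthetic cell centroids [um]
    emb = rng.normal(size=(5000, 64))                    # synthetic per-cell embeddings (e.g. from DGI)

    def grid_pool(coords, emb, n_bins=4):
        """Mean-pool node embeddings inside each tile of an n_bins x n_bins grid."""
        scaled = coords / coords.max(axis=0)                        # rescale into [0, 1]
        bins = np.minimum((scaled * n_bins).astype(int), n_bins - 1)
        tile = bins[:, 0] * n_bins + bins[:, 1]
        pooled = np.zeros((n_bins * n_bins, emb.shape[1]))
        for k in range(n_bins * n_bins):
            mask = tile == k
            if mask.any():
                pooled[k] = emb[mask].mean(axis=0)
        return pooled

    sample_vector = grid_pool(coords, emb).ravel()       # one spatially-aware vector per sample
    print(sample_vector.shape)                           # (1024,)

Comparing downstream survival-model performance with and without such spatial structure is one way to check whether an aggregation scheme preserves the relevant information.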
Predicting the pick-up weight of chocolate from real-time factory data
Project Title | Predicting the pick-up weight of chocolate from real-time factory data |
Contact Name | Joe Donaldson |
Contact Email | Joe.Donaldson@unilever.com |
Company/Lab/Department | Unilever R&D |
Address | Colworth Science Park, Sharnbrook, Bedford MK44 1LQ |
Period of the Project | Flexible, minimum 8 weeks |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information | Chocolate is an expensive ingredient that Unilever uses extensively within its ice cream portfolio in some of its most well-known brands like Magnum. To maintain the business viability of products, reduce waste, and maintain product quality and uniformity, the chocolate dosage, the so-called pick-up weight, needs to be well controlled. This parameter is a function not only of the properties of the chocolate variant and batch itself, but also of the conditions under which it is processed in the factory. Therefore, an accurate adjustment of these parameters during product assembly is key, and the ability to predict and proactively manage possible deviations would offer significant quality improvements and savings. |
Brief Description of the Project | This project will explore the feasibility of using sensor data to predict chocolate pick-up weight. The aim is to build upon our existing insights and harness the availability of this new data stream to construct a predictive hybrid model linking the data and the science of chocolate behaviour. Our end goal is a real time model suggesting simple adjustments to the operating parameters of the process line so factory operators can ensure the best possible chocolate-coated ice cream products make it into the hands of the consumer at a competitive price. |
Keywords | Ice Cream, Chocolate, Modelling, Machine-Learning, Python |
References | |
Prerequisite Skills | Statistics, Predictive Modelling, Data Visualization |
Other Skills Used in the Project | Statistics, Predictive Modelling, Data Visualization |
Programming Languages | Python, MATLAB, R |
Work Environment | Independent working but with regular support from the wider science and technology team. The student will work remotely and be expected to share progress/results with supervisor(s) in daily/bi-weekly calls. |
Early Stage Investing: Model Development for The Identification of Investable Technologies and Industries
Project Title | Early Stage Investing: Model Development for The Identification of Investable Technologies and Industries |
Contact Name | Oliver Hedaux and Professor Richard Samworth |
Contact Email | oliver@ahren.co.uk and rjs57@hermes.cam.ac.uk |
Company/Lab/Department | Statslab, DPMMS and Ahren Innovation Capital |
Address | Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WB |
Period of the Project | 8-10 weeks, as agreed |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Friday 26th February 2021 |
Background Information |
Ahren Innovation Capital is an investment fund with a remit to invest in and help build transformational companies at the intersection of deep technology and deep science that will have a positive impact on the world. Ahren's broad fields of investment activity include: Brain & Artificial Intelligence; Genetics & Platform Technologies; Space & Robotics; and Planet & Efficient Energy. Whatever the domain, Ahren believes in taking asymmetric, considered risk that will deliver superior rewards -- capturing a generational opportunity to provide smart capital to deep technology. Unlike in public markets, where metrics of company health are established and quantified, the privately owned start-up companies that Ahren invests in have historically required a manual and qualitative approach to assessing company health and the potential to create outsized returns. This is largely a data problem. In public markets, where companies are legally required to publish their financial and operational results, the volume of data available for automated analysis is plentiful, consistent, and constrained to relatively few sources of truth in a structured format. On the other hand, private company data is rarely publicly disclosed, and the small sample of data that is shared is typically unstructured or semi-structured and spread unevenly across many resources and data types (numerical and text). In some of the most exciting cases, where companies operate under the radar in "stealth mode", there is very little information at all. At Ahren, we seek advantage in overcoming the historical constraints to quantitative early stage investing by designing novel, complementary systems to enhance our deep domain expertise in the areas that matter most. |
Brief Description of the Project |
A key driver of Ahren's success is its ability to rapidly identify and assess world-leading commercial technologies, and gaps within industries that are ripe to have their biggest challenges addressed by innovation. Therefore, Ahren is starting with the automation of those tasks. Project goals: for each of the two models (Industry and Technology), Ahren has set out a non-exhaustive list of high-level questions to be assessed using the cross-domain expertise of Cambridge's Statistical Laboratory and Ahren Innovation Capital. This project will require originality and creativity, bringing to bear the potential of mathematics, statistics, and machine learning to collate and derive insight from semi-structured and unstructured data. It is essential that a quality data set is built. The data set should be kept up to date and relevant using application programming interfaces and web-scraping techniques (a minimal scraping sketch is given at the end of this listing). There are many sources of data, and a good project will use a range of sources. This interdisciplinary project would ideally be completed by one Mathematician and one Computer Scientist. |
Keywords | Statistics, ML, Unstructured, Semi-Structured, Investing |
References | CB Insights Mosaic Score: https://www.cbinsights.com/company-mosaic |
Prerequisite Skills | Statistics, Predictive Modelling, Database Queries, Machine Learning |
Other Skills Used in the Project | Data Visualization, App Building |
Programming Languages | No Preference |
Work Environment | Remote placement. Students, in this case two, will have regular (twice weekly) check-ins with qualified members of the Ahren Innovation Capital Investment Team. Meetings with Senior team members will be held biweekly. There will be opportunity to schedule additional meetings as the project demands. Students will work normal office hours, five days per week. |
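A minimal sketch of the kind of data-collection step mentioned in this listing (keeping a data set fresh via web-scraping). The URL, table layout and field names below are hypothetical placeholders, not sources Ahren actually uses.

    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/startup-directory"        # hypothetical placeholder

    def scrape_company_rows(url=URL):
        """Fetch a page and extract (name, sector, headline) rows from an HTML table."""
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        rows = []
        for tr in soup.select("table tr")[1:]:           # skip the header row
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if len(cells) >= 3:
                rows.append({"name": cells[0], "sector": cells[1], "headline": cells[2]})
        return rows

    if __name__ == "__main__":
        for row in scrape_company_rows()[:5]:
            print(row)

In practice the same scraping or API pull would be scheduled to run regularly and appended to a versioned store, so the downstream models always see up-to-date data.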
Modelling optionality in inflation linked securities
Project Title | Modelling optionality in inflation linked securities |
Contact Name | Richard Manthorpe |
Contact Email | cambridge.recruitment@symmetryinvestments.com |
Company/Lab/Department | Symmetry Investments, Quantitative Analytics |
Address | 86 Jermyn Street, Fourth Floor, London SW1Y 6JD |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | |
Background Information | We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. This project focuses on modelling the embedded optionality in certain inflation-linked securities, such as the BTP Italia, an Italian inflation-linked security containing a high-watermark indexation feature. |
Brief Description of the Project | The project would be of interest to a student considering pursuing a career in investment management. It consists of several steps allowing an intern to get exposure to all aspects of the development of an investment strategy. First, the candidate would be introduced to the mathematics that governs bond and option pricing for both nominal and inflation-linked securities, reviewing the relevant literature. Secondly, the candidate will work closely with both the trading and quant teams to develop a model and the necessary analytics to evaluate these securities, and gain an understanding of how the traders assess them. An optional third stage of the project is to extend the analytics to handle other securities, such as UK LPI derivatives. We will be looking for a presentation of results and conclusions towards the end of the project. The project will be pursued in close cooperation with a portfolio management team. During the internship, the student will have an opportunity to learn about practical aspects of investments and risk taking from portfolio managers. |
Keywords | inflation, derivatives, options, bonds |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Statistics, Probability/Markov Chains, PDE's, Mathematical Analysis, Data Visualization |
Programming Languages | No Preference |
Work Environment | The student will work in the analytics team. There will be opportunities to talk about the project across several other teams. |
Modelling inflation expectations in financial markets (project withdrawn)
Project Title | Modelling inflation expectations in financial markets |
Contact Name | Andrey Pogudin |
Contact Email | cambridge.recruitment@symmetryinvestments.com |
Company/Lab/Department | Symmetry Investments, Quantitative Research |
Address | 86 Jermyn Street, Fourth Floor, London SW1Y 6JD |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | |
Background Information | We are looking for an intern to work in the Quantitative Research group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. The project focuses on the analysis and modelling of inflation expectations in the global economy in order to identify investment opportunities in financial markets. Inflation expectations are a key variable driving many financial asset prices. |
Brief Description of the Project | We are looking for an intern to work in the Quantitative Research group at Symmetry Investments, an investment management company. The project focuses on the analysis and modelling of inflation expectations in the global economy in order to identify investment opportunities in financial markets. Inflation expectations are a key variable driving asset prices in both the short and medium term. The project would be of interest to a student considering pursuing a career in investment management. We expect the project to take place over at least 8-10 weeks in summer 2021. The project consists of several steps allowing an intern to get exposure to all aspects of the development of an investment strategy. First, we would start by reviewing recent literature on inflation and inflation expectations modelling, both from academia and from market practitioners. Second, building on this review, we will construct some simple toy models, starting perhaps with linear modelling frameworks (regression-based models) and then proceeding to more sophisticated econometric approaches, including machine learning algorithms. The last step of the project is the application of the algorithms to actual financial datasets. The project will be pursued with the cooperation of a portfolio management team. During the internship, you will have an opportunity to learn about practical aspects of investments and risk taking from portfolio managers. |
Keywords | inflation, modelling, machine learning |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Statistics, Data Visualization, App Building, Econometrics |
Programming Languages | No Preference |
Work Environment | The student will work in the quantitative research team. |
State of the art in Covariance matrix estimation and filtering for Risk assessment
Project Title | State of the art in Covariance matrix estimation and filtering for Risk assessment |
Contact Name | Fabien Micallef |
Contact Email | cambridge.recruitment@symmetryinvestments.com |
Company/Lab/Department | Symmetry Investments, Quantitative Analytics |
Address | 86 Jermyn Street, Fourth Floor, London SW1Y 6JD |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | |
Background Information | We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. This project focuses on reviewing techniques for estimating the covariance of asset returns and for updating that estimate through time. |
Brief Description of the Project |
The project would be of interest to a student considering pursuing a career in investment management. The covariance matrix is a central tool in the estimation of risk, so its estimation and filtering are very important. We would like to review the different techniques and their robustness with regard to dimensionality and sample size. The goal is to separate the noise from the real signal and to filter/interpolate/update the estimate in time. Examples of techniques to review include Random Matrix Theory and different ways of averaging covariance matrices. The naive arithmetic mean of Symmetric Positive Definite (SPD) matrices leads to a swelling effect; the geometric mean on the SPD manifold is one technique to cope with this (a small numerical illustration is given at the end of this listing). Considering the vast array of techniques, the student will have to be critical about the benefits of one technique over another. We will first test the techniques on synthetically generated data, plotted in Python, and then test them on real-world data. |
Keywords | covariance estimation, risk, algorithms |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Statistics, Random Matrix Theory, Differential Geometry, Lie Groups |
Programming Languages | No Preference |
Work Environment | The student will work in the analytics team. |
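A small numerical illustration of the swelling effect mentioned in this listing: the arithmetic mean of two SPD matrices can have a much larger determinant than either input, whereas a log-Euclidean ("geometric-style") mean does not. The log-Euclidean mean is only one of several geometric means (the affine-invariant/Karcher mean is another); the matrices below are illustrative.

    import numpy as np

    def spd_apply(mat, f):
        """Apply a scalar function to a symmetric matrix through its eigen-decomposition."""
        w, v = np.linalg.eigh(mat)
        return (v * f(w)) @ v.T

    def log_euclidean_mean(mats):
        """exp(mean(log(A_i))): one simple geometric-style mean of SPD matrices."""
        mean_log = np.mean([spd_apply(m, np.log) for m in mats], axis=0)
        return spd_apply(mean_log, np.exp)

    A = np.diag([1.0, 0.01])
    B = np.diag([0.01, 1.0])
    print(np.linalg.det(A), np.linalg.det(B))                  # both 0.01
    print(np.linalg.det(0.5 * (A + B)))                        # ~0.255 : swelling
    print(np.linalg.det(log_euclidean_mean([A, B])))           # ~0.01  : no swelling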
Fuzzy matching algorithm for live trade populations
Project Title | Fuzzy matching algorithm for live trade populations |
Contact Name | Pierre Micottis |
Contact Email | cambridge.recruitment@symmetryinvestments.com |
Company/Lab/Department | Symmetry Investments, Quantitative Analytics |
Address | 86 Jermyn Street, Fourth Floor, London SW1Y 6JD |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | |
Background Information | We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. |
Brief Description of the Project | Like many other firms, Symmetry has a process by which the collateral that it holds or posts to its market counterparties is updated. This process has numerous steps, but the one of interest here is when Symmetry wants or needs to reconcile its calculations with those performed by a given counterparty. In order to do so, the parties exchange trade-level information. As time is of the essence, matching the trade populations produced by Symmetry and its counterparties needs to be as automated and robust as possible. It happens very often that trade details between counterparties are close but not exact, so the goal of this project is to explore and implement algorithms which look for what might be imperfect matches on trade details but perfect matches for the trade itself (a toy scoring sketch is given at the end of this listing). This will enable the people involved in this step of the process to focus on exceptions and errors, and free up valuable time. |
Keywords | algorithms, fuzzy matching, trade reconciliation |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Programming, process design, machine learning |
Programming Languages | No Preference |
Work Environment | The student will work in a team. There will be opportunities to talk about the project across several other teams. |
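A toy sketch of fuzzy trade matching in the spirit of this listing, blending a textual similarity on trade descriptions with a tolerance on notionals. The weights, fields and example trades are invented for illustration; a real system would also enforce one-to-one assignment (e.g. via the Hungarian algorithm) and surface unmatched exceptions.

    import difflib

    def similarity(a, b):
        """Score in [0, 1] mixing description similarity and closeness of notionals."""
        text = difflib.SequenceMatcher(None, a["desc"].lower(), b["desc"].lower()).ratio()
        denom = max(abs(a["notional"]), abs(b["notional"]), 1.0)
        size = 1.0 - min(abs(a["notional"] - b["notional"]) / denom, 1.0)
        return 0.6 * text + 0.4 * size        # arbitrary illustrative weights

    ours = [{"id": "T1", "desc": "IRS USD 10Y pay fixed 1.52%", "notional": 25_000_000},
            {"id": "T2", "desc": "CDS XYZ Corp 5Y", "notional": 10_000_000}]
    theirs = [{"id": "A", "desc": "USD 10y IRS pay fix 1.520", "notional": 25_000_000},
              {"id": "B", "desc": "CDS XYZ CORP 5YR", "notional": 9_950_000}]

    for o in ours:
        best = max(theirs, key=lambda c: similarity(o, c))
        print(f"{o['id']} -> {best['id']}  score = {similarity(o, best):.2f}")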
Solvers for Integer Quadratic Program ("IQP") problems related to allocating trades
Project Title | Solvers for Integer Quadratic Program ("IQP") problems related to allocating trades |
Contact Name | Pierre Micottis |
Contact Email | cambridge.recruitment@symmetryinvestments.com |
Company/Lab/Department | Symmetry Investments, Quantitative Analytics |
Address | 86 Jermyn Street, Fourth Floor, London SW1Y 6JD |
Period of the Project | 8-12 weeks |
Project Open to | Master's (Part III) students |
Initial Deadline to register interest | |
Background Information | We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. |
Brief Description of the Project | The main objective consists of determining how the total quantity of a partially or fully executed order should be allocated across a number of accounts or funds. It is typically preferable to do a single trade in the market and then allocate it, subject to a series of constraints. This type of problem has to be solved in such a way that each allocated trade "stands on its own", meaning that it could have been executed as such and satisfies constraints like minimum tradeable size, minimum position size, strategy-level implied ratios as close as possible to target ratios, and so forth. The solutions are typically expressed as a list of integers which minimise some objective function under constraints (a simple allocation heuristic, for intuition only, is given at the end of this listing). Optimisations have to be done both with respect to the quantities allocated and with respect to the Volume Weighted Average Price (or "VWAP"). |
Keywords | integer quadratic programming, algorithms, trade allocation |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | Programming, solvers, algorithms |
Programming Languages | No Preference |
Work Environment | The student will work in a team. There will be opportunities to talk about the project across several other teams. |
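A deliberately simple baseline for the allocation problem in this listing: proportional allocation in whole lots via a largest-remainder rule. This is a greedy heuristic for intuition only, not an integer quadratic programme; the project itself concerns the proper IQP formulation with minimum sizes, strategy ratios and VWAP constraints.

    def allocate(total_qty, target_ratios, lot_size=1):
        """Split an executed quantity across accounts in whole lots,
        as close as possible to the target ratios (largest-remainder rule)."""
        lots = total_qty // lot_size
        raw = [lots * r / sum(target_ratios) for r in target_ratios]
        alloc = [int(x) for x in raw]                 # round everything down first
        leftovers = lots - sum(alloc)
        # hand the remaining lots to the accounts with the largest fractional remainders
        for i in sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)[:leftovers]:
            alloc[i] += 1
        return [a * lot_size for a in alloc]

    print(allocate(1003, [0.5, 0.3, 0.2]))            # [501, 301, 201]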
Neural Network Model Calibration
Project Title | Neural Network Model Calibration |
Contact Name | Nicolas Leprovost |
Contact Email | nicolas.leprovost@bp.com |
Company/Lab/Department | BP, Quantitative Analytics |
Address | 20 Canada Square, London E14 5NJ |
Period of the Project | 2 to 6 months starting in summer 2021 |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Wednesday 31st March 2021 |
Background Information | To assist BP's ambition in the renewable space, it is essential to be able to model the joint evolution of power prices and renewables production. For example, modelling jointly the wind output (or equivalently the wind speed) and the electricity price is necessary to assess the cost of developing a wind farm project. This problem can be addressed by using Monte-Carlo simulations. In order to properly represent the dynamics of the underlying, one needs to have a robust calibration mechanism that mimics its statistical properties. |
Brief Description of the Project |
Recent developments in Machine Learning have shown that Deep Learning methods can be applied efficiently to calibrate an option pricing model. During this internship, we will focus on two approaches, namely the historical calibration [1], where model parameters are estimated from historical market data, and the volatility surface calibration [2], where parameters are obtained by inverting the market implied volatility surface. These two problems will involve recent developments in the machine learning area, such as the use of the signature method [3] or the Swish activation function [4] (a one-line definition of Swish is given at the end of this listing). To apply: https://jobs.brassring.com/1033/ASP/TG/cim_jobdetail.asp?partnerid=25078... |
Keywords | financial engineering, machine learning |
References | [1] Stone H. Calibrating rough volatility models: a convolutional neural network approach. Quantitative Finance, 20(3):379–392, 2020 [2] Bayer C, Horvath B, Muguruza A, Stemper B, Tomas M. On deep calibration of (rough) stochastic volatility models. arXiv preprint arXiv:1908.08806, 2019. [3] Chevyrev I, Kormilitzin A. A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788, 2016 [4] Ramachandran P, Zoph B, Le QV. Swish: A self-gated activation function. arXiv 2017. arXiv preprint arXiv:1710.05941. |
Prerequisite Skills | Statistics, Probability/Markov Chains |
Other Skills Used in the Project | Simulation |
Programming Languages | Python |
Work Environment | Depending on regulations at the time, we hope you will be able to work in the office. You will be assigned a project supervisor and will take part in weekly team meetings. |
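For reference, the Swish activation of [4] mentioned in this listing is simply swish(x) = x * sigmoid(beta * x); a short NumPy version (the evaluation points are arbitrary):

    import numpy as np

    def swish(x, beta=1.0):
        """Swish activation: x * sigmoid(beta * x); beta = 1 recovers the SiLU."""
        return x / (1.0 + np.exp(-beta * x))

    print(np.round(swish(np.linspace(-4.0, 4.0, 9)), 3))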
Segmenting duodenal biopsy images
Project Title | Segmenting duodenal biopsy images |
Contact Name | Julian Gilbey |
Contact Email | jdg18@cam.ac.uk |
Company/Lab/Department | Lyzeum Ltd. / DAMTP |
Address | jdg18@cam.ac.uk |
Period of the Project | 8 weeks between late June and September |
Project Open to | Undergraduates, Master's (Part III) students |
Initial Deadline to register interest | Monday 29th March 2021 |
Background Information | Coeliac disease is an autoimmune condition triggered by exposure to gluten (in wheat and other grains), and it can cause significant long-term harm if left untreated. Treatment is a lifelong gluten-free diet. This condition is estimated to affect about 1% of the UK population, but is very under-diagnosed; probably only 1 in 5 or 1 in 6 sufferers is aware that they have it. The gold standard for diagnosis is to perform a biopsy and to look for signs of the disease process on the tissue. This requires highly-trained pathologists to look at each biopsy and to assess it for disease. There is a shortage of pathologists in the UK, and there is often disagreement between pathologists on the diagnosis of individual tissue samples. The long-term aim of our work is to develop a method for obtaining a diagnosis from a tissue sample in an automated fashion, either to guide pathologists in their work or to save the need for a pathologist to look at every sample. |
Brief Description of the Project | One of the challenging parts of this work is dealing with very large and varied microscope images and identifying the different small-scale and large-scale structures present. Some techniques have already been developed for this, but they are usually effective for only one scale. In our case, we need to use some large-scale information to inform the small-scale identification, and possibly vice versa. The purpose of this summer project is to explore some of the existing state-of-the-art techniques and to see how they can be combined, adapted and/or developed for our needs (a toy two-scale sketch is given at the end of this listing). A successful outcome would be a tool for performing this identification. (Note that in the literature, this process is called "segmentation".) |
Keywords | Deep learning, neural networks, image analysis, digital pathology, coeliac disease |
References | - An introductory seminar on this work is available at: https://cambridgelectures.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=... and a related seminar on the biology of coeliac disease and a bioinformatics approach is here: https://cambridgelectures.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=... (These can be accessed from the cam.ac.uk domain or using a Raven account.) - For learning PyTorch, the fastai course (https://github.com/fastai/fastbook) is very helpful. - There are also many papers available on digital histopathology that are potentially relevant, and the Coeliac UK website gives more information about the condition. |
Prerequisite Skills | Image processing, neural networks and deep learning; any other mathematical skills are also potentially useful. |
Other Skills Used in the Project | |
Programming Languages | Python, We are using PyTorch in our work; this can be learnt during the course of the project. |
Work Environment | We are currently a small team (of 2 plus an MPhil student!) all working from home, and meet very regularly over Discord or Zoom. If the COVID-19 situation allows it, we might be able to meet in person in Cambridge or London on occasion as well, but there are no specified working hours or location for working. Note that you must have the right to work in the UK to be eligible for this project; you do not have to be currently based in the UK, though. |
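A toy PyTorch illustration of the "combine scales" idea in this listing: a fine branch on the full-resolution tile and a coarse branch on a downsampled view, fused before a per-pixel classifier. This sketches the general principle only, not the project's actual architecture; the channel counts, 4x downsampling factor and number of classes are arbitrary.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoScaleSegmenter(nn.Module):
        """Fuse features from the original tile with context from a 4x-downsampled view."""
        def __init__(self, n_classes=3):
            super().__init__()
            self.fine = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
            self.coarse = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
            self.head = nn.Conv2d(32, n_classes, 1)

        def forward(self, x):
            f = self.fine(x)                                            # small-scale detail
            c = self.coarse(F.avg_pool2d(x, 4))                         # large-scale context
            c = F.interpolate(c, size=f.shape[-2:], mode="bilinear", align_corners=False)
            return self.head(torch.cat([f, c], dim=1))                  # per-pixel class logits

    logits = TwoScaleSegmenter()(torch.randn(1, 3, 256, 256))
    print(logits.shape)                                                 # torch.Size([1, 3, 256, 256])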