
Summer Research Programmes

 

List of all projects with keywords

 

Spectral estimation for irregularly-sampled complex-valued time series

Project Title Spectral estimation for irregularly-sampled complex-valued time series
Contact Name Keith Briggs
Contact Email keith.briggs@bt.com
Company/Lab/Department BT Labs Wireless Research
Address Adastral Park, Martlesham Heath, Ipswich IP5 3RE
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information In the modelling of 5G radio channels, we take measurements of complex time series (which are channel matrix elements), but the measurement process unavoidably takes these samples at irregular (but known) times. We wish to explore methods for estimating the power spectral density (PSD) of such data and to understand better their sampling properties. The overall application area is the better estimation of channel matrices, in order to improve the performance of 5G radio systems. The channels are described by matrices because the systems are MIMO (multiple-input multiple-output), effectively using a vector channel. Also of interest is estimation of the autocorrelation of these matrices, and PSD estimation has been viewed as a step towards this. The whole topic fits well under the harmonic analysis heading in the CMI mission statement.
Brief Description of the Project

To estimate power spectral density from irregularly-sampled complex data, we are currently using a kind of generalized Lomb-Scargle periodogram (LSP). However, the theory and sampling properties of this estimator are not well understood. Appropriate theoretical background is available in Percival & Walden, Spectral analysis for univariate time series (CUP 2020), p.528ff. This project could tackle one or more of these items:

1. The LSP can be viewed as a generalized discrete Fourier transform (DFT), in other words a matrix-vector product in which each row-column dot product is the projection of the data onto a basis vector of the model (see the sketch after this list). In the LSP the matrix elements do not have as many nice properties as the DFT matrix. We can speak of the Lomb-Scargle Transform (LST), of which the LSP is simply the modulus squared.
2. Check that the standard theory for DFT estimation properties (e.g. for AR(n) processes) still holds for the LST. This is almost certainly the case, and this step may be trivial.
3. Given that input times are fixed, is it possible to make sense of the idea of an optimal set of output frequencies?
4. Examine tapering methods (as used for DFT) for the LST case, to determine the possible improvements to estimate accuracy.
5. Can we get estimates of autocorrelation from the LST? This would be very useful in practice.
6. (Probably hard): there is no known concept of a Fast Fourier Transform (FFT) for the LST. Is anything possible in this direction? A special case would be of interest and some sort of FFT may be possible: let us allow two (and only two) time intervals between measurements in the time series data. This would be approximately satisfied by our data. (The underlying mathematics for the usual FFT involves decompositions of finite groups, so a group theorist interested in helping with this would be valuable.)
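As a concrete illustration of the matrix-vector view in item 1, here is a minimal Python sketch (assuming NumPy; all data synthetic). It is only a stand-in for the LSP: the classical Lomb-Scargle estimator adds a per-frequency time shift and a different normalisation, both omitted here.

```python
import numpy as np

def ls_transform(t, x, freqs):
    """Project irregularly sampled complex data x(t_j) onto complex
    exponentials at the requested frequencies (a generalized DFT)."""
    basis = np.exp(-2j * np.pi * np.outer(freqs, t))   # shape (n_freq, n_samples)
    return basis @ x                                   # matrix-vector product

# Synthetic example: a complex tone sampled at irregular but known times.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=256))
x = np.exp(2j * np.pi * 0.2 * t) + 0.1 * (rng.standard_normal(256)
                                          + 1j * rng.standard_normal(256))
freqs = np.linspace(-0.5, 0.5, 1024)
lst = ls_transform(t, x, freqs)        # "Lomb-Scargle transform" values
psd = np.abs(lst) ** 2 / len(t)        # periodogram-style PSD estimate
print(freqs[np.argmax(psd)])           # close to the true frequency 0.2
```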

Keywords Spectral estimation, irregular sampling, complex-valued, time series
References Percival & Walden, Spectral analysis for univariate time series (CUP 2020), p.528ff.
Prerequisite Skills Statistics, Probability/Markov Chains, Simulation
Other Skills Used in the Project Statistics, Probability/Markov Chains, Simulation
Programming Languages Python, C++
Work Environment Mostly working with me, with a wider team available if needed. Flexible hours, on-site preferred but remote possible.

 

Capturing information from operating theatres

Project Title Capturing information from operating theatres
Contact Name Rosemarie Gant / Kim Whittlestone
Contact Email admin@healthdatainsight.org.uk
Company/Lab/Department Health Data Insight CIC
Address CPC4 Capital Park, Fulbourn, Cambridge CB21 5XE
Period of the Project 8-12 weeks starting 28th June 2021
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 19th February 2021
Background Information A recent review by Baroness Cumberlege (https://www.immdsreview.org.uk/Report.html) into the complications that can occur following the implantation of medical devices has recommended that data on all implanted medical devices should be collected at the time of the operation. Device manufacturers are required to put unique bar or QR codes on every device, and it will soon become a requirement for hospitals to record this information for all devices implanted into patients so that patients can be followed up months or years later. The National Theatre Dataset is currently being developed to hold this information so that we know about all procedures carried out in UK hospital theatres.
Brief Description of the Project

This year's project aims to develop a simple hardware and software solution to capture the information from operating theatres about implanted medical devices, operations and procedures. The intention is to create a low-cost solution that could be used by the NHS to help to gather data into the new National Theatre Dataset.

The project will be run by Health Data Insight CIC and we will be able to draw on close links with expert colleagues working in the NHS and NHS Digital.

We are offering up to six intern places on this group project in 2021. Interns will work together as a multi-disciplinary team, bringing a diverse range of skills and experience to develop the project, from specification to final completion.

This project has a number of components:
1) development and evaluation of a simple and low-cost solution using existing cheap barcode scanners to capture relevant information from operating theatres.
2) exploration of ways of capturing additional information about the operation, those involved, the location of the operating theatre and relevant information from the anaesthetist - for example the time the operation started, how long it lasted etc.
3) a secure method of storing and sending encrypted information to NHS Digital so that it can be incorporated into the National Theatre Dataset, where it will be linked to other patient-level information (a small sketch of this step follows the list).
4) creation of resources to facilitate data collection in operating theatres - for example posters with relevant bar codes and procedures that can be scanned
5) basic analysis and visualization of the data collected
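As an illustration of component 3, here is a purely hypothetical Python sketch: it assumes the `cryptography` package, that the scanner presents scanned codes as ordinary keyboard input, and an invented record format and file name; a real solution would need proper key management and an agreed transmission format.

```python
import json
from datetime import datetime, timezone
from cryptography.fernet import Fernet

key = Fernet.generate_key()         # in practice, keys would be managed securely
cipher = Fernet(key)

scanned_code = input("Scan device barcode: ")   # most USB scanners emulate a keyboard
record = {
    "device_code": scanned_code,
    "theatre_id": "THEATRE-01",                 # invented identifier
    "scanned_at": datetime.now(timezone.utc).isoformat(),
}
token = cipher.encrypt(json.dumps(record).encode("utf-8"))

with open("pending_uploads.bin", "ab") as f:    # queued for later secure sending
    f.write(token + b"\n")
```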

Who are we looking for?
The project will have input from clinical colleagues working in peri-operative medicine and the teams in NHS Digital responsible for collection of the National Theatre Dataset and device classification. We are looking for enthusiastic students with a diverse range of skills to be part of an innovative multi-disciplinary team; skills and experience will include topics such as hardware devices, methods for local data capture and storage, data analytics, and data presentation and visualization.

What skills/experience do I need?
Enthusiasm and eagerness to work as part of a small team of like-minded individuals are the main attributes we are after. Candidates from all backgrounds are welcome, particularly if you have interest or experience in one or more of these areas:  
- use of QR or 2-d codes for information storage and transmission
- data encryption
- geolocation to room level
- Bluetooth communication
- capturing data from medical equipment (RS-232 data standards, etc.)
- visual and science communication
- stakeholder engagement

How do I apply?
To apply, please send an up-to-date CV and a covering letter to admin@healthdatainsight.org.uk, outlining what skills and attributes you would bring to the intern project and any background experience that you feel is relevant.

What is the deadline for applications?
The closing date for applications is midnight on Friday 19th February 2021. After short-listing we will let you know if we require any more information and/or whether we would like you to attend an informal interview. All applicants will be notified whether their application has been successful after this process has completed.

Keywords medical, data, encryption, QR-codes, implants
References If you would like to see what our team got up to last year: https://healthdatainsight.org.uk/running-an-intern-scheme-in-a-global-pa... and https://healthdatainsight.org.uk/project/syndasera/
Prerequisite Skills Enthusiasm and eagerness to work as part of a small team of like-minded individuals are the main attributes we are after.
Other Skills Used in the Project  
Programming Languages No Preference
Work Environment As well as being a team project, this internship is a chance to join a thriving and enthusiastic community of bright individuals (see https://healthdatainsight.org.uk/category/team/). The team will be supported by an Intern Team Lead, with specialist input from developers, project managers, analysts, science communicators and many other professionals. This internship is about developing specialist skills and also a chance to enhance your communication, collaboration, organisational and team-working skills. The normal working week is 37.5 hours; we offer a salary of £1,500 per month, flexible working and 2.5 days leave per month. Interns will meet regularly to discuss their progress on the project and the Intern Team Lead will always be available either in person or online for queries and support. If permitted by COVID rules, the interns will work in the HDI offices in Capital Park, Fulbourn, Cambridge although travel to other sites may be necessary as part of the internship. If remote working is necessary, we have the setup required to do this.

 

Optimizing deep neural networks for speech processing application: a parametric approach

Project Title Optimizing deep neural networks for speech processing application: a parametric approach
Contact Name Cong-Thanh Do
Contact Email cong-thanh.do@crl.toshiba.co.uk
Company/Lab/Department Toshiba Europe Limited 
Address Toshiba Europe Limited, 208 Cambridge Science Park, Milton Road, Cambridge CB4 0GZ
Period of the Project 8-12 weeks between late June and September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

Nowadays, deep neural networks (DNNs) are widely used in speech processing applications, from hearing aids to personal assistants on mobile phones. Conventional wisdom holds that the deeper and wider the neural network models are, the higher the performance the system can achieve, and this is generally true. However, the DNNs' complex architecture can be optimized to improve performance while also reducing the number of parameters. Indeed, large models cannot be deployed on hardware with limited memory and computational power. Therefore, optimizing the structure of DNNs is an active research direction.

Various methods have been proposed to optimize DNN architectures and reduce model size, such as pruning redundant parameters or exploiting the sparsity of the rectifier (ReLU) activation function to reduce the computational load of convolution [1]. The ReLU activation function enables a network to easily obtain sparse representations [2].

Brief Description of the Project

Achieving sparsity in neural networks is a necessary condition for reducing model size while maintaining the same level of performance, since zero weights can be eliminated after training the DNNs [1]. Sparsity in DNNs can be achieved by various approaches, for instance by using the sparse evolutionary training (SET) algorithm [3].

In this project, we will investigate how to achieve sparsity by studying activation functions for DNNs. More specifically, we study the use of splines in the activation functions of deep neural networks. Spline-based parametric models have been studied as a way to optimize the shape of neural activation units [4, 5, 6]. The use of parametric splines makes it possible to establish a direct connection between the training of DNNs, their activation functions, and the resulting sparsity. In [6], the author showed that the optimal network configuration can be achieved with activation functions that are nonuniform linear splines with adaptive knots. The significance of that study is that the action of each neuron is encoded by a spline whose parameters are optimized during the training procedure. The proposal results in a computational structure that is compatible with the deep-ReLU, parametric ReLU [7], and MaxOut structures.

In our work, we will focus on using sparsity as one of the constraints for training DNNs with parametric activation functions, in particular using the spline model proposed in [6]. Achieving sparsity could yield sparse weights, which is useful both for improving performance and for reducing model size.
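The following is a minimal sketch of the general idea, assuming PyTorch (not named above) and using a fixed-knot piecewise-linear activation rather than the adaptive-knot spline model of [6]; the toy network, data and L1 penalty are purely illustrative.

```python
import torch
import torch.nn as nn

class PiecewiseLinearActivation(nn.Module):
    """Learnable piecewise-linear ("linear spline") activation with fixed,
    uniformly spaced knots; only the knot values are trained."""
    def __init__(self, n_knots=11, x_min=-3.0, x_max=3.0):
        super().__init__()
        self.register_buffer("knots", torch.linspace(x_min, x_max, n_knots))
        # Initialise the spline to the ReLU values at the knots.
        self.values = nn.Parameter(torch.clamp(self.knots.clone(), min=0.0))

    def forward(self, x):
        k, step = self.knots, self.knots[1] - self.knots[0]
        x = x.clamp(min=float(k[0]), max=float(k[-1]))
        idx = ((x - k[0]) / step).floor().long().clamp(max=len(k) - 2)
        frac = (x - k[idx]) / step
        return (1 - frac) * self.values[idx] + frac * self.values[idx + 1]

# Toy training step: cross-entropy plus an L1 penalty encouraging sparse weights.
net = nn.Sequential(nn.Linear(20, 64), PiecewiseLinearActivation(), nn.Linear(64, 10))
x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(net(x), y)
loss = loss + 1e-4 * sum(p.abs().sum()
                         for name, p in net.named_parameters() if "weight" in name)
loss.backward()
```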

Keywords Deep learning, optimization, speech processing, parametric, sparsity
References [1] A. Yaguchi, T. Suzuki, W. Asano, S. Nitta, Y. Sakata, A. Tanizawa, "Adam induces implicit weights sparsity in rectifier neural networks", in Proc. 17th IEEE International Conference on Machine Learning and Applications, pp. 318-325, Dec. 2018.
[2] X. Glorot, A. Bordes, Y. Bengio, "Deep sparse rectifier neural networks", in Proc. 14th Conference on Artificial Intelligence and Statistics (AISTATS), pp. 315-323, Apr. 2011.
[3] D. C. Mocanu, E. Mocanu, P. Stone, P. H. Nguyen, M. Gibescu, A. Liotta, "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science", Nature Communications, 2018.
[4] F. Agostinelli, M. Hoffman, P. Sadowski, P. Baldi, "Learning activation functions to improve deep neural networks", in Proc. International Conference on Learning Representations (ICLR), 2015.
[5] L. Vecci, F. Piazza, A. Uncini, "Learning and approximation capabilities of adaptive spline activation function neural networks", Neural Networks, vol. 11, no. 2, pp. 259-270, 1998.
[6] M. Unser, "A representer theorem for deep neural networks", Journal of Machine Learning Research, vol. 20, pp. 1-30, 2019.
[7] K. He, X. Zhang, S. Ren, J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification", in Proc. International Conference on Computer Vision (ICCV), pp. 1026-1034, Dec. 2015.
Prerequisite Skills Statistics, Probability/Markov Chains, Simulation
Other Skills Used in the Project Numerical Analysis, Image processing
Programming Languages Python, MATLAB, C++, No Preference
Work Environment

The student will work in a team. Besides the main supervisor (Dr. Cong-Thanh Do), there will be a co-supervisor (Dr. Rama Doddipatla) to whom the student can talk about the project.

The working hours of the lab are 9am-5pm. Given the current situation regarding the coronavirus, working remotely is acceptable. Access to the office could be considered if necessary and according to the situation.

 

Advanced image analytics for drug discovery

Project Title Advanced image analytics for drug discovery
Contact Name Sara Schmidt
Contact Email sara.x.schmidt@gsk.com
Company/Lab/Department GSK
Address Gunnels Wood Road, Stevenage SG1 2NY
Period of the Project 8-10 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

GSK is a FTSE 100, science-led, global healthcare business, currently ranked as the leading pharmaceutical company in the UK. Our research is focused on immunology efforts including small molecules, biologics and vaccines. Never has it been more important for us to reduce the time it takes from identification of a potential therapeutic to a marketed medicine; that is the main focus of this project.

At GSK we have created a world-leading data and computational environment to enable large-scale scientific experiments that exploit GSK's unique access to data. Our focus is on bringing data, analytics, and science together into solutions for our scientists to develop medicines for patients. A key enabler of this effort is the ability to extract knowledge from imaging data. A specific challenge in the early phase of drug discovery programmes is to classify potential drug molecules into those with the desired effect on the drug target and those with an undesirable effect, e.g. toxicity.

One way to achieve this goal is by developing advanced image analytics algorithms, where image sets of cells in the presence of molecules with known undesirable mechanisms are used to define image signatures, the so-called "ground truth". Thereafter the algorithm is applied to unknown compounds to allow us to focus on compounds that are free from potential liabilities, thereby reducing drug failure rates and speeding up the often lengthy and costly drug discovery process.

Brief Description of the Project We are looking for a student with a keen interest in image processing and computer vision who can use our in-house generated image stacks of cells from early drug discovery programmes and associated training sets to develop image analytics algorithms that enable compound mechanism classification. The project will involve both improving existing, and developing new, image analytics algorithms in open-source packages (e.g. Python, CellProfiler and ilastik). Upon choice and validation of a suitable algorithm, the student should develop a robust pipeline that can be used by scientists to analyse their own data at scale, in a way that minimises data integrity risks.
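A purely illustrative sketch of the kind of pipeline involved, assuming scikit-image and scikit-learn (not named above); the images, class labels and features are synthetic stand-ins for the in-house image stacks and training sets.

```python
import numpy as np
from skimage import filters, measure
from sklearn.ensemble import RandomForestClassifier

def cell_features(image):
    """Segment bright objects and summarise simple per-image morphology features."""
    mask = image > filters.threshold_otsu(image)
    labels = measure.label(mask)
    props = measure.regionprops_table(labels, properties=("area", "eccentricity", "perimeter"))
    # Average the per-object measurements into one feature vector per image.
    return np.array([np.mean(v) for v in props.values()])

# Toy "training set": random images standing in for cells treated with compounds
# of known mechanism (0 = clean, 1 = undesirable); labels are assigned at random.
rng = np.random.default_rng(1)
images = rng.random((20, 64, 64))
y = rng.integers(0, 2, size=20)
X = np.vstack([cell_features(im) for im in images])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))      # mechanism-class predictions for the first 3 images
```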
Keywords Image processing, Computer vision, Machine Learning, Bioimaging, Pharmaceutical industry
References Bray MA et.al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc. 2016 Sep;11(9):1757-74. doi: 10.1038/nprot.2016.105. Epub 2016 Aug 25. PMID: 27560178; PMCID: PMC5223290.
Prerequisite Skills Image processing
Other Skills Used in the Project Statistics, Data Visualization
Programming Languages Python, R
Work Environment Fully embedded into a scientific department and part of a wider team interacting with data scientists and imaging experts.

 

Meta-Analysis of Transcriptomics data at GSK

Project Title Meta-Analysis of Transcriptomics data at GSK
Contact Name Giovanni Dall'Olio
Contact Email giovanni.m.dallolio@gsk.com
Company/Lab/Department GSK
Address Gunnels Wood Road, Stevenage SG1 2NY
Period of the Project 8 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

In recent years GSK has invested in the creation of a Data Lake, an infrastructure where all the data generated in the company is stored and made available. This has many advantages, as the data is not scattered across data silos, and it is generated using a standardized and controlled process.

One component of this Data Lake infrastructure is the pipeline for the sequencing of genomics and transcriptomics data (RNA-Seq and other technologies). We have built a process to generate and curate this data using standard tools and parameters, generating a high-quality dataset from experiments executed by different departments in the company.

The scope of the research project will be to develop methods for meta-analysis of the genomics and transcriptomics data in this dataset, comparing experiments generated by different units and collaborators.

Brief Description of the Project

The student will explore methods for meta-analysis of sequencing data from different experiments. The desired outcome of the project will be a computational notebook or a script documenting recommendations for meta-analysis methods, taking into consideration existing literature and using example data from our dataset.

This project is relatively open-ended, and the student will have space to explore different solutions, as well as working with a curated dataset. Knowledge of NGS is not required, although some preliminary understanding may be useful. The preferred programming languages are R and Python.
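As a small illustration of one common meta-analysis step (assuming pandas and SciPy; the gene names and p-values are invented), Fisher's method can combine per-gene differential-expression p-values from independent experiments:

```python
import pandas as pd
from scipy import stats

pvals = pd.DataFrame(
    {"experiment_A": [0.01, 0.20, 0.70],
     "experiment_B": [0.03, 0.45, 0.90],
     "experiment_C": [0.02, 0.10, 0.60]},
    index=["GENE1", "GENE2", "GENE3"])

# Combine each gene's p-values across experiments with Fisher's method.
combined = pvals.apply(
    lambda row: stats.combine_pvalues(row.values, method="fisher")[1], axis=1)
print(combined.sort_values())    # smallest combined p-value first
```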

Keywords transcriptomics, genomics, RNA-seq, meta-analysis, statistics
References

- Leek et al, Nat Rev Gen 2010. Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data
- Evans et al, Brief Bioinf 2018. Selecting Between-Sample RNA-Seq Normalization Methods From the Perspective of Their Assumptions
- RPKM, FPKM and TPM clearly explained https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/

Prerequisite Skills Statistics
Other Skills Used in the Project Database Queries, Data Visualization
Programming Languages Python, R
Work Environment Work remotely, in a team. 

 

Inhalation Dosimetry Modelling

Project Title Inhalation Dosimetry Modelling
Contact Name George Fitton
Contact Email george.fitton@unilever.com
Company/Lab/Department Unilever SEAC Computational Science
Address SEAC, Unilever, Colworth Science Park, Sharnbrook, Bedford MK44 1LQ
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

In daily life we use consumer products: household cleaning products, anti-perspirants, hairsprays, etc., that produce unintentional exposure to chemicals. Unilever assures consumer safety by assessing the toxicity risk of every ingredient in every product. 

Current risk assessments use historical rodent studies [1]. Next Generation Risk Assessments (NGRAs) use New-Approach Methodologies (NAMs) that leverage advances in computing, genetics, and statistics – novel in silico and in vitro approaches that assess risk without testing on animals.

Over the past few decades, Unilever’s Safety and Environmental Assessment Centre (SEAC) has worked closely with Industry, Academia, and Regulatory Bodies to develop a wide range of non-animal approaches using mathematical modelling [2], [3], cell culture-based experiments [4], [5], and omics for systemic and local toxicity [6].

The aim of this short project is to help advance current inhalation risk assessment methods.

Brief Description of the Project

The overall goal of any Next Generation Risk Assessment is to get from an in vitro Point of Departure (PoD) to a relevant in vivo exposure level – getting from exposures (or concentrations) of chemicals with no toxicological response in cell models to exposures with no toxicological response in consumers and vice versa.

Obtaining an in vitro PoD from an inhaled in vivo exposure requires an Inhalation Dosimetry Model. Inhalation Dosimetry Models calculate the fraction of the total number of inhaled particles deposited in the lung – a mass per volume metric. The in vitro PoD is obtained by distributing the number of deposited particles over the surface of the lung [7].

Current industry standards use the free but closed-source Multiple-Path Particle Dosimetry (MPPD) Model [8]–[10] to calculate the deposition fraction. The MPPD Model is user-friendly and well-tested. But its dynamics are based on laminar fluid flow in a pipe – the lungs are modelled as a series of branching connected tubes. This means that a lot of complex phenomena – wall impacts, convection, turbulent mixing, etc. – are absent from the model.

The problem statement follows: investigate and quantify the in vitro PoD uncertainty due to in vivo modelling simplifications and approximations.

Advances in computational power have meant that advanced fluid dynamics simulations can be performed on most computers. Consequently, the modelling approaches of previous generations need to be revised. A significant amount of effort has been invested in advanced aerosol modelling for pharmaceutical companies, with the results published in the public domain and the software released as open-source [11].

Given the current state of the art (SoA), we see three potential directions of investigation; the student is free to tackle the problem as he/she sees fit.

1) Data-driven Approach: review the results from the current SoA in aerosol deposition simulations and build a data model to estimate the uncertainty in deposition fraction estimates. Currently available data may differ in particle size, composition (dust vs droplet), breathing patterns (smoking versus unintentional inhalation), etc. Given the available data, the student is free to use a variety of statistical techniques to achieve the required goal (a toy sketch follows this list).

2) Computational Approach: AeroSolved is an open-source computational fluid dynamics model based on the open-source OpenFoam code base [12]. Its features include simulations of mass, momentum (Navier-Stokes equations), and energy conservation equations; Multispecies formulation for gas (vapor), liquid (droplet) phases and solid particles; and advanced aerosol physics models. Benchmarking the MPPD Model against the Eulerian AeroSolved aerosol deposition Model will provide a numerical estimate of the deposition fraction uncertainty.

3) Analytical Approach: the equations governing the laminar-flow transport of aerosols in the MPPD Model are known and can be compared to the more complex Navier-Stokes transport model. Using the characteristic scales of the problem, a scale analysis of the (non-linear) term differences and their corresponding transport equations will yield an upper bound on the uncertainty of the deposition fraction. More generally, derive a scaling equation for the deposition fraction for laminar and turbulent fluid flow.
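As a toy illustration of the data-driven direction (1), the sketch below fits a simple empirical curve of deposition fraction against particle diameter and uses a residual bootstrap to attach an uncertainty interval to a prediction. Every number is invented and the quadratic-in-log-diameter form is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (invented) published results: total deposition fraction vs diameter.
diameter_um = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])
deposition = np.array([0.30, 0.12, 0.10, 0.15, 0.45, 0.80])

log_d = np.log(diameter_um)
coeffs = np.polyfit(log_d, deposition, deg=2)    # quadratic in log-diameter (illustrative)
resid = deposition - np.polyval(coeffs, log_d)

# Residual bootstrap: refit on resampled residuals, predict at d = 3 um.
preds = []
for _ in range(2000):
    y_boot = np.polyval(coeffs, log_d) + rng.choice(resid, size=resid.size, replace=True)
    preds.append(np.polyval(np.polyfit(log_d, y_boot, deg=2), np.log(3.0)))
lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"deposition fraction at 3 um: 95% bootstrap interval ({lo:.2f}, {hi:.2f})")
```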

The aim is to determine a numerical estimate of the uncertainty in deposition fraction calculations in laminar-flow deposition models, if possible by benchmarking against state-of-the-art deposition models.

Keywords CFD, Statistics, Modelling, Toxicology, Risk
References [1] W. Steiling et al., "Principle considerations for the risk assessment of sprayed consumer products," Toxicol. Lett., vol. 227, no. 1, pp. 41-49, 2014.
[2] J. Reynolds, S. Malcomber, and A. White, "A Bayesian approach for inferring global points of departure from transcriptomics data," Comput. Toxicol., p. 100138, 2020.
[3] J. Reynolds et al., "Probabilistic prediction of human skin sensitiser potency for use in next generation risk assessment," Comput. Toxicol., vol. 9, p. 100138, 2020.
[4] M. T. Baltazar et al., "A next generation risk assessment case study for coumarin in cosmetic products," Toxicol. Sci., 2020.
[5] S. Hatherell et al., "Identifying and Characterizing Stress Pathways of Concern for Consumer Safety in Next-Generation Risk Assessment," Toxicol. Sci., 2020.
[6] T. E. Moxon et al., "Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products," Toxicol. Vitr., vol. 63, p. 104746, 2020.
[7] S. Gangwal et al., "Informing selection of nanomaterial concentrations for ToxCast in vitro testing based on occupational exposure potential," Environ. Health Perspect., vol. 119, no. 11, pp. 1539-1546, 2011.
[8] S. Anjilvel and B. Asgharian, "A multiple-path model of particle deposition in the rat lung," Fundam. Appl. Toxicol., vol. 28, no. 1, pp. 41-50, 1995.
[9] O. T. Price, B. Asgharian, F. J. Miller, F. R. Cassee, and R. de Winter-Sorkina, "Multiple Path Particle Dosimetry model (MPPD v1.0): A model for human and rat airway particle dosimetry," RIVM Rapp. 650010030, 2002.
[10] F. J. Miller, B. Asgharian, J. D. Schroeter, and O. Price, "Improvements and additions to the multiple path particle dosimetry model," J. Aerosol Sci., vol. 99, pp. 14-26, 2016.
[11] P. Koullapis et al., "Regional aerosol deposition in the human airways: The SimInhale benchmark case and a critical assessment of in silico methods," Eur. J. Pharm. Sci., vol. 113, pp. 77-94, 2018.
[12] E. M. A. Frederix, Eulerian modeling of aerosol dynamics. University of Twente, 2016
Prerequisite Skills Statistics, Simulation, Data Visualization
Other Skills Used in the Project Mathematical physics, Fluids
Programming Languages Python
Work Environment Part of the Computational Science team in SEAC

 

Verification of stress simulation model/software

Project Title Verification of stress simulation model/software
Contact Name Artem Babayan
Contact Email artem.babayan@silvaco.com
Company/Lab/Department Silvaco Europe
Address Silvaco Europe Ltd, Compass Point, St Ives PE27 3FJ
Period of the Project 8 weeks, any time
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information  
Brief Description of the Project

Silvaco develops Electronic Design Automation (EDA) and Technology CAD (TCAD) software. One of its modules is a tool for simulating stresses inside electronic devices (caused by bending, heating, internal stresses, etc. during device manufacture).

Your task would be to:
- research standard stress problems for which analytical solutions are available;
- set up these problems within the Silvaco Stress Simulator and compare the known analytical solutions with the results obtained from the simulator;
- optionally, set up a simple problem that can also be solved numerically using alternative tools (e.g. MATLAB) and compare those results with Silvaco's.
Depending on the results of the project, this may lead to an academic paper or conference publication.
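As a toy example of the kind of analytical benchmark meant here (not a Silvaco workflow; all values are invented), the sketch below compares a simple numerical integration of cantilever-beam bending against the closed-form tip deflection w(L) = F L^3 / (3 E I).

```python
import numpy as np

E, I, L, F = 170e9, 1e-12, 1e-3, 1e-3      # hypothetical stiffness, inertia, length, load (SI)
x = np.linspace(0.0, L, 2001)
curvature = F * (L - x) / (E * I)          # w''(x) = M(x) / (E I), clamped at x = 0

# Integrate twice with w(0) = w'(0) = 0 using the trapezoidal rule.
slope = np.concatenate(([0.0], np.cumsum(0.5 * (curvature[1:] + curvature[:-1]) * np.diff(x))))
w = np.concatenate(([0.0], np.cumsum(0.5 * (slope[1:] + slope[:-1]) * np.diff(x))))

w_analytic = F * L**3 / (3 * E * I)
print(f"numerical tip deflection  {w[-1]:.6e} m")
print(f"analytical tip deflection {w_analytic:.6e} m")
```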

Keywords Stress simulation, Mathematical modelling, Model verification, Numerical analysis
References  
Prerequisite Skills Mathematical physics, Numerical Analysis, PDEs
Other Skills Used in the Project  
Programming Languages Python, MATLAB, C++
Work Environment The student will be placed in the Silvaco building in St Ives. There are ~15 people in the office. The student is expected to work independently, with advice available from the team. Communication with our office in the US may also be required.

 

Projects in Quantitative/Systematic investing

Project Title Projects in Quantitative/Systematic investing
Contact Name Beth Duncan
Contact Email bduncan@bluecove.com
Company/Lab/Department BlueCove
Address 10 New Burlington Street, London W1S 3BE
Period of the Project 12 weeks from 14th June to the 10th September 2021, flexible
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

We're looking for talented individuals to join our Research team as Summer Research Analysts to help us achieve our mission. You will contribute to the development of innovative research. We don't expect you to have detailed knowledge of the asset management space, though experience in Finance or Economics is useful. What we are looking for is an intellectual curiosity that will drive you to learn as much as you can whilst you're here, together with a background in quantitative analysis and strong programming skills.

To give you a little background, BlueCove is a scientific asset management firm founded in 2018. Here, we believe that scientific active investing, as an alternative and complement to both passive and traditional active investing, is set to be the next defining development for the fixed income industry. As one of our Research Analysts, you will join us for 12 weeks from 14th June to the 10th September 2021 (but we can be flexible on dates and the length of the programme). As well as learning about scientific fixed-income products and the asset management industry overall, you will spend your time working on an important research project.

Brief Description of the Project

The BlueCove Research team have a number of interesting projects to undertake. The research projects our Summer Research Analysts undertake are likely to be in the following areas:
- Natural language processing (NLP) techniques applied to company-level datasets. Goal/outcome: Improving sentiment signals for use in our fixed income systematic strategies
- Analysis of Environmental, Social and Governance (ESG) datasets to determine quality, similarity to other data sources, and usefulness for investment decisions. Goal/outcome: A better understanding of the pros/cons of various ESG data sources
- Construction of proprietary analytic measures for derivative instruments (such as options on bond futures). Goal/outcome: A set of thoroughly tested return & greek exposure calculations which can be used to form the basis of a systematic investment strategy.
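As an illustrative example of the third area (assuming SciPy; this is the standard Black-76 model for options on futures, not necessarily BlueCove's methodology), a price and delta calculation might start from:

```python
import numpy as np
from scipy.stats import norm

def black76_call(F, K, T, sigma, r):
    """Return (price, delta) of a European call on a future under Black-76."""
    d1 = (np.log(F / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    disc = np.exp(-r * T)
    price = disc * (F * norm.cdf(d1) - K * norm.cdf(d2))
    delta = disc * norm.cdf(d1)
    return price, delta

print(black76_call(F=120.0, K=118.0, T=0.5, sigma=0.08, r=0.01))  # invented inputs
```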

Keywords Finance, Data analysis, Python, Scientific approach
References To find out more about our firm, please take a look at our website - https://www.bluecove.com/
Other references will be supplied to selected candidates
Prerequisite Skills Statistics, Data Visualization, Data analysis/data science; strong coding skills; disciplined/scientific approach
Other Skills Used in the Project Quantitative finance
Programming Languages Python, MATLAB, R
Work Environment You will work with our team of 8 Researchers, as well as your fellow Summer Research Analysts and the broader Investment Team. But our firm is new and collaborative, so you'll gain exposure to all our departments and work with many people.

We are flexible on whether you work remotely, in the office, or a hybrid of both. We compensate our interns at a market-level salary for their valuable work and you will be eligible for benefits including private medical health insurance & a virtual GP.

As one of our Summer Research Analysts, you will also benefit from:
- Weekly firmwide meetings, in addition to team meetings, so you'll learn more about the business overall
- Regular firmwide colloquia and access to other learning & development opportunities
- Agile & flexible working
- Our Employee Wellbeing Programme
- Modern technology
- The chance to broaden your network through regular firm and team work and social events
- Our modern office which is ranked "Excellent" by BREEAM
- No prescriptive dress code
- A generous holiday allowance

 

Low-rank matrix approximations within Kernel Methods

Project Title Low-rank matrix approximations within Kernel Methods
Contact Name Zdravko Zhelev
Contact Email application@dreams-ai.com
Company/Lab/Department DreamsAI
Address 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG
Period of the Project 8 weeks, flexible
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information In machine learning, we often employ kernel methods to learn more general relations in datasets without computing an explicit feature-space projection, which would be computationally expensive.
Brief Description of the Project The kernel trick often involves matrix inversion or eigenvalue decomposition, whose cost is cubic in the number of training points. Due to the large storage and computational costs, this is impractical in large-scale learning problems. One approach to dealing with this problem is low-rank matrix approximation; the most popular examples are the Nyström method and random features. We would like the student to test the feasibility of these approximations on real data.
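A minimal sketch of the Nyström idea (assuming NumPy; the data are synthetic): approximate an RBF kernel matrix from a random subset of landmark points and compare against the exact kernel.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
m = 100                                       # number of landmark points
landmarks = X[rng.choice(len(X), m, replace=False)]

C = rbf_kernel(X, landmarks)                  # n x m cross-kernel
W = rbf_kernel(landmarks, landmarks)          # m x m landmark kernel
K_nystrom = C @ np.linalg.pinv(W) @ C.T       # rank-m approximation of the full kernel

K_exact = rbf_kernel(X, X)
err = np.linalg.norm(K_exact - K_nystrom) / np.linalg.norm(K_exact)
print(f"relative Frobenius error of the rank-{m} approximation: {err:.3f}")
```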
Keywords machine learning, linear algebra, mathematical statistics
References https://papers.nips.cc/paper/1866-using-the-nystrom-method-to-speed-up-k...
https://people.eecs.berkeley.edu/~brecht/paper/07.rah.rec.nips.pdf
https://stanford.edu/~jduchi/projects/SinhaDu16.pdf
Prerequisite Skills Statistics, Numerical Analysis, Mathematical Analysis, Geometry/Topology, Predictive Modelling
Other Skills Used in the Project Data Visualization
Programming Languages Python, C++
Work Environment The project supervisor will provide 5 supervised hours out of the roughly 30 hours of working time at the office in Cambridge. A strong student may also be offered a free trip to Hong Kong to take on more maths projects.

 

Prize pool and odds forecast

Project Title Prize pool and odds forecast
Contact Name Zdravko Zhelev
Contact Email application@dreams-ai.com
Company/Lab/Department DreamsAI
Address 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG
Period of the Project 8 weeks, flexible
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information  
Brief Description of the Project In a prize-pool-based betting game, the final returned odds of a bet are simply the total amount staked by everybody divided by the total amount staked on bets that guessed correctly. Therefore, every time someone places a bet, the odds for every bet type change for everybody, and only after the deadline for placing bets can the odds be finalized. In theory, if you know all of the prize pools' sizes you can determine all the odds exactly, and vice versa. The challenge here is to consider the cases when we only know a subset of the odds or prize-pool sizes: how much uncertainty is introduced, and can we leverage the relationships between bet types to improve our predictions?
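A toy sketch of the pool/odds relationship described above (all numbers invented; the operator's take is ignored): in a pari-mutuel pool the decimal odds of an outcome are the total pool divided by the amount staked on that outcome, so a single observed odd reveals the share of the pool on that outcome even when the absolute pool sizes are unknown.

```python
stakes = {"outcome_A": 4000.0, "outcome_B": 2500.0, "outcome_C": 1500.0}
total_pool = sum(stakes.values())
odds = {k: total_pool / v for k, v in stakes.items()}
print(odds)                       # {'outcome_A': 2.0, 'outcome_B': 3.2, 'outcome_C': 5.33...}

# Observing only outcome_A's odds does not identify the pool sizes, but it does
# pin down the share of the pool staked on A:
share_A = 1.0 / odds["outcome_A"]
print(share_A)                    # 0.5 of the pool is on outcome_A
```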
Keywords combinatorics, probability, Markov chain Monte Carlo
References  
Prerequisite Skills Statistics, Probability/Markov Chains, Numerical Analysis, Simulation, Predictive Modelling
Other Skills Used in the Project Database Queries
Programming Languages Python, C++
Work Environment There will be about 30 hours of work expected at our Cambridge office, 5 of which will be supervised. A strong candidate may be offered free trips to Hong Kong to pick up another project to do during an internship or part-time.

 

Card Gaming AI

Project Title Card Gaming AI
Contact Name Zdravko Zhelev
Contact Email application@dreams-ai.com
Company/Lab/Department DreamsAI
Address 30 Meade House, 2 Mill Park Rd, Cambridge CB1 2FG
Period of the Project 8 weeks, flexible
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Building an AI that can compete with humans at a popular Chinese card game.
Brief Description of the Project

A popular Chinese card game requires 4 players, each with 13 of the 52 cards. The goal of the game is to arrange the 13 cards into 3 sets of 3, 5 and 5 cards. Each set is then compared with the corresponding sets belonging to the other players, and the best set in each group wins.

In this project, we want the student to investigate one or more of the following questions:
1. Performance vs computational complexity of a hard decision logic based AI
2. Performance vs computational complexity of a deep reinforcement learning based AI
3. How accurately can we predict our chances of winning based on the information that is already revealed?
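Relating to question 1, a small combinatorial sketch (illustrative; rules beyond the 3/5/5 split are not modelled) of the branching factor any decision logic must cover: each 13-card hand can be arranged into front/middle/back sets in C(13,3) x C(10,5) = 72,072 ways.

```python
import random
from itertools import combinations
from math import comb

deck = [(rank, suit) for rank in range(13) for suit in range(4)]
hand = random.sample(deck, 13)

n_splits = comb(13, 3) * comb(10, 5)
print(n_splits)                    # 72072 candidate arrangements per hand

# Enumerate them explicitly (cheap enough to brute-force per hand).
count = 0
for front in combinations(hand, 3):
    rest = [c for c in hand if c not in front]
    for middle in combinations(rest, 5):
        count += 1                 # the remaining 5 cards form the back set
print(count == n_splits)           # True
```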

Keywords combinatorics, probability, neural network, simulations
References  
Prerequisite Skills Probability/Markov Chains, Simulation
Other Skills Used in the Project Statistics, Data Visualization
Programming Languages Python, C++, Rust
Work Environment The project supervisor will provide 5 supervised hours out of the roughly 30 hours of working time at the office in Cambridge. A strong student may also be offered a free trip to Hong Kong to take on more maths projects.

 

Using mathematical techniques to assist in the continuous improvement process of a cut flower manufacturing operation

Project Title Using mathematical techniques to assist in the continuous improvement process of a cut flower manufacturing operation
Contact Name David Booth
Contact Email david.booth@mm-flowers.com
Company/Lab/Department MM Flowers
Address Pierson Road, The Enterprise Campus, Alconbury Weald PE28 4YA
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information MM Flowers, established 14 years ago, is the UK's leading integrated cut flower supplier, with a unique ownership model and innovative practices. MM Flowers is owned by the AM FRESH Group, a leading international breeder, grower and distributor of citrus and grapes; VP, East Africa's largest cut rose and vegetable producer, also incorporating an in-house breeding arm; Nature's Rights; together with The Elite Flower Ltd, the world's largest cut flower business, including breeding, propagation and growing operations throughout South America and Kenya, alongside extensive processing and distribution businesses throughout North America. MM Flowers imports, processes and distributes cut flowers to many of the major high street retail brands in the UK, including Marks & Spencer and Tesco, either to retail stores or directly to consumers' homes. The UK cut flower market is extremely challenging: consumers expect high-quality flowers at very competitive prices. The vast majority of species utilised are perishable, short-life products, transported from many different regions around the world every day. Pre- and post-harvest management, logistics and environmental control are all factors positively or negatively impacting ultimate flower quality and therefore consumer satisfaction. MM Flowers receives circa 500 million stems of cut flowers annually across at least 60 different species and hundreds of varieties. There is a dramatic increase in output during periods such as Christmas, Valentine's Day and Mother's Day, when the business processes millions of stems over extended storage and packing periods.
Brief Description of the Project

MM has grown steadily over the last 14 years into a multi-million-pound business. As with any rapidly growing business, it is essential that, as sales increase, consistency of quality and service level is maintained; increasing scale requires progressive thinking and novel approaches to succeed. Cut flowers, whilst superficially a non-essential product, are an emotionally driven purchase and therefore the highest standards must be maintained. Historically the fresh produce industry, including the flower industry, has been slow and inefficient in using the vast amount of data that is generated to inform decisions within their businesses. MM recognised this need several years ago, and as its growth has continued, so the need to generate and use data to inform sound decision making has intensified. The COVID-19 pandemic has intensified this need even further. During 2020 the demand for online purchasing skyrocketed as consumers changed their shopping behaviours to avoid contracting the virus. This presented numerous challenges and opportunities for MM to shape and adapt the next stage of growth. Data has been and will be at the heart of successfully adapting to the constant changes that all of us are currently subject to. Insight from data collection enables continuous improvement, basing both strategic and tactical approaches on quantifiable data and robust analysis, which should ultimately allow MM to continue its growth trajectory successfully.

An example of how MM ensures product quality is through the operational department meeting required standards for bouquet production. A dedicated quality team undertake daily inspections of the flowers from the point of receipt and throughout the manufacturing process to support the operational delivery. MM also utilises its own dedicated R&D business, APEX Horticulture, to complement the operational efforts and help provide solutions to maintain or enhance flower quality.

Whilst APEX and the intake quality function have long term, established data sets, dedicated data collection during bouquet manufacturing is a relatively new venture, requiring insight and improvement from a talented student with fresh ideas and an aptitude for data analysis. As such, there is the possibility to develop processes to allow for future data to be incorporated and analysed more efficiently across the Technical function (and indeed the wider business), allowing for quicker and more accurate decision making. Whilst this placement will focus on production, the scope of the project is wide, with opportunities for the successful student to improve processes for data collection and analysis across both quality control and quality assurance.

During this project, the prospective student will have access to extensive data, including quality assessments on receipt of the flowers, operational quality assurance, and retailer consumer metrics, for example. These datasets present an opportunity to undertake more detailed analysis of long-term trends, and how various factors influence the end consumer quality and performance. Previous CMP placements within the business have highlighted several trends and added real value to the business and changed our working practices; this is therefore an opportunity to make a real difference and see your work applied in industry. The successful student will spend their placement within the technical department, which focuses on quality and customer relations, but will experience all aspects of a fast-paced, fresh produce business. This placement will also include liaising with different departments, project management, communication skills, and working towards the needs of the business.

Skills Required: Strong computer skills, Experience with statistics and modelling, Clear communicator, Self-motivated, Demonstrates initiative, Project management, Problem solving, Industry focussed.

Keywords Fresh Produce, Cut flowers, Technical, Quality, Data analysis, Statistics
References  
Prerequisite Skills Statistics, Numerical Analysis, Mathematical Analysis, Data Visualization, Knowledge of fresh produce will be considered an advantage
Other Skills Used in the Project Statistics, Numerical Analysis, Mathematical Analysis, Data Visualization
Programming Languages No Preference
Work Environment The student will be working as part of the Technical department, and will be supported by myself and another Postdoc.

 

Quantum computing internship

Project Title Quantum computing internship
Contact Name Ophelia Crawford
Contact Email ophelia.crawford@riverlane.com
Company/Lab/Department Riverlane
Address First Floor, St Andrew's House, 59 St Andrew's Street, Cambridge CB2 3BZ
Period of the Project 10-12 weeks between late June and late September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 19th February 2021
Background Information Riverlane builds ground-breaking software to unleash the power of quantum computers. Backed by leading venture-capital funds and the University of Cambridge, we develop software that transforms quantum computers from experimental technology into commercial products.
Brief Description of the Project

What you will do:
- Develop an understanding of quantum algorithms and industrial applications of quantum computers
- Research, devise and develop algorithms and software to enhance Riverlane's capabilities
- Contribute to one or more projects that are core to Riverlane's scientific goals
- Discuss ideas with colleagues and communicate research in the form of presentations and reports

Requirements:
- A current undergraduate (at least third year), master's student or PhD student in a highly numerate subject, such as a science, mathematics, computer science, engineering, or a related technical field
- Experience with at least one programming language
- Excellent critical thinking and problem-solving ability
- Strong communication skills, both written and verbal
- Ability to take initiative and to work well as part of a team

Please visit our website (https://www.riverlane.com/vacancy/quantum-computing-summer-internship-sc...) for more information and to apply.

Keywords quantum, algorithms, software
References  
Prerequisite Skills  
Other Skills Used in the Project  
Programming Languages  
Work Environment

Our full-time summer internships are designed to enable current students in a quantitative field to translate their skills and expertise into an industrial setting. You will join us at our office in Cambridge, UK, for 10 to 12 weeks, where you will have the opportunity to work alongside our team of software developers, mathematicians, quantum information theorists, computational chemists and physicists - all experts in their fields. Every intern will have a dedicated supervisor and will work on a project designed to make the best use of their background and skills whilst developing their knowledge of quantum computing. We will support all interns to try and produce a concrete output by the end of the internship e.g. a paper, product, or software tool.

We will consider remote internships depending on the Covid-19 situation.

 

Pattern finding in industrial data

Project Title Pattern finding in industrial data
Contact Name Geoff Walker
Contact Email geoff.walker@faradaypredictive.com
Company/Lab/Department Faraday Predictive Ltd
Address St John's Innovation Centre, Cowley Road, Cambridge CB4 0WS
Period of the Project 8 weeks between late June and September
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

Faraday Predictive is a Cambridge-based business, founded in 2017, with a world-leading technology for remote monitoring and diagnostics of industrial machinery that not only improves customers' business performance but also contributes to reducing climate change impacts.

This technology is based on a whole suite of mathematical techniques, many of which have been developed in close collaboration with the Maths Department, directly involving students undertaking CMP summer projects. The students' contribution has had a significant impact on the success of the business. We continue to develop further improvements in the technology and its capability - and hope that this year's student(s) will see their projects have a similarly significant impact and benefit.

Brief Description of the Project

This project is all about identification of patterns in machine behaviour, to allow deviations from normal to be identified. The patterns of interest are in the form of spectral shapes, which our system creates for each machine every time it takes a reading - which can be several times per minute. So we have a great many spectra as a basis from which to work - sometimes hundreds from one machine, which may or may not vary through time, and different spectra for each different machine that is monitored (and there are many machines).

Each machine has a "natural" spectral shape when it is in good condition. If a fault starts to develop in the machine, the spectral shape changes in some way, and we use this change through time to trigger warnings, and to diagnose the nature of the fault, allowing us to provide specific advice on recommended corrective action, and how soon it should be executed.

But if the first time we ever take a reading on a machine, there is already a fault present, we want to be able to identify this as a pre-existing fault, and not simply accept it as normal for this machine. At present we do this by comparing the shape for this particular machine against a spectrum for a "typical" machine - which is actually a simple averaged-out spectrum from a wide range of machines of different types.

Because the natural spectrum shape of each different type of machine is different, this "typical" spectrum is not a very good basis against which to decide whether the new machine that we are seeing for the first ever time has any faults or not.

Instead, we would like to create more specific "normal" spectra, for each type of machine, or group of types of machine, allowing us to select the one most appropriate to the machine in question.

So the tasks that we envisage being involved in this project are:
1. Create a simple algorithm for deciding closeness of fit of one spectrum shape to another (a small sketch follows this list).
2. Run this algorithm on the spectra in our databases to identify patterns of similar shapes - which we might call groups of machines.
3. Cross-refer these groups against known physical characteristics (e.g. machine type, duty, speed, size) to identify parameters that define which group a (new) machine would be expected to fall into.
4. Create a "normal" spectrum for each group (maybe just a simple average of them all; or we may want to take into account any known defects in some of the samples, so that they are removed from the normalisation process).
5. (If time permits) come up with a measure of "abnormality" - how far does any one spectrum deviate from the "normal" spectrum for the group in which it has been placed?
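A minimal sketch of tasks 1, 2, 4 and 5 (assuming NumPy/SciPy; the spectra here are random stand-ins for real readings): cosine similarity as a closeness-of-fit score, hierarchical clustering into groups, a per-group average as the "normal" spectrum, and distance to that average as an abnormality measure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
spectra = rng.random((200, 400))               # 200 readings x 400 frequency bins

# Normalise each spectrum so the comparison is about shape, not overall level.
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)

dist = pdist(spectra, metric="cosine")          # pairwise 1 - cosine similarity (task 1)
groups = fcluster(linkage(dist, method="average"), t=0.3, criterion="distance")  # task 2

# A candidate "normal" spectrum per group (task 4): the group average.
normals = {g: spectra[groups == g].mean(axis=0) for g in np.unique(groups)}

def abnormality(spectrum, group):
    """Cosine distance of a spectrum from its group's "normal" shape (task 5)."""
    n = normals[group]
    return 1.0 - spectrum @ n / (np.linalg.norm(spectrum) * np.linalg.norm(n))

print(abnormality(spectra[0], groups[0]))
```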

A successful outcome of this project will be the identification of patterns and sensible comparisons, allowing us to provide more precise diagnostics and a more precise indication of "normal" vs "abnormal", particularly the first time we ever take a reading on a new machine, but also during subsequent changes through time.

Keywords Pattern recognition. Shape description. Data Grouping. Algorithm. Comparison.
References  
Prerequisite Skills Statistics, Mathematical physics, Mathematical Analysis, Geometry/Topology, Database Queries, App Building, Coding, e.g. Python (maybe), SQL queries (maybe - and we can teach)
Other Skills Used in the Project Predictive Modelling, Database Queries, App Building
Programming Languages Python, MATLAB, R, C++
Work Environment Remote working assumed. Project is basically one person, rather than a team, with a supervisor working as closely as is required. We expect frequent Zoom meetings to review, guide, assist. Data provision from our database - we'll arrange remote access. Normal office working hours assumed, but when working from home this is of course flexible.

 

Analytical solutions for use of varistors in superconducting magnet quench protection

Project Title Analytical solutions for use of varistors in superconducting magnet quench protection
Contact Name Dr Andrew Varney, Consultant Magnet Engineer
Contact Email andrew.varney@oxinst.com
Company/Lab/Department Oxford Instruments NanoScience
Address Tubney Woods, Abingdon, Oxon OX13 5QX
Period of the Project Up to 8 weeks between June and September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

Typical high field superconducting magnets can have a magnetic stored energy of around 10 MJ or more, which is 5 times the kinetic energy of a 4 tonne HGV travelling at 70mph. In the superconducting circuit, as there is no resistance, the current flows without energy loss. However, an extremely small disturbance, say 10 μJ, can lead to the superconducting magnet windings becoming resistive locally. This leads to a chain reaction, with the whole magnet becoming resistive and the stored energy dissipating as heat in the magnet windings over a few seconds' time scale. This is known as a magnet quench.

There are various schemes to protect a superconducting magnet against the effects of the rapid stored energy dissipation in a quench. The main goal is to prevent too much of the energy from being dumped locally which can produce a hotspot within the magnet windings. Typically, a protection circuit will consist of a potential divider network with resistors and diodes to manage currents and voltages within individual parts of the magnet. It will often also include secondary heaters to spread the quench across other parts of the magnet faster than the passive quench propagation would proceed.

Oxford Instruments has recently proposed a novel quench protection scheme involving varistors, which is the subject of a patent application (not yet published). A varistor is an electrical device which exhibits a non-linear voltage vs current relationship. Specifically, at low voltage a varistor has a relatively high electrical resistance, which decreases with increasing voltage. Modelling of the quench behaviour and some experimental work has shown that the use of such components could be useful in a particular configuration to improve the quench protection for high field magnets.

Analytical equations can be derived based on the underlying physics using reasonable approximations. However, even for the simplest case of a homogenous coil divided into two sections and protected using conventional linear resistors, the result describing the propagation of a quench through the magnet coils is a second-order non-linear ODE for which only approximate solutions can be found with some further assumptions. It is not clear how to find solutions of the ODE representing the generalised case with variable resistance in the protection circuits.

Although numerical simulations for this system could be developed, analytic descriptions of how varistors would respond in a quench protection circuit will be invaluable in providing insight into the behaviour of the system over a wide range of parameters. This will also enable additional functionality in existing in-house software without requiring a great deal of computational resource.

Brief Description of the Project

The primary goal of the project is to find approximate analytical solutions to the equation representing the magnet quench propagation in the simplest case of the protection circuit subdividing the magnet into two sections, but generalised to allow for the use of varistors. The varistor behaviour may be represented as a simplified equation, but it may be possible to extend the treatment to allow for a more accurate representation. The use of numerical solutions to guide and test the search for approximate analytical solutions would be appropriate, and such solutions would still be useful should analytical solutions prove too elusive.
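
As a purely illustrative sketch of the kind of numerical experiment that could guide the analytical work, the following integrates a hypothetical lumped-circuit model: an inductor discharges through a growing quench resistance in series with a varistor obeying V = k*|I|^beta*sign(I). All parameter values are invented for illustration, not Oxford Instruments data.

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical lumped-circuit quench model: L*dI/dt = -(R_quench(t)*I + V_varistor(I))
L_ind = 10.0            # magnet inductance [H] (invented)
I0 = 200.0              # initial current [A] (invented)
k, beta = 2.0, 0.7      # varistor law V = k*|I|**beta [V] (invented)

def R_quench(t):
    # crude stand-in for the growing normal-zone resistance [ohm]
    return 0.05 * min(t, 2.0)

def dIdt(t, y):
    I = y[0]
    V_var = k * np.sign(I) * abs(I) ** beta
    return [-(R_quench(t) * I + V_var) / L_ind]

sol = solve_ivp(dIdt, (0.0, 30.0), [I0], max_step=0.05)
print("current after 30 s: %.1f A" % sol.y[0, -1])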

This project would advance Oxford Instruments' understanding and modelling of varistors for use in protecting superconducting magnets. It is intended that it would thus support development of their practical use in the manner described in our patent application by helping in the selection of materials parameters required for a real magnet. The implementation at Oxford Instruments is likely to be in two ways: via an analytical tool to make initial estimates and by using the equations in our in-house quench modelling code.

If the project were particularly successful, an extension goal would be for the student to start working on these tools.

An academic outcome would be at least one published paper, possibly in a mathematical physics journal, but more likely a magnet/physics one.

Keywords Varistor Superconducting magnet Protection circuit Quench Analytical solution
References https://www.oxinst.com/news/a-new-era-in-high-field-superconducting-magn...; Martin Wilson, Superconducting Magnets (OUP, 1983), especially chapter 9
Prerequisite Skills Mathematical physics, Numerical Analysis, Mathematical Analysis
Other Skills Used in the Project Simulation, Predictive Modelling
Programming Languages No Preference, FORTRAN would be ideal
Work Environment The student would be part of the R&D / technology development team, which consists mostly of doctoral-qualified physicists, for the duration of the project. An experienced mathematical physicist working in another group will also be available for consultation. The status of remote working depends on progress of the current pandemic, but there is likely to be at least an element of this. Ideally, the student would be able to work in the office/factory part of the time in order to meet people and to see the products to which the work relates. Oxford Instruments normal working hours are 37 hours per week (including early Friday finish).

 

Modelling and Numerical Simulation of Stress Dependent Oxidation of Silicon

Project Title Modelling and Numerical Simulation of Stress Dependent Oxidation of Silicon
Contact Name Vasily Suvorov
Contact Email vasily.suvorov@silvaco.com
Company/Lab/Department Silvaco Europe, Technology Computer-Aided Design (TCAD) Department
Address Compass Point, St Ives, Cambridgeshire, PE27 5JL
Period of the Project 8 weeks between July and September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information The fabrication of integrated circuit microelectronic structures and devices vitally depends on the thermal oxidation process of silicon. The project aims to analyse the mathematical models of this process and construct effective numerical algorithms to explore the effects of various modelling assumptions. The successful outcome of the project will become a part of the company's commercial software.
Brief Description of the Project

Thermal oxidation of silicon is a way to produce a thin layer of oxide on the surface of a wafer in the fabrication of microelectronic structures and devices. The technique forces oxygen to diffuse into the silicon wafer at high temperature and react with it to form a layer of silicon dioxide: Si + O2 -> SiO2. The oxide layers are used for the formation of gate dielectrics and device isolation regions. With decreasing device dimensions, precise control of oxide thickness becomes increasingly important. In 1965 Bruce Deal and Andrew Grove proposed an analytical model that satisfactorily describes the growth of an oxide layer on the plane surface of a silicon wafer [1]. Despite the successes of the model, it does not explain the retarded oxidation rate of non-planar, curved silicon surfaces. The real cause for the observed retardation behaviour is believed to be the effect of viscous stress on the oxidation rate [2-3].

In this project, we aim to explore the existing mathematical models of the stress-dependent oxidation and propose a numerical scheme to obtain the solution. The approach that we will use is a combination of analytical and numerical analyses of a system of non-linear ordinary differential equations. The student is expected to implement the numerical algorithms in C++ language, although no previous experience in C++ coding is required. Silvaco's own software products may also be used as a tool in this project if required.
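
As a small orienting example (planar case only, which the project goes beyond), the Deal-Grove relation x^2 + A*x = B*(t + tau) of [1] can be solved in closed form for the oxide thickness; the coefficients used below are illustrative numbers only, not calibrated wet/dry oxidation constants.

import numpy as np

# Planar Deal-Grove model [1]: x**2 + A*x = B*(t + tau), solved for x(t).
A, B, tau = 0.165, 0.0117, 0.37     # um, um^2/h, h (illustrative values)

def oxide_thickness(t_hours):
    return 0.5 * A * (np.sqrt(1.0 + 4.0 * B * (t_hours + tau) / A**2) - 1.0)

for t in (0.5, 1.0, 2.0, 4.0):
    print("t = %4.1f h  ->  x = %.3f um" % (t, oxide_thickness(t)))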

Keywords Oxidation, TCAD, Mathematical modelling, Numerical Algorithms, C++ coding
References [1] B.E.Deal, A.S.Grove (1965), General relationship for the thermal oxidation of silicon, Journal of Applied Physics, Vol.36, N12, 3770-3778.
[2] D.B.Kao, J.P.McVittie, W.D.Nix, K.C.Saraswat (1988), Two-dimensional thermal oxidation of Silicon - I. Experiment, IEEE Transactions on Electron Devices, Vol. ED-34, N 5, 1008-1017.
[3] D.B.Kao, J.P.McVittie, W.D.Nix, K.C.Saraswat (1988), Two-dimensional thermal oxidation of Silicon - II. Modeling stress Effects in Wet Oxides, IEEE Transactions on Electron Devices, Vol. ED-35, N 1, 1008-1017.
Prerequisite Skills Mathematical physics, Numerical Analysis, Mathematical Analysis, Simulation
Other Skills Used in the Project  
Programming Languages C++, None Required, Interest in C++ coding
Work Environment The student will work on their own, with support and guidance from the supervisor.

 

Deep representation learning for health records: identifying patients with similar interactions with health services

Project Title Deep representation learning for health records: identifying patients with similar interactions with health services
Contact Name Steve Kiddle
Contact Email steven.kiddle@astrazeneca.com
Company/Lab/Department AstraZeneca, Biopharmaceuticals R&D, Data Science and AI
Address Academy House, 136 Hills Road, Cambridge CB2 8PA
Period of the Project 8 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Because people have been living healthier and longer lives, they are often living with more than one health condition, referred to in the scientific research setting as living with "multimorbidity." However, current NHS guidelines provided to doctors and nurses are organised around patients having only a single condition, ignoring the fact that many, especially the elderly, live with multimorbidity. It is important to better understand how to identify and group patients with multimorbidity in a meaningful way, so that doctors and nurses can provide the best possible personalised care.
Brief Description of the Project

The aim of the study is to use "deep learning" (a form of artificial intelligence) to determine whether patients that fall within a particular "multimorbidity" subgroup are in greater need of healthcare services in future (e.g., more frequent doctor visits, prescriptions, hospitalisations, etc). The MSc student will contribute to the creation of a "proof of concept" for the above study question that will be used to help inform future decision making and planning of next steps on the project.

The student would have an opportunity to:
- Learn about and apply deep learning and artificial intelligence
- Apply these techniques to a real-world database (e.g., MIMIC or CPRD)
- Interpret the outputs of the analysis in a meaningful way to support scientific decision-making at AstraZeneca

The student would split their time between the above project and other "live" projects running within the Data Science and AI team, providing an opportunity to work on the wide variety of tasks that a data scientist typically faces during a normal working day.
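
As a hedged illustration only (synthetic data, and not the architecture of the reference below), a tiny autoencoder of the following kind can turn binary diagnosis-code vectors into dense patient embeddings, which could then be clustered into candidate multimorbidity subgroups.

import torch
import torch.nn as nn

# Synthetic binary patient-by-diagnosis-code matrix (invented sizes and sparsity).
n_patients, n_codes, emb_dim = 500, 100, 16
X = (torch.rand(n_patients, n_codes) < 0.05).float()

# Small autoencoder: encode to emb_dim, reconstruct the code vector.
model = nn.Sequential(nn.Linear(n_codes, emb_dim), nn.ReLU(),
                      nn.Linear(emb_dim, n_codes))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    opt.step()

with torch.no_grad():
    embeddings = model[0](X)     # dense patient representations for clustering
print(embeddings.shape)          # torch.Size([500, 16])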

Keywords Multimorbidity, deep learning, neural networks, artificial intelligence, healthcare data, health data science
References Landi, I., Glicksberg, B. S., Lee, H. C., Cherng, S., Landi, G., Danieletto, M., Dudley, J. T., Furlanello, C., & Miotto, R. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 96 (2020).
Prerequisite Skills Statistics, Mathematical Analysis, Algebra/Number Theory
Other Skills Used in the Project Image processing, Predictive Modelling, Database Queries
Programming Languages Python, R
Work Environment Virtual or face-to-face, depending on the Covid situation

 

Analytical Solution for Multi-Barrier Release, Mechanically Link Diffusion to In-vitro Release

Project Title Analytical Solution for Multi-Barrier Release, Mechanically Link Diffusion to In-vitro Release.
Contact Name Weimin Li
Contact Email weimin.li1@astrazeneca.com
Company/Lab/Department AstraZeneca
Address The Pavilion, Granta Park, Great Abington, Cambridge CB21 6GP
Period of the Project 8 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Extended release of drug molecules from their carriers is one approach to improving patient compliance, for example by reducing dose frequency. This project focuses on formulations in which diffusion is believed to be the main mechanism of drug release. As formulations become more complex, with multiple layers designed into the carrier, a high level of mathematical ability is required to assemble and solve the differential equations arising from Fick's law of diffusion.
Brief Description of the Project Weeks 1-2: Introduction to the background and reading of key papers. Practice writing simple, executable scripts.
Weeks 3-4: Work through the analytical solutions of Elliot J. Carr and Giuseppe Pontrelli for release from multi-layer spheres.
Weeks 5-8: Bring empty spheres into the calculation, and fit to existing data to estimate the diffusion coefficient and the impact of the amount of empty spheres (a minimal single-layer sketch is given below for orientation).
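
For orientation only, the following finite-difference sketch simulates diffusion-controlled release from a single homogeneous sphere with a perfect-sink boundary; the multi-layer and empty-sphere cases above generalise this, and every parameter here is invented.

import numpy as np

# dc/dt = D*(d2c/dr2 + (2/r)*dc/dr) on a sphere, explicit finite differences.
D, R, N = 1e-11, 1e-4, 100            # diffusivity [m^2/s], radius [m], grid intervals (invented)
dr = R / N
r = np.linspace(0.0, R, N + 1)
c = np.ones(N + 1)                    # uniform initial loading (dimensionless)
c[-1] = 0.0                           # perfect-sink boundary at the surface
dt = 0.1 * dr**2 / D                  # explicit-scheme stability margin
m0 = np.sum(c * r**2) * dr            # initial drug content (up to a 4*pi factor)

t = 0.0
for _ in range(20000):
    lap = np.empty_like(c)
    lap[1:-1] = (c[2:] - 2*c[1:-1] + c[:-2]) / dr**2 \
                + (2.0 / r[1:-1]) * (c[2:] - c[:-2]) / (2*dr)
    lap[0] = 6.0 * (c[1] - c[0]) / dr**2      # symmetry condition at r = 0
    c[:-1] += dt * D * lap[:-1]               # surface concentration stays at zero
    t += dt

released = 1.0 - np.sum(c * r**2) * dr / m0
print("fraction released after %.0f s: %.2f" % (t, released))
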
Keywords Fick's law of diffusion. Differential equations. Analytical and numerical solutions.
References  
Prerequisite Skills Mathematical physics, Numerical Analysis, Mathematical Analysis, Algebra/Number Theory
Other Skills Used in the Project Simulation, Predictive Modelling
Programming Languages Python, MATLAB, C++
Work Environment Mostly work from home

 

Multi-scale modeling to enable data-driven biomarker and target discovery

Project Title Multi-scale modeling to enable data-driven biomarker and target discovery
Contact Name Dr Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca, Data Science and Artificial Intelligence
Address Academy House, 136 Hills Road, Cambridge CB2 8PA
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Metagenomic sequencing of clinical samples has improved our understanding of how dysbiosis of microbial flora influences various human diseases. Emerging studies have shown that several microbial signatures are explicitly altered in the setting of immunological, cardiovascular, or gastrointestinal disorders, among others. Microbiome signatures, identified in the context of a disease and integrated with other types of molecular profiling data (genome, microbiome, transcriptome, metabolome, etc., collectively called multi-omics data), are gaining relevance in drug discovery. Such a data set offers opportunities to understand the specific functional pathways and metabolic reactions mediated by host-pathogen interactions in various diseases. Multi-omics is an emerging theme in drug discovery: it provides an unprecedented view into the molecular players driving conditions and enables a path to discovering new targets and therapies in the near future.
Brief Description of the Project

AstraZeneca is investing in this exciting and vital area of drug development to generate unique multi-omics data sets to accelerate the development of novel therapies. Several projects are currently in progress to integrate microbiome with heterogeneous data sets (imaging, multi-omics, clinical, in-vivo disease models, etc.) using quantitative approaches. Collectively, such a system could lead to new targets and unique signatures correlated with human diseases. The collaborative study of altered microbial taxa/species and corresponding clinical phenotype by compiling a large and diverse data set will be an essential step toward understanding microbes' role in disease comorbidities. To achieve this goal, we collaborate with Microbial Sciences across a portfolio of projects that span multiple disease modalities.

The student will develop multi-scale models capable of integrating multi-omics data with clinical and imaging data using modern machine intelligence methods. The incoming candidate will be part of the Special Projects and Research Team. The team is currently working on a portfolio of projects with a common goal of accelerating drug or target discovery using machine intelligence methods. We aim to cross-train the incoming student in drug discovery, precision medicine, multi-scale biology, and data science. We expect the student to leverage high-performance computing and biomedical informatics facilities in AZ to develop data-driven methods to analyze large multi-scale, multi-omics data sets. The student will be part of collaborative efforts across microbial science, artificial intelligence, and drug development. This unique collaborative nature of the project will improve hands-on skills in clinical data, biomedical data analytics, and data science. The incoming student will contribute to the design, development, and deployment of predictive models that help organize, analyze, and interpret these data sets. The student can also gain experience by working closely with the Microbial Sciences clinical development team.

Keywords Drug Discovery, Data Science, Machine Learning, Bioinformatics, Precision Medicine
References https://pubmed.ncbi.nlm.nih.gov/28892060/
https://pubmed.ncbi.nlm.nih.gov/31126891/
Prerequisite Skills Statistics, Probability/Markov Chains, Image processing, Predictive Modelling, Database Queries
Other Skills Used in the Project Probability/Markov Chains, Predictive Modelling, Data Visualization, App Building
Programming Languages Python, R, No Preference
Work Environment 9-5 at AZ campus or remote (depends on COVID restrictions)

 

TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials

Project Title TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
Contact Name Dr Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca, Data Science & Artificial Intelligence, Special Projects & Research
Address Academy House, 136 Hills Road, Cambridge CB2 8PA
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information One of the major impediments to successful drug development is the complexity, cost and scale of clinical trials, particularly large Phase III trials. Despite a wealth of historical data, clinical trial sponsors typically have a difficult time fully leveraging historical trial data to drive insight into optimal clinical trial design, reducing trial cost and scale. Many barriers exist to leveraging this data, including drift in clinical terms and procedure over time, differences in trial structure and differences in data sampled. Recent advances in machine learning in areas such as Natural Language Processing (NLP) and graph modeling of complex data have enabled rapid advances in a number of domains. The TrialGraph project seeks to apply these methodologies to clinical trial data, creating a unified graph model to represent clinical trials across phases and therapeutic areas. Such a data modeling approach would enable novel and powerful analytics that drive efficiencies in drug development and benefit our patients.
Brief Description of the Project

Multiple graph modeling initiatives are running in parallel, and this project will leverage their infrastructure, their graph models of external clinical and biomedical data, and their expertise. In collaboration with this wider community, the TrialGraph project will develop novel graph representations of historical AZ trials, methodologies to analyze these representations in ways that provide meaningful insight, and experiments with other machine learning methodologies that could yield both novel discoveries and operational efficiencies.

Expected Outcomes:
- Prototype graph data model applied to multiple clinical trials
- Graph analytics aimed at providing insight into clinical trial operations and outcomes
- Improvements to the clinical trial enrollment lifecycle
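
A hedged, toy-scale sketch of what such a graph representation could look like is shown below, using networkx; the node names, attributes and relations are synthetic illustrations, not the AZ trial schema.

import networkx as nx

# Toy heterogeneous graph: trials, a condition and a site, linked by typed edges.
G = nx.Graph()
G.add_node("NCT0001", kind="trial", phase=3)
G.add_node("NCT0002", kind="trial", phase=2)
G.add_node("asthma", kind="condition")
G.add_node("site_cambridge", kind="site")

G.add_edge("NCT0001", "asthma", relation="studies")
G.add_edge("NCT0002", "asthma", relation="studies")
G.add_edge("NCT0001", "site_cambridge", relation="enrols_at")

# Simple graph analytics: which trials share at least one neighbour (e.g. a condition)?
trials = [n for n, d in G.nodes(data=True) if d["kind"] == "trial"]
shared = [(a, b) for a in trials for b in trials
          if a < b and set(G[a]) & set(G[b])]
print(shared)   # [('NCT0001', 'NCT0002')]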

Keywords Graph modeling, Data integration, Data Science, Clinical Trials, Machine Learning
References  
Prerequisite Skills Statistics, Probability/Markov Chains, Geometry/Topology, Predictive Modelling
Other Skills Used in the Project Database Queries, Data Visualization, App Building
Programming Languages Python, R
Work Environment AstraZeneca Campus/Remote (depending on COVID situation)

 

Network reconstruction from single cell transcriptomic data

Project Title Network reconstruction from single cell transcriptomic data
Contact Name Nil Turan
Contact Email nil.c.turan-jurdzinski@gsk.com
Company/Lab/Department GSK, Human Genetics Computational Biology
Address Gunnels Wood Road, Stevenage, SG1 2NY, United Kingdom
Period of the Project 8-10 weeks, flexible
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Currently, molecular interaction networks in the field, and those used to support numerous target identification and validation efforts within GSK, present a generalized network comprising interactions that may not exist within a specific cell-type. The ability to reconstruct and analyse cell-specific molecular interaction networks has the potential to improve our cell-specific understanding of molecular processes and directly inform on relevant assays or mechanisms driving a disease. Recent advances in single cell RNA-seq technology allow the transcriptome of individual cells to be assessed [1]. This brings a great opportunity to reconstruct cell-specific molecular interaction networks. Several methods have been implemented to build such networks [2-3] but a systematic evaluation of such methods is yet to be conducted.
Brief Description of the Project The student will explore available methods to reconstruct networks from single cell RNA-seq data [2-3]. A background in statistics and mathematics is critical for reviewing these methods. They will then evaluate and test the performances of these different methods. Knowledge of single cell data is not required although some preliminary understanding will be useful. Preferred programming language would be R.
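
As a baseline against which the published methods [2-3] could be compared, a deliberately naive sketch is shown below: thresholding Spearman correlations between genes in a synthetic count matrix to obtain an adjacency matrix. It is illustrative only and written in Python, although R would be the preferred language for the project itself.

import numpy as np
from scipy.stats import spearmanr

# Synthetic single-cell count matrix: 500 cells x 40 genes (invented).
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(500, 40))

# Gene-gene Spearman correlations, thresholded into a crude co-expression network.
rho, _ = spearmanr(counts)
adjacency = (np.abs(rho) > 0.3) & ~np.eye(40, dtype=bool)
print("edges inferred:", adjacency.sum() // 2)
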
Keywords Network inference, single cell transcriptomics, computational biology, statistics, R
References [1] Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol (2019) 15: e8746. https://doi.org/10.15252/msb.20188746
[2] Simon Cabello-Aguilar, Mélissa Alame, Fabien Kon-Sun-Tack, Caroline Fau, Matthieu Lacroix, Jacques Colinge, SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics, Nucleic Acids Research, Volume 48, Issue 10, 04 June 2020, Page e55, https://doi.org/10.1093/nar/gkaa183
[3] Efremova, M., Vento-Tormo, M., Teichmann, S.A. et al. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat Protoc 15, 1484-1506 (2020). https://doi.org/10.1038/s41596-020-0292-x
Prerequisite Skills Statistics
Other Skills Used in the Project  
Programming Languages Python, R
Work Environment The student will work closely with Human Genetics Computational Biology, Functional Genomics Computational Biology and the stats group. The student will have the opportunity to interact and discuss with experts in single cell seq technology and also network approaches.

 

Algorithm development and modelling for security applications

Project Title Algorithm development and modelling for security applications.
Contact Name Sam Pollock
Contact Email careers@iconal.com
Company/Lab/Department Iconal Technology
Address St John's Innovation Centre, Cowley Road, Cambridge CB4 0WS
Period of the Project At least 8 weeks, June start
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest We will start interviewing as soon as we receive applications, but applications up to the end of March will be considered if we haven't already filled the position.
Background Information We are a Cambridge based consultancy carrying out research and development in new and emerging technologies for security, offering independent, impartial, science-based advice. This will be our fourth year offering CMP placements, and we are looking for keen, innovative, self-motivated individuals who are interested in the practical application of maths to solve real-world problems. You will be working in a small friendly (we like to think) team of scientists and engineers, and contributing directly to the output of current projects.
Brief Description of the Project Right now we do not know exactly what the student project will entail, as we work in a very rapidly evolving field. This year's projects are likely to be focused on one or more of: developing algorithms and machine learning solutions to analyse complex sensor data; building event-based simulations of security processes (including data collection and analysis from field observations); or helping with tests and trials of technology. Previous students have been exposed to all stages of the data pipeline / data science process. Our work is highly varied and interesting and you will likely get stuck in with all aspects of the job!
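
One of the strands mentioned above is event-based simulation of security processes. As a hedged toy example (made-up arrival and service rates, a single scanner, not a real checkpoint model), the following estimates queueing delay at a screening lane.

import random

random.seed(1)
ARRIVAL_RATE, SERVICE_RATE, HORIZON = 1.0, 1.2, 480.0   # per minute, minutes (invented)

# Single-server FIFO queue via the Lindley recursion: each passenger starts
# screening at max(arrival time, time the scanner becomes free).
t, busy_until, waits = 0.0, 0.0, []
while True:
    t += random.expovariate(ARRIVAL_RATE)       # next passenger arrives
    if t > HORIZON:
        break
    start = max(t, busy_until)                  # wait if the scanner is busy
    waits.append(start - t)
    busy_until = start + random.expovariate(SERVICE_RATE)

print("mean wait: %.1f min over %d passengers" % (sum(waits) / len(waits), len(waits)))
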
Keywords Security, machine learning, algorithms,
References http://www.iconal.com
Prerequisite Skills Statistics, Numerical Analysis, Image processing, Simulation, Data Visualization
Other Skills Used in the Project Predictive Modelling, App Building
Programming Languages Python, R, C++, Python preferred (as it's our main one), but can consider other languages if relevant
Work Environment We are a small friendly team of 8 people, all working on a range of interesting diverse projects. The student will be based in our main office (or lab for data gathering) working on one or more projects with us, with a mentor on each project to help with queries, reviewing work and assigning tasks. This is of course subject to change should we still be under lockdown! We had a remote summer student in 2020, who worked virtually with the team.

 

Developing an approach for biotherapeutic purity quantitation from analytical instrument signals

Project Title Developing an approach for biotherapeutic purity quantitation from analytical instrument signals.
Contact Name David Hilton
Contact Email david.w.hilton@gsk.com
Company/Lab/Department GSK, Biopharm Process Research Group
Address Gunnels Wood Road, Stevenage, SG1 2NY
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information The Biopharm Process Research group is the first step on the route from newly discovered biotherapeutic drugs to a commercial product which can be administered to a patient. It is the group's responsibility to screen candidate molecules for developability and process fit, and to identify a suitable commercial cell line for their production. A key requirement of the group during process development and candidate molecule screening is to characterize the chemical, physical or biological attributes of the molecule to assess its purity. This is a critical attribute, as the purity of a biopharmaceutical product will influence both the efficacy and the safety of the drug.
Brief Description of the Project The analytical techniques used to characterize the purity of a biopharmaceutical drug often output a signal that is a composite of peaks associated with the product of interest and product-related impurities, along with signal noise, baseline deviations and instrument-associated drift. As part of GSK's standard biopharm drug development activities, thousands of these instrument signals are generated within the department each month, and the automated peak identification methods that are currently employed cannot adequately and consistently quantify drug purity. This often necessitates high levels of time-consuming manual data processing. The aim of this project is to develop an optimal procedure for peak identification and purity determination, using techniques ranging from simple deconvolution to CNN and LSTM machine learning methods, with model performance benchmarked against our large departmental datasets. Should a successful strategy be developed, it could be incorporated into a tool for deployment to our data processing pipelines, thereby enabling more rapid and robust development of GSK's biopharm drug portfolio.
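
As a hedged illustration of the classical starting point (synthetic signal, invented thresholds; the CNN/LSTM approaches mentioned above go well beyond this), the following performs a crude baseline correction, finds peaks, and reports the main-peak area fraction as a purity estimate.

import numpy as np
from scipy.signal import find_peaks, medfilt

# Synthetic chromatogram-like trace: product peak, impurity peak, drift and noise.
x = np.linspace(0, 10, 2000)
noise = 0.02 * np.random.default_rng(0).standard_normal(x.size)
signal = (1.0 * np.exp(-(x - 4.0) ** 2 / 0.01)
          + 0.15 * np.exp(-(x - 6.0) ** 2 / 0.02)
          + 0.05 * np.sin(x) + noise)

baseline = medfilt(signal, kernel_size=301)      # crude baseline estimate
corrected = np.clip(signal - baseline, 0.0, None)

peaks, _ = find_peaks(corrected, height=0.1, prominence=0.05)
areas = [corrected[max(p - 50, 0):p + 50].sum() for p in peaks]
purity = max(areas) / sum(areas)
print("estimated main-peak purity: %.1f%%" % (100 * purity))
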
Keywords Modelling, Visualization, Signals, Scripting, Pharmaceuticals
References  
Prerequisite Skills Statistics, Predictive Modelling, Data Visualization
Other Skills Used in the Project Database Queries
Programming Languages Python, R
Work Environment The student will be supervised during the project and, though working individually, will be involved in all departmental activities. Support from the Statistical Sciences group and Data Science teams will be available should this be required. Standard office hours will apply and remote working opportunities are available.

 

Is Quantum Machine Learning mature for clinical applications?

Project Title Is Quantum Machine Learning mature for clinical applications?
Contact Name Domingo Salazar
Contact Email domingo.salazar@astrazeneca.com
Company/Lab/Department AstraZeneca
Address City House, 130 Hills Road, Cambridge CB2 1RE
Period of the Project 8 weeks between late June and September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Quantum Computing (QC), in general, and Quantum Machine Learning (QML), in particular, have made considerable progress in the last few years. It is now possible to formulate typical clinical data science projects like uncovering associations of adverse effects with medicines or subgroup identification as QML problems. But what would be the benefits of doing this at this moment in time? Is QML ready to start to be used regularly in Pharma? And if so, for which kind of projects may it provide advantages over classical computing?
Brief Description of the Project

We would like to formulate an open-ended project made up of two parts:
* A literature review
* A practical example.

The literature review should provide a feeling for the state of the art in this area. In particular, it should point us towards the most promising current applications of QML in Pharma.

The practical example should be chosen based on the results of the literature review. It will be scoped according to the time and QC resources available. Data sources may include publicly available clinical datasets, text, images and/or genomic sequences, depending on the selected application.

There are a number of QC providers in the marketplace at the moment, but for this purpose, as well as for the literature review, it would be very interesting if we could set up a three-way collaboration between the Cambridge Math Department, the QC group in Cambridge and AstraZeneca. This relationship could then continue beyond this student project.
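
As a hedged taste of what a "practical example" might start from (and not a claim about which application the review will select), the following classically simulates a single-qubit variational classifier: for RY(theta)|0> the Z expectation is cos(theta), so the model predicts sign(cos(w*x + b)), with a brute-force parameter search standing in for the variational optimiser. Data and search grid are invented.

import numpy as np

# Toy, classically simulated "variational circuit" on synthetic 1-D data.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.where(x > 0.2, 1.0, -1.0)                 # synthetic binary labels

best = (0.0, 0.0, 0.0)                           # (accuracy, w, b)
for w in np.linspace(-4, 4, 81):
    for b in np.linspace(-np.pi, np.pi, 81):
        acc = np.mean(np.sign(np.cos(w * x + b)) == y)
        if acc > best[0]:
            best = (acc, w, b)

print("best accuracy %.2f with w=%.2f, b=%.2f" % best)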

Keywords Quantum Computing, Quantum Machine Learning, Pharma, Clinical, AI
References * Quantum Machine Learning, Peter Wittek, Elsevier Insights (book)
* Amazon Braket (https://aws.amazon.com/blogs/aws/amazon-braket-get-started-with-quantum-...)
* Introduction to Quantum Computing with Python (https://pythonspot.com/an-introduction-to-building-quantum-computing-app...)
Prerequisite Skills Statistics, Simulation, Machine Learning
Other Skills Used in the Project Image processing, Predictive Modelling, Data Visualization
Programming Languages Python, R, Some QC languages such as Q#, if the corresponding Python packages prove to be too limited for our purposes.
Work Environment We like to integrate our students within our team so they experience what it means to do Data Science in a Pharma company. The student will be able to talk to a number of data science specialists in our team, as well as clinicians, biologists, bioinformaticians, image analysts, etc., as appropriate.

 

Aggregating embeddings in deep unsupervised graph learning

Project Title Aggregating embeddings in deep unsupervised graph learning
Contact Name Khan Baykaner
Contact Email khan.baykaner@astrazeneca.com
Company/Lab/Department Astrazeneca, Deep Learning, AI Engineering, R&D IT
Address Cambridge Road, Melbourn, Royston SG8 6EH
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information The application of AI to digital pathology for drug development is a burgeoning field which promises to radically replace and enhance the existing analysis workflows that lead to biological insight. One area of interest is the analysis of multiplex immunofluorescence (mIF) imaging for oncology; by using multiplexed tissue staining one can acquire a rich set of data for investigating the tumour microenvironment. However, efficient methods for analysing this rich data are still in their infancy. One method of investigation is to build a graph mapped to the cells within the tissue, and then use unsupervised learning techniques on the graph to capture the structure of the information in embeddings.
Brief Description of the Project This project will explore how elements of the unsupervised learning technique (such as the corruption function in deep graph infomax) affect the downstream performance of the trained embeddings, as well as techniques for aggregating embeddings in a spatially-aware manner. Depending on the area of focus, success would involve alterations to the mIF graph pipeline that allow embeddings to be combined across multiple samples in a consistent, spatially-aware manner without loss of relevant information. This in turn would be expected to dramatically improve the predictive power of downstream patient survival models.
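
A minimal sketch of one spatially-aware aggregation step is given below, assuming synthetic cell coordinates and embeddings and an invented pooling radius: each cell's embedding is mean-pooled with those of its spatial neighbours via a KD-tree. This is far simpler than what the project would ultimately need, but shows the flavour.

import numpy as np
from scipy.spatial import cKDTree

# Synthetic cell centres [um] and per-cell embeddings (invented sizes).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(5000, 2))
emb = rng.normal(size=(5000, 32))

# Mean-pool each embedding over all cells within a 50 um radius.
tree = cKDTree(coords)
neighbours = tree.query_ball_point(coords, r=50.0)
pooled = np.stack([emb[idx].mean(axis=0) for idx in neighbours])
print(pooled.shape)     # (5000, 32): spatially smoothed embeddings
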
Keywords graphs, unsupervised learning, deep learning, AI, pathology
References https://arxiv.org/pdf/1809.10341.pdf
Prerequisite Skills python, deep learning
Other Skills Used in the Project Data Visualization
Programming Languages Python
Work Environment Will collaborate with a small team of machine learning engineers. Whether work will be remote depends on the situation regarding the pandemic.

 

Predicting the pick-up weight of chocolate from real-time factory data

Project Title Predicting the pick-up weight of chocolate from real-time factory data
Contact Name Joe Donaldson
Contact Email Joe.Donaldson@unilever.com
Company/Lab/Department Unilever R&D
Address Colworth Science Park, Sharnbrook, Bedford MK44 1LQ
Period of the Project Flexible, minimum 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information Chocolate is an expensive ingredient that Unilever uses extensively within its ice cream portfolio in some of its most well-known brands like Magnum. To maintain the business viability of products, reduce waste, and maintain product quality and uniformity, the chocolate dosage, the so-called pick-up weight, needs to be well controlled. This parameter is a function not only of the properties of the chocolate variant and batch itself, but also of the conditions under which it is processed in the factory. Therefore, accurate adjustment of these parameters during product assembly is key, and the ability to predict and proactively manage possible deviations would offer significant quality improvements and savings.
Brief Description of the Project This project will explore the feasibility of using sensor data to predict chocolate pick-up weight. The aim is to build upon our existing insights and harness the availability of this new data stream to construct a predictive hybrid model linking the data and the science of chocolate behaviour. Our end goal is a real time model suggesting simple adjustments to the operating parameters of the process line so factory operators can ensure the best possible chocolate-coated ice cream products make it into the hands of the consumer at a competitive price.
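
As a hedged illustration of the data-driven half of such a hybrid model (the sensor features, their units and the data-generating process below are all invented), a regression mapping line conditions to pick-up weight might start as follows.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic "sensor" features and pick-up weights (hypothetical relationship).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.normal(45, 1.5, n),    # chocolate temperature [C]
                     rng.normal(2.0, 0.3, n),   # viscosity proxy
                     rng.normal(1.2, 0.1, n)])  # line speed [m/s]
y = 6.0 - 0.08 * (X[:, 0] - 45) + 1.5 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2 on held-out data: %.2f" % model.score(X_te, y_te))
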
Keywords Ice Cream, Chocolate, Modelling, Machine-Learning, Python
References  
Prerequisite Skills Statistics, Predictive Modelling, Data Visualization
Other Skills Used in the Project Statistics, Predictive Modelling, Data Visualization
Programming Languages Python, MATLAB, R
Work Environment Independent working but with regular support from the wider science and technology team. The student will work remotely and be expected to share progress/results with supervisor(s) in daily/bi-weekly calls.

 

Early Stage Investing: Model Development for The Identification of Investable Technologies and Industries

Project Title Early Stage Investing: Model Development for The Identification of Investable Technologies and Industries
Contact Name Oliver Hedaux and Professor Richard Samworth
Contact Email oliver@ahren.co.uk and rjs57@hermes.cam.ac.uk
Company/Lab/Department Statslab, DPMMS and Ahren Innovation Capital
Address Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WB
Period of the Project 8-10 weeks, as agreed
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Friday 26th February 2021
Background Information

Ahren Innovation Capital is an investment fund with a remit to invest in and help build transformational companies at the intersection of deep technology and deep science that will have a positive impact on the world.

Ahren's broad fields of investment activity include: Brain & Artificial Intelligence; Genetics & Platform Technologies; Space & Robotics; and Planet & Efficient Energy. Whatever the domain, Ahren believes in taking asymmetric, considered risk that will deliver superior rewards -- capturing a generational opportunity to provide smart capital to deep technology.

Unlike in public markets where metrics of company health are established and quantified, the privately owned start-up companies that Ahren invests in have historically required a manual and qualitative approach to assessing company health and potential to create outsized returns.

This is largely a data problem. In public markets, where companies are legally required to publish their financial and operational results, the volume of data available for automated analysis is plentiful, consistent, and constrained to relatively few sources of truth in a structured format. On the other hand, private company data is rarely publicly disclosed, and the small sample of data that is shared is typically unstructured or semi-structured and spread unevenly across many resources and data types (numerical and text). In some of the most exciting cases, where companies operate under the radar in "stealth mode", there is very little information at all.

At Ahren, we seek advantage by overcoming the historical constraints on quantitative early-stage investing, designing novel, complementary systems to enhance our deep domain expertise in the areas that matter most.

Brief Description of the Project

A key driver of Ahren's success is its ability to rapidly identify and assess world-leading commercial technologies, and gaps within industries that are ripe to have their biggest challenges addressed by innovation. Therefore, Ahren is starting with the automation of those tasks.

Project Goals:
1. To produce an automated Industry Model to address the following: "Help me to identify which sectors within an industry could produce attractive investment opportunities for Ahren"
2. To produce an automated Technology Model to address the following: "Help me to find the top five academic groups, globally, for any given research area."

For each model, Ahren has set out a non-exhaustive list of high-level questions to be assessed using the cross-domain expertise of Cambridge's Statistical Laboratory and Ahren Innovation Capital:

Industry
- How healthy is the industry?
- Are there favorable industry dynamics [automated Porter Analysis]?
- Can the industry be disrupted?
- Which sub sectors are receiving greatest capital allocation and seeing the most exits?
- How big could this become?

Technology
- Which are the top academic groups in this domain?
- Did this technology originate in a top academic group?

This project will require originality and creativity, bringing to bear the potential of mathematics, statistics, and machine learning to collate and derive insight from semi-structured and unstructured data. It is essential that a quality data set is built. The data set should be kept up to date and relevant using application programming interfaces and web-scraping techniques.
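
For illustration only, the kind of ingestion step described above might look like the sketch below; the endpoints, parameters and JSON structure are hypothetical placeholders, not real data-provider APIs, so the calls will only return data against a genuine endpoint.

import requests
from bs4 import BeautifulSoup

# Hypothetical API call -- the endpoint, parameters and response format are placeholders.
api_response = requests.get("https://api.example.com/v1/companies",
                            params={"sector": "robotics"}, timeout=30)
companies = api_response.json() if api_response.ok else []

# Hypothetical scrape of a listing page: collect headline text from <h2> tags.
page = requests.get("https://news.example.com/robotics", timeout=30)
headlines = [h.get_text(strip=True)
             for h in BeautifulSoup(page.text, "html.parser").find_all("h2")]
print(len(companies), "companies,", len(headlines), "headlines collected")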

There are many sources of data and a good project will use a range of sources. A non-exhaustive list of possible sources is below:
- Relevant industry resources (CB Insights, Pitchbook, Beauhurst)
- Employee type / growth (LinkedIn)
- Patent applications (Espacenet) or licenses
- Github / (specialized blog) activity
- Founder / individual blog posts on topic within sector
- Company / Founder social media posts and following
- Publications / conference posters: Quantity (PubMed or equivalent) and Quality (journal impact rating, paper awards)
- News articles (e.g., using news scraping tool such as Factiva)

This interdisciplinary project would ideally be completed by one Mathematician and one Computer Scientist.

Keywords Statistics, ML, Unstructured, Semi-Structured, Investing
References CB Insights Mosaic Score: https://www.cbinsights.com/company-mosaic
Prerequisite Skills Statistics, Predictive Modelling, Database Queries, Machine Learning
Other Skills Used in the Project Data Visualization, App Building
Programming Languages No Preference
Work Environment Remote placement. Students, in this case two, will have regular (twice weekly) check-ins with qualified members of the Ahren Innovation Capital Investment Team. Meetings with Senior team members will be held biweekly. There will be opportunity to schedule additional meetings as the project demands. Students will work normal office hours, five days per week.

 

Modelling optionality in inflation linked securities

Project Title Modelling optionality in inflation linked securities
Contact Name Richard Manthorpe
Contact Email cambridge.recruitment@symmetryinvestments.com
Company/Lab/Department Symmetry Investments, Quantitative Analytics
Address 86 Jermyn Street, Fourth Floor, London SW1Y 6JD
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest  
Background Information We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. This project focuses on modelling the embedded optionality in certain inflation-linked securities, such as BTP Italias, an Italian inflation-linked security containing a high-watermark indexation feature.
Brief Description of the Project The project would be of interest to a student considering pursuing a career in investment management. It consists of several steps, allowing an intern to get exposure to all aspects of the development of an investment strategy. First, the candidate would be introduced to the mathematics that governs bond and option pricing for both nominal and inflation-linked securities, reviewing the relevant literature. Second, the candidate will work closely with both the trading and quant teams to develop a model and the necessary analytics to evaluate these securities, and to gain an understanding of how the traders assess them. An optional third stage of the project is to extend the analytics to handle other securities, such as UK LPI derivatives. We will be looking for a presentation of results and conclusions towards the end of the project. The project will be pursued in close cooperation with a portfolio management team. During the internship, the student will have an opportunity to learn about practical aspects of investments and risk taking from portfolio managers.
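
A deliberately stylised sketch (not the actual BTP Italia term sheet, and with invented parameters) of how a running-maximum, or "high-watermark", indexation embeds optionality relative to plain indexation is given below; a Monte Carlo comparison of this kind is one natural starting point for the model.

import numpy as np

# Simulate inflation index paths and compare plainly indexed coupons with
# coupons indexed to the running maximum of the index.  Everything is invented.
rng = np.random.default_rng(0)
n_paths, n_periods = 20000, 10          # number of coupon periods (hypothetical)
mu, sigma = 0.01, 0.02                  # per-period index drift and volatility

shocks = rng.normal(mu, sigma, size=(n_paths, n_periods))
index = np.cumprod(1.0 + shocks, axis=1)             # inflation index paths, base = 1
running_max = np.maximum.accumulate(index, axis=1)   # "high watermark" of the index

coupon_rate = 0.02
plain_coupons = coupon_rate * index                  # coupons on a plainly indexed notional
hwm_coupons = coupon_rate * running_max              # coupons on a high-watermark notional

print("avg total coupons, plain: %.4f   high-watermark: %.4f"
      % (plain_coupons.sum(axis=1).mean(), hwm_coupons.sum(axis=1).mean()))
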
Keywords inflation, derivatives, options, bonds
References  
Prerequisite Skills  
Other Skills Used in the Project Statistics, Probability/Markov Chains, PDE's, Mathematical Analysis, Data Visualization
Programming Languages No Preference
Work Environment The student will work in the analytics team. There will be opportunities to talk about the project across several other teams.

 

Modelling inflation expectations in financial markets (project withdrawn)

Project Title Modelling inflation expectations in financial markets
Contact Name Andrey Pogudin
Contact Email cambridge.recruitment@symmetryinvestments.com
Company/Lab/Department Symmetry Investments, Quantitative Research
Address 86 Jermyn Street, Fourth Floor, London SW1Y 6JD
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest  
Background Information We are looking for an intern to work in the Quantitative Research group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. The project focuses on the analysis and modelling of inflation expectations in the global economy in order to identify investment opportunities in financial markets. Inflation expectations are a key variable driving many financial asset prices.
Brief Description of the Project We are looking for an intern to work in the Quantitative Research group at Symmetry Investments, an investment management company. The project focuses on the analysis and modelling of inflation expectations in the global economy in order to identify investment opportunities in financial markets. Inflation expectations are a key variable driving asset prices in both the short and medium term. The project would be of interest to a student considering pursuing a career in investment management. We expect the project to take place over at least 8-10 weeks in the summer of 2021. The project consists of several steps, allowing an intern to get exposure to all aspects of the development of an investment strategy. First, we would start by reviewing recent literature on inflation and inflation expectations modelling, by both academia and market practitioners. Second, building on this review, we will construct some simple toy models, starting perhaps with linear modelling frameworks (regression-based models) and then proceeding to more sophisticated econometric approaches, including machine learning algorithms. The last step of the project is the application of these algorithms to actual financial datasets. The project will be pursued in cooperation with a portfolio management team. During the internship, you will have an opportunity to learn about practical aspects of investments and risk taking from portfolio managers.
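
As a hedged sketch of the "simple toy model" stage only (synthetic data; the variable names, coefficients and data-generating process are invented), a first regression-based model might look like the following.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic macro drivers and an inflation-expectations proxy (all invented).
rng = np.random.default_rng(0)
n = 300
oil = rng.normal(0, 1, n)               # oil-price change (standardised)
wages = rng.normal(0, 1, n)             # wage-growth surprise (standardised)
breakeven = 2.0 + 0.3 * oil + 0.5 * wages + rng.normal(0, 0.2, n)

model = LinearRegression().fit(np.column_stack([oil, wages]), breakeven)
print("coefficients:", model.coef_, "intercept: %.2f" % model.intercept_)
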
Keywords inflation, modelling, machine learning
References  
Prerequisite Skills  
Other Skills Used in the Project Statistics, Data Visualization, App Building, Econometrics
Programming Languages No Preference
Work Environment The student will work in the quantitative research team.

 

State of the art in Covariance matrix estimation and filtering for Risk assessment

Project Title State of the art in Covariance matrix estimation and filtering for Risk assessment
Contact Name Fabien Micallef
Contact Email cambridge.recruitment@symmetryinvestments.com
Company/Lab/Department Symmetry Investments, Quantitative Analytics
Address 86 Jermyn Street, Fourth Floor, London SW1Y 6JD
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest  
Background Information We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations. This project focuses on reviewing techniques for estimating the covariance of asset returns and on how to update that estimate over time.
Brief Description of the Project

The project would be of interest to a student considering pursuing a career in investment management. The covariance matrix is a central tool in the estimation of risk, so its estimation and filtering are very important. We would like to review the different techniques and their robustness with regard to dimensionality and sample size. The goal is to separate noise from real signal and to filter/interpolate/update the estimate through time.

Examples of techniques to review include random matrix theory and different ways of averaging covariance matrices. The naive arithmetic mean of symmetric positive definite (SPD) matrices leads to a swelling effect; the geometric mean on the SPD manifold is one technique to cope with this.

Considering the vast array of techniques, the student will have to be critical about the benefits of one technique over another. We will first test the techniques on synthetically generated data, plotted in Python, and then test them on real-world data.
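
A minimal numerical illustration of the swelling effect mentioned above is given below: the arithmetic mean of two SPD covariance matrices inflates the determinant, whereas a log-Euclidean mean (used here as a convenient stand-in for the Riemannian geometric mean) preserves it. The matrices are invented toy examples.

import numpy as np
from scipy.linalg import expm, logm

A = np.array([[1.0, 0.9], [0.9, 1.0]])
B = np.array([[1.0, -0.9], [-0.9, 1.0]])

arithmetic = 0.5 * (A + B)                              # naive average
log_euclidean = expm(0.5 * (logm(A) + logm(B))).real    # average in the matrix-log domain

print("det A = det B       :", round(np.linalg.det(A), 3))          # 0.19
print("det arithmetic mean :", round(np.linalg.det(arithmetic), 3)) # 1.0 (swollen)
print("det log-Euclidean   :", round(np.linalg.det(log_euclidean), 3))  # 0.19 (preserved)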

Keywords covariance estimation, risk, algorithms
References  
Prerequisite Skills  
Other Skills Used in the Project Statistics, Random Matrix Theory, Differential Geometry, Lie Groups
Programming Languages No Preference
Work Environment The student will work in the analytics team.

 

Fuzzy matching algorithm for live trade populations

Project Title Fuzzy matching algorithm for live trade populations
Contact Name Pierre Micottis
Contact Email cambridge.recruitment@symmetryinvestments.com
Company/Lab/Department Symmetry Investments, Quantitative Analytics
Address 86 Jermyn Street, Fourth Floor, London SW1Y 6JD
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest  
Background Information We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations.
Brief Description of the Project Like many other firms, Symmetry has a process by which the collateral that it holds or posts to its market counterparties is updated. This process has numerous steps, but the one of interest here is when Symmetry wants or needs to reconcile its calculations with those performed by a given counterparty. In order to do so, the parties exchange trade-level information. As time is of the essence, matching the trade populations produced by Symmetry and its counterparties needs to be as automated and robust as possible. It very often happens that trade details are close but not identical between counterparties, so the goal of this project is to explore and implement algorithms which look for what might be imperfect matches on trade details but perfect matches for the trade itself. This will enable the people involved in this step of the process to focus on exceptions and errors, freeing up valuable time.
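
A hedged sketch of the idea is shown below, with invented field names, weights and tolerances: exact or near-exact checks on economic fields are combined with fuzzy matching of free-text fields such as the counterparty name.

from difflib import SequenceMatcher

# Toy trade populations (invented records and schema).
ours = [{"id": "T1", "cpty": "Bank of Example PLC", "notional": 10_000_000, "maturity": "2027-06-15"}]
theirs = [{"id": "X9", "cpty": "BANK OF EXAMPLE", "notional": 10_000_000, "maturity": "2027-06-15"},
          {"id": "X7", "cpty": "Other Bank", "notional": 5_000_000, "maturity": "2027-06-15"}]

def score(a, b):
    # Weighted similarity: fuzzy name match plus tolerance/equality checks.
    name_sim = SequenceMatcher(None, a["cpty"].lower(), b["cpty"].lower()).ratio()
    notional_ok = abs(a["notional"] - b["notional"]) <= 0.01 * a["notional"]
    date_ok = a["maturity"] == b["maturity"]
    return 0.5 * name_sim + 0.3 * notional_ok + 0.2 * date_ok

for a in ours:
    best = max(theirs, key=lambda b: score(a, b))
    print(a["id"], "->", best["id"], "score %.2f" % score(a, best))
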
Keywords algorithms, fuzzy matching, trade reconciliation
References  
Prerequisite Skills  
Other Skills Used in the Project Programming, process design, machine learning
Programming Languages No Preference
Work Environment The student will work in a team. There will be opportunities to talk about the project across several other teams.

 

Solvers for Integer Quadratic Program ("IQP") problems related to allocating trades

Project Title Solvers for Integer Quadratic Program ("IQP") problems related to allocating trades
Contact Name Pierre Micottis
Contact Email cambridge.recruitment@symmetryinvestments.com
Company/Lab/Department Symmetry Investments, Quantitative Analytics
Address 86 Jermyn Street, Fourth Floor, London SW1Y 6JD
Period of the Project 8-12 weeks
Project Open to Master's (Part III) students
Initial Deadline to register interest  
Background Information We are looking for an intern to work in the Quantitative Analytics group at Symmetry Investments, a post-startup US $7 billion alternative asset management company with around 220 people across multiple time zones and locations.
Brief Description of the Project The main objective is to determine how the total quantity of a partially or fully executed order should be allocated to a number of accounts or funds. It is typically preferable to do a single trade in the market and then allocate it, subject to a series of constraints. This type of problem has to be solved in such a way that each allocated trade "stands on its own", meaning that it could have been executed as such and satisfies constraints like minimum tradeable size, minimum position size, strategy-level implied ratios as close as possible to target ratios, and so forth. The solutions are therefore typically expressed as lists of integers which minimise some objective function under constraints. Optimisation has to be done both with respect to the quantities allocated and with respect to the Volume Weighted Average Price (or "VWAP").
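
As a toy-scale sketch only (a real instance would need a proper integer/mixed-integer quadratic solver and the VWAP term), the following brute-forces a tiny allocation problem: split an executed quantity across three accounts in whole lots while minimising the squared deviation from target ratios. All numbers are illustrative.

from itertools import product

total, lot = 100, 5                      # executed quantity and minimum lot size (invented)
targets = [0.5, 0.3, 0.2]                # target allocation ratios (invented)

best, best_alloc = float("inf"), None
levels = range(0, total + 1, lot)
for a, b in product(levels, repeat=2):
    c = total - a - b
    if c < 0 or c % lot:
        continue                          # infeasible: negative or off-lot remainder
    alloc = (a, b, c)
    cost = sum((q / total - t) ** 2 for q, t in zip(alloc, targets))
    if cost < best:
        best, best_alloc = cost, alloc

print("allocation:", best_alloc, "objective: %.4f" % best)
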
Keywords integer quadratic programming, algorithms, trade allocation
References  
Prerequisite Skills  
Other Skills Used in the Project Programming, solvers, algorithms
Programming Languages No Preference
Work Environment The student will work in a team. There will be opportunities to talk about the project across several other teams.

 

Neural Network Model Calibration 

Project Title Neural Network Model Calibration
Contact Name Nicolas Leprovost
Contact Email nicolas.leprovost@bp.com
Company/Lab/Department BP, Quantitative Analytics
Address 20 Canada Square, London E14 5NJ
Period of the Project 2 to 6 months starting in summer 2021
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Wednesday 31st March 2021
Background Information To assist BP's ambition in the renewable space, it is essential to be able to model the joint evolution of power prices and renewables production. For example, modelling jointly the wind output (or equivalently the wind speed) and the electricity price is necessary to assess the cost of developing a wind farm project. This problem can be addressed by using Monte-Carlo simulations. In order to properly represent the dynamics of the underlying, one needs a robust calibration mechanism that mimics its statistical properties.
Brief Description of the Project

Recent developments in machine learning have shown that deep learning methods can be applied efficiently to calibrate an option pricing model. During the internship, we will focus on two approaches: historical calibration [1], where model parameters are estimated from historical market data, and volatility surface calibration [2], where parameters are obtained by inverting the market implied volatility surface. These two problems will involve the latest developments in machine learning, such as the use of the signature method [3] or the Swish activation function [4].
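
For context, the background above mentions Monte-Carlo simulation of the joint wind/price dynamics; the sketch below simulates a hypothetical pair of correlated Ornstein-Uhlenbeck processes (not BP's model, and with invented parameters) of the kind whose parameters such a calibration would ultimately target.

import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, dt = 1000, 365 * 24, 1.0 / 24        # hourly steps over one year (time in days)
kappa_w, kappa_p, rho = 0.5, 1.0, -0.4                 # mean-reversion speeds and correlation (invented)
sigma_w, sigma_p = 2.0, 5.0                            # volatilities (invented)
wind = np.full(n_paths, 8.0)                           # wind speed [m/s]
price = np.full(n_paths, 50.0)                         # power price [EUR/MWh]

chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
for _ in range(n_steps):
    z = chol @ rng.standard_normal((2, n_paths))       # correlated Gaussian shocks
    wind += kappa_w * (8.0 - wind) * dt + sigma_w * np.sqrt(dt) * z[0]
    price += kappa_p * (50.0 - price) * dt + sigma_p * np.sqrt(dt) * z[1]

print("mean terminal wind %.2f m/s, mean terminal price %.2f EUR/MWh"
      % (wind.mean(), price.mean()))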

To apply: https://jobs.brassring.com/1033/ASP/TG/cim_jobdetail.asp?partnerid=25078...

Keywords financial engineering, machine learning
References [1] Stone H. Calibrating rough volatility models: a convolutional neural network approach. Quantitative Finance, 20(3):379–392, 2020
[2] Bayer C, Horvath B, Muguruza A, Stemper B, Tomas M. On deep calibration of (rough) stochastic volatility models. arXiv preprint arXiv:1908.08806, 2019.
[3] Chevyrev I, Kormilitzin A. A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788, 2016
[4] Ramachandran P, Zoph B, Le QV. Swish: A self-gated activation function. arXiv 2017. arXiv preprint arXiv:1710.05941.
Prerequisite Skills Statistics, Probability/Markov Chains
Other Skills Used in the Project Simulation
Programming Languages Python
Work Environment Depending on regulations at the time, we hope you will be able to work in the office. You will be assigned a project supervisor and will take part in weekly team meetings.

 

Segmenting duodenal biopsy images

Project Title Segmenting duodenal biopsy images
Contact Name Julian Gilbey
Contact Email jdg18@cam.ac.uk
Company/Lab/Department Lyzeum Ltd. / DAMTP
Address jdg18@cam.ac.uk
Period of the Project 8 weeks between late June and September
Project Open to Undergraduates, Master's (Part III) students
Initial Deadline to register interest Monday 29th March 2021
Background Information Coeliac disease is an autoimmune condition triggered by exposure to gluten (in wheat and other grains), and it can cause significant long-term harm if left untreated. Treatment is a lifelong gluten-free diet. This condition is estimated to affect about 1% of the UK population, but is very under-diagnosed; probably only 1 in 5 or 1 in 6 sufferers is aware that they have it. The gold standard for diagnosis is to perform a biopsy and to look for signs of the disease process on the tissue. This requires highly-trained pathologists to look at each biopsy and to assess it for disease. There is a shortage of pathologists in the UK, and there is often disagreement between pathologists on the diagnosis of individual tissue samples. The long-term aim of our work is to develop a method for obtaining a diagnosis from a tissue sample in an automated fashion, either to guide pathologists in their work or to save the need for a pathologist to look at every sample.
Brief Description of the Project One of the challenging parts of this work is dealing with very large and varied microscope images and identifying the different small-scale and large-scale structures present. Some techniques have already been developed for this, but they are usually effective for only one scale. In our case, we need to use some large-scale information to inform the small-scale identification, and possibly vice-versa. The purpose of this summer project is to explore some of the existing state-of-the-art techniques and to see how they can be combined, adapted and/or developed for our needs. A successful outcome would be a tool for performing this identification. (Note that in the literature, this process is called "segmentation".)
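
A hedged sketch of one way to let large-scale context inform small-scale labelling is given below: a toy two-branch module in PyTorch, with invented sizes and not the project's actual architecture. Features from the full-resolution patch are concatenated with upsampled features computed on a heavily downsampled view before a per-pixel classification head.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleSegmenter(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.local = nn.Conv2d(3, 16, kernel_size=3, padding=1)     # fine-scale branch
        self.context = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # coarse-scale branch
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, patch):                        # patch: (B, 3, H, W)
        local_feat = F.relu(self.local(patch))
        small = F.avg_pool2d(patch, kernel_size=8)   # crude large-scale view
        ctx = F.relu(self.context(small))
        ctx_up = F.interpolate(ctx, size=patch.shape[-2:], mode="bilinear",
                               align_corners=False)
        return self.head(torch.cat([local_feat, ctx_up], dim=1))    # per-pixel logits

logits = TwoScaleSegmenter()(torch.randn(2, 3, 256, 256))
print(logits.shape)                                  # torch.Size([2, 3, 256, 256])
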
Keywords Deep learning, neural networks, image analysis, digital pathology, coeliac disease
References - An introductory seminar on this work is available at: https://cambridgelectures.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=... and a related seminar on the biology of coeliac disease and a bioinformatics approach is here: https://cambridgelectures.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=... (These can be accessed from the cam.ac.uk domain or using a Raven account.)
- For learning PyTorch, the fastai course (https://github.com/fastai/fastbook) is very helpful.
- There are also many papers available on digital histopathology that are potentially relevant, and the Coeliac UK website gives more information about the condition.
Prerequisite Skills Image processing, neural networks and deep learning; any other mathematical skills are also potentially useful.
Other Skills Used in the Project  
Programming Languages Python, We are using PyTorch in our work; this can be learnt during the course of the project.
Work Environment We are currently a small team (of 2 plus an MPhil student!) all working from home, and meet very regularly over Discord or Zoom. If the COVID-19 situation allows it, we might be able to meet in person in Cambridge or London on occasion as well, but there are no specified working hours or location for working. Note that you must have the right to work in the UK to be eligible for this project; you do not have to be currently based in the UK, though.