
Summer Research Programmes

 

This is a list of the Industrial CMP project proposals from summer 2022:

TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials 

Project Title TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
Keywords deep learning, graph theory, graph ML, machine learning
Contact Name Dr. Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department Data Science & Artificial Intelligence, AstraZeneca
Address Remote or On-Campus (AstraZeneca PLC 1 Francis Crick Avenue Cambridge Biomedical Campus Cambridge CB2 0AA)
Period of the Project 8-12 weeks
Work Environment Remote or on-campus; the student will be embedded within a team of 10+ members
Project Open to Undergraduates; Master's (Part III) students
Background Information One of the major impediments to successful drug development is the complexity, cost and scale of clinical trials, particularly large Phase III trials. Despite a wealth of historical data, clinical trial sponsors typically find it difficult to use historical trial data to drive insight into optimal clinical trial design and so reduce trial cost and scale. Many barriers to using this data exist, including drift in clinical terms and procedures over time, differences in trial structure and differences in the data sampled. Recent advances in machine learning in areas such as Natural Language Processing (NLP) and graph modeling of complex data have enabled rapid progress in a number of domains. The TrialGraph project seeks to apply these methodologies to clinical trial data, creating a unified graph model to represent clinical trials across phases and therapeutic areas. Such a data modeling approach would enable novel and powerful analytics that bring efficiencies to drug development and benefit our patients. Multiple graph modeling initiatives are running in parallel, and this project will draw on their infrastructure, their graph modeling of external clinical and biomedical data, and their expertise. In collaboration with this wider community, the TrialGraph project will use these resources while developing novel graph representations of historical AZ trials, developing methodologies to analyze these graph representations that provide meaningful insight, and experimenting with other machine learning methodologies that could yield both novel discoveries and operational efficiencies.
Brief Description of the Project
  • Prototype graph data model applied to multiple clinical trials
  • Graph analytics aimed at providing insight into clinical trial operations and outcome
  • Improve clinical trial enrollment lifecycle
  • Develop methods to model clinical trials using a graph structure, apply and test these methods using data from multiple clinical trials including different phases of study and therapeutic area
  • Develop machine-learning enabled graph analytics to provide insight into clinical trial outcome, clinical trial operations and overall clinical trial design
  • Prototype machine learning autoencoder based methods for working with clinical trial data
References TrialGraph v1 Manuscript: https://arxiv.org/abs/2112.08211
Prerequisite Skills Geometry/Topology;Predictive Modelling;Data Visualization
Other Skills Used in the Project Statistics;Probability/Markov Chains
Programming Languages Python;R;No Preference;GraphML tools and libraries, SQL

 

AI-Driven Therapeutic Responder/Non-Responder Prediction using Multi-modality Signature (TheraSign Project) 

Project Title AI-Driven Therapeutic Responder/Non-Responder Prediction using Multi-modality Signature (TheraSign Project)
Keywords precision medicine, personalized medicine, companion diagnostics, applied AI/ML, data science
Contact Name Dr. Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca / Machine Learning Research / Data Science & Artificial Intelligence
Address shameer.khader@astrazeneca.com
Period of the Project 8-12 weeks
Work Environment Student will be embedded in a team with 10+ members
Project Open to Undergraduates;Master's (Part III) students
Background Information Precise understanding of responder and non-responder populations from real-world data remains a critical aspect of drug development. In the current era of value-based contracting, identifying precise sub-populations within a disease stratum remains a high-value research question. In this project, the incoming graduate student will expand our internally developed, AI-driven therapeutic responder/non-responder discovery platform. The platform, TheraSign, is designed to capture the heterogeneous signature driving therapeutic responses and includes three key modules: digital phenotyping, responder/non-responder definitions and predictive modelling. The incoming student will help improve a module of choice or help to extend the application of TheraSign to AZ assets. As a part of the precision medicine program, the development of TheraSign and its application is an important milestone, and the incoming student will be trained to use and apply modern data integration and machine learning approaches.
Brief Description of the Project
  • Student will gain experience in biomedical and healthcare data mining and a variety of skills for developing analytics strategies that leverage biomedical data
  • The incoming student will receive co-mentorship from different departments of AstraZeneca, including therapeutic areas, and Data Science & AI.
  • The student will be part of collaborative projects and publications that will improve hands-on skills in data science, machine learning, digital biology and bioinformatics
  • Develop algorithms to improve therapeutic responder/non-responder analytics
  • Understand key drivers of responders/non-responders for different AZ medicines
  • Improve the performance of TheraSign modules
References
  • Machine learning in cardiovascular medicine: are we there yet? https://heart.bmj.com/content/104/14/1156.
  • Long Mammography Assessment using Multi-Scale Deep Classifiers https://arxiv.org/abs/1807.03095
  • Functional genomics predictive network model identifies regulators of inflammatory bowel disease. Nat Genet. 2017 Oct;49(10):1437-1449. doi: 10.1038/ng.3947. Epub 2017 Sep 11. PubMed PMID: 28892060; PubMed Central PMCID: PMC5660607.
  • Glicksberg BS, et. al; Comparative analyses of population-scale phenomic data in electronic medical records reveal race-specific disease networks. Bioinformatics. 2016 Jun 15;32(12):i101-i110. doi: 10.1093/bioinformatics/btw282. PubMed PMID: 27307606; PubMed Central PMCID: PMC4908366.
  • Shameer K et. al; Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Brief Bioinform. 2017 Jan;18(1):105-124. doi: 10.1093/bib/bbv118. Epub 2016 Feb 14. PubMed PMID: 26876889; PubMed Central PMCID: PMC5221424.
Prerequisite Skills Statistics;Probability/Markov Chains;Image processing;Geometry/Topology;Predictive Modelling;Database Queries;Data Visualization;App Building
Other Skills Used in the Project Fluids;Simulation
Programming Languages Python;R;No Preference;SQL

 

Precision sub-typing of patient populations using "Super Progressor Phenotypes" 

Project Title Precision sub-typing of patient populations using "Super Progressor Phenotypes"
Keywords precision medicine, predictive modeling, personalized medicine, biomedical AI, healthcare data science
Contact Name Dr. Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca / Machine Learning Research/ Data Science & Artificial Intelligence
Address Remote or onsite at one of the UK campuses
Period of the Project 8-12 weeks
Work Environment Student will be part of a 10+ member team
Project Open to Undergraduates; Master's (Part III) students
Background Information Understanding disease progression, patient-specific clinical trajectories, and the factors associated with how patients progress through the disease course is of emerging interest for developing precise and effective therapies. Recently, we have combined real-world data and machine learning approaches to develop an algorithm capable of identifying Super Progressors in Non-Alcoholic Steatohepatitis (NASH). NASH is a poorly characterized disease with a global epidemiologic footprint associated with significant morbidity and mortality rates. Identifying a patient subset that shows a rapid rate of disease acceleration with a more severe phenotype as 'NASH super progressors' is a critical need. This population is of particular interest for AZ Clinical Trials, as characterized subpopulation(s) of super progressors will likely have altered Benefit-Risk profiles and could help define novel endpoints. In this project, we propose to explore clinical and real-world evidence datasets (GEO, VIVLI, IBM MarketScan, Optum) containing insurance claims data, lab values, and medical diagnoses and procedures, focused on different diseases of interest (type-2 diabetes, obesity, and chronic kidney disease), and to develop machine learning models and a methodology to characterize subpopulations of NASH patients. There are also opportunities to collaborate with external academic partners on EHR data and joint research with leading clinical research centres in the UK and the US. This work will contribute to an understanding of the factors driving progressors and has the potential to benefit AstraZeneca trials in trial design and patient stratification.
Brief Description of the Project

AIMS AND EXPECTATIONS

  • Develop methods to model patient sub-typing
  • Prototype machine learning methods for working with public data
  • The incoming student will gain experience in predictive modelling, healthcare analytics, data science, and integrating and mining large-scale, real-world clinical data from public and proprietary databases
  • The incoming student will work closely with an AstraZeneca team to develop a novel phenotype or endpoint related to NASH or other diseases
  • The student will gain experience in compiling data, building models, and contributing to publishing the model in leading conferences/journals
References
  • Shameer K, et. al; PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT. Pac Symp Biocomput. 2017;22:276-287. doi: 10.1142/9789813207813_0027. PubMed PMID: 27896982; PubMed Central PMCID: PMC5362124.
  • Shameer K, et. al; Machine learning in cardiovascular medicine: are we there yet? Heart. 2018 Jan 19. pii: heartjnl-2017-311198. doi: 10.1136/heartjnl-2017-311198. [Epub ahead of print] Review. PubMed PMID: 29352006.
  • Badgeley MA, et. al; EHDViz: clinical dashboard development using open-source technologies. BMJ Open. 2016 Mar 24;6(3):e010579. doi: 10.1136/bmjopen-2015-010579. PubMed PMID: 27013597; PubMed Central PMCID: PMC4809078.
Prerequisite Skills Statistics;Probability/Markov Chains;Predictive Modelling;Database Queries;Data Visualization;App Building
Other Skills Used in the Project  
Programming Languages Python;R;No Preference;SQL

 

Drug Target Discovery using Multi-omics Signatures 

Project Title Drug Target Discovery using Multi-omics Signatures
Keywords Bioinformatics, drug discovery, multi-omics, genomics, proteomics
Contact Name Dr. Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca/Machine Learning Research/Data Science & Artificial Intelligence
Address AstraZeneca PLC 1 Francis Crick Avenue Cambridge Biomedical Campus Cambridge CB2 0AA
Period of the Project 8-12 weeks
Work Environment Incoming student will be part of a team with 10+ members
Project Open to Undergraduates;Master's (Part III) students
Background Information Metagenomic sequencing of clinical samples has improved our understanding of how dysbiosis of microbial flora influences various human diseases. Emerging studies have shown that several microbial signatures are explicitly altered in the setting of immunological, cardiovascular, or gastrointestinal disorders, among others. Microbiome signatures, identified in the context of clinical phenotypes, pose unique challenges for understanding the specific functional pathways and metabolic reactions mediated by host-pathogen interactions. AstraZeneca is investing in this exciting and vital area to generate unique data sets and to interpret complex data to develop novel therapies. Currently, several projects are in progress to integrate microbiome data with heterogeneous data sets (imaging, multi-omics, clinical, in-vivo disease models, etc.) using bioinformatics and data science approaches. We are also developing novel tools and translational bioinformatics workflows to accelerate multi-omics-driven discovery. Collectively, such an approach could lead to new targets and unique precision medicine approaches. The collective study of altered microbial taxa/species and corresponding clinical phenotypes by compiling a large and diverse data set will be an essential step toward understanding the role of microbes in disease comorbidities. To achieve this goal, we are collaborating with Microbial Sciences across a portfolio of projects that span multiple disease modalities. The incoming student will develop multi-scale models capable of integrating multi-omics data with clinical and imaging data using modern machine intelligence methods. Currently, we are developing multiple translational bioinformatics resources; the incoming student could use these tools or other leading translational bioinformatics tools to analyze data pertaining to one of the focus diseases: NASH, COPD, Parkinson's Disease, IBD etc.
Brief Description of the Project
  • The incoming candidate will be part of the Machine Learning Research Team within DS & AI's Applied Analytics and Artificial Intelligence Organization. The team is currently working on a portfolio of projects across multiple therapeutic areas with a common goal of optimizing clinical trials using machine intelligence methods.
  • We aim to cross-train the incoming student in the areas of digital biology, drug discovery, precision medicine, multi-scale modelling, and data science. We expect the student to use the high-performance computing and biomedical informatics facilities in AZ to assist in developing data-driven methods to analyze large multi-scale, multi-omics data sets.
  • The student will be part of collaborative efforts across microbial science, artificial intelligence, and drug development. This unique collaborative nature of the project will help to improve hands-on skills in clinical data, biomedical data analytics, and data science.
  • The incoming student will contribute to the design, development, and deployment of predictive models that help to organize, analyze and interpret large-scale clinical and omics
References
Prerequisite Skills Statistics;Probability/Markov Chains;Image processing;Geometry/Topology;Predictive Modelling;Database Queries;Data Visualization;App Building
Other Skills Used in the Project  
Programming Languages Python;R;No Preference;SQL

 

Digital Drug (Re)positioning: Data-driven Indication Discovery for New Drugs 

Project Title Digital Drug (Re)positioning: Data-driven Indication Discovery for New Drugs
Keywords drug development, drug discovery, drug repositioning, data science, machine learning, deep learning
Contact Name Shameer Khader
Contact Email shameer.khader@astrazeneca.com
Company/Lab/Department AstraZeneca / Machine Learning Research / Data Science & Artificial Intelligence
Address shameer.khader@astrazeneca.com
Period of the Project 8-12 weeks
Work Environment Student will be part of a team with 10+ members
Project Open to Undergraduates; Master's (Part III) students
Background Information Drug repositioning is defined as a systematic or targeted evaluation of pharmaceuticals to identify new uses for existing drugs. AstraZeneca has been interested in Drug Positioning/Repositioning technologies for many years, and several of our drugs have a high potential for repositioning. Although drug repositioning was initially considered a niche area, many biopharma companies have ramped up their presence in this sector over the years, as several COVID-19 related therapies were identified using drug repositioning approaches. With the aid of innovative emerging digital technologies, including data science and artificial intelligence, we are interested in digital drug positioning. We take drugs in various stages of development and identify new indications. Such an approach is a powerful drug development strategy that would help to delineate complex associations between diseases and drugs that mediate biological functions, including pleiotropy. We are building a suite of centralized tools, databases, and methods to augment drug repositioning efforts. Prediction results from computational approaches will be used for downstream experimental validation and function test experiments that would lead to an investment decision to launch a new clinical program. Collectively, we are leveraging recent advances in machine intelligence, including deep learning, to develop new ways to enhance drug repositioning investigations. We are also developing systematic drug repositioning and positioning using internal and external data assets.
Brief Description of the Project
  • Student will gain experience in digital drug positioning, a new drug development strategy that leverages biomedical data
  • The incoming student will get a co-mentorship from different departments of AstraZeneca, including therapeutic areas, Emerging Innovations Unit, and Data Science & AI.
  • The student will be part of collaborative projects and publications that will improve hands-on skills in data science, machine learning, digital biology and bioinformatics
  • Develop methods to model drug repositioning
  • Develop machine-learning enabled drug repositioning methods
  • Prototype machine learning methods for working with public data
References
  • Shameer K, et. al; Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Brief Bioinform. 2017 Feb 15. doi: 10.1093/bib/bbw136. [Epub ahead of print] PubMed PMID: 28200013.
  • Yadav KK et. al; Systems Medicine Approaches to Improving Understanding, Treatment, and Clinical Management of Neuroendocrine Prostate Cancer. Curr Pharm Des. 2016;22(34):5234-5248. Review. PubMed PMID: 27174811.
  • Hodos RA, et. al; In silico methods for drug repurposing and pharmacology. Wiley Interdiscip Rev Syst Biol Med. 2016 May;8(3):186-210. doi: 10.1002/wsbm.1337. Epub 2016 Apr 15. Review. PubMed PMID: 27080087; PubMed Central PMCID: PMC4845762.
Prerequisite Skills Statistics;Probability/Markov Chains;Predictive Modelling;Database Queries;Data Visualization
Other Skills Used in the Project  
Programming Languages Python;R;No Preference;SQL

 

Is overestimation of expected survival more critical than underestimation? 

Project Title Is overestimation of expected survival more critical than underestimation?
Keywords Survival analysis, restricted mean survival time, optimal decisions, clinical trial, oncology
Contact Name Dr Fabio Rigat
Contact Email frigat@its.jnj.com
Company/Lab/Department Johnson & Johnson
Address 50-100 Holmers Farm Way, High Wycombe HP12 4EG
Period of the Project 8 weeks between June and September 2022
Work Environment The student will benefit from input and mentorship from Dr Rigat, as well as from day-to-day support from other qualified members of the Janssen UK Statistics and Decision Science community. It is foreseeable that face-to-face time will be limited in the current pandemic. To mitigate this possibility, initial bi-weekly meetings will be organised, followed by weekly meetings if mentors and student feel comfortable with the progress of the project. Visits to the Janssen office in High Wycombe are certainly possible, as are face-to-face interactions in Cambridge within University premises. Should clinical trial data be needed to illustrate properties of survival estimators, anonymised historical datasets will be made available upon commitment to appropriate data confidentiality. Working hours and day-to-day work location are entirely flexible.
Project Open to Undergraduates;Master's (Part III) students
Background Information Clinical trials designed to establish whether investigational treatments can improve patient survival traditionally rely on the assumption of proportional hazards, and trial outcome is defined with reference to the statistical significance of the hazard ratio estimate. Much has been written about the advantages and limitations of this practice, the main arguments emerging in relation to the clinical development of immune response modulators in oncology when compared to chemotherapy. The difference in restricted mean survival time (dRMST) has been proposed here as an alternative to the hazard ratio, mostly on grounds of its interpretability: it is the difference in expected survival time conditional on the trial follow-up. From a mathematical perspective, since the RMST estimates the expected survival time conditional on follow-up, it is simple to prove that it is the optimal survival estimate under symmetric (quadratic) loss. However, a symmetric loss on survival time is questionable in practice, and it is desirable to derive optimal estimators of the survival time under asymmetric losses that place emphasis on preventing overestimation. The simplest asymmetric loss function here is the 0-1 loss, penalizing overestimation. Less drastic asymmetric losses have been examined by [1] and [2], among others.
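As an illustration of the quantity discussed above, the following is a minimal Python sketch that estimates the RMST as the area under a Kaplan-Meier survival curve up to a follow-up horizon, and the dRMST as the difference between two arms. The function names (`kaplan_meier`, `rmst`, `d_rmst`) and the input layout are assumptions made for this example, not part of the proposal, and the sketch does not attempt the asymmetric-loss generalisations that the project targets.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event_time, survival_after_time).

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last step, from the final event time before tau up to tau itself.
    area += prev_s * (tau - prev_t)
    return area


def d_rmst(active, control, tau):
    """Difference in RMST between two arms, each given as (times, events)."""
    return rmst(active[0], active[1], tau) - rmst(control[0], control[1], tau)
```

With no censoring, `rmst` reduces to the sample mean of the survival times truncated at `tau`, which is the sense in which it estimates the expected survival time within follow-up.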
Brief Description of the Project

The objectives of this internship project are:

  1. to summarize relevant background literature and results applicable to the design of late phase clinical studies with time-to-event endpoint,
  2. to examine generalizations of the RMST optimal under asymmetric loss,
  3. to implement and demonstrate the operating characteristics of studies designed using estimates of survival differences between active and control arms under asymmetric loss.

The intern will be supported by industrial mentors as appropriate to ensure that the project objectives can be reached within the limitations of a summer project.

Numerical methods and illustrative analyses can be implemented in any popular programming language, including R, SAS, Matlab, Python.

Project deliverables will include a final report, potentially providing material of appropriate quality for publication with the student as first author, a final presentation to the industrial and academic supervisors, and computer code developed to implement estimators and data analysis.

References

[1] Bayesian approach to life testing and reliability estimation using asymmetric loss function, Journal of Statistical Planning and Inference, Volume 29, Issues 1-2, September-October 1991, Pages 21-31
[2] Bayesian Estimation and Prediction Using Asymmetric Loss Functions: Journal of the American Statistical Association: Vol 81, No 394

Prerequisite Skills Statistics;Mathematical Analysis;Simulation;Predictive Modelling
Other Skills Used in the Project  
Programming Languages Python;MATLAB;R

 

Probabilistic algorithms on a distribution-tracking computing platform

Project Title Probabilistic algorithms on a distribution-tracking computing platform
Keywords Uncertainty, computer architecture, quantum computing, probability, statistics, algorithms.
Contact Name Phillip Stanley-Marbell
Contact Email phillip@signaloid.com
Company/Lab/Department Signaloid
Address https://signaloid.ai
Period of the Project 8 weeks (June 2022 to September 2022)
Work Environment Remotely, as part of a team.
Project Open to Undergraduates; Master's (Part III) students
Background Information Signaloid is developing a new kind of microprocessor that can track the probability distributions associated with values in programs. This novel hardware architecture enables new ways of solving many challenging problems on empirical data. The project will give the student the opportunity to use this new computing platform to tackle fundamental problems that also have the potential for significant societal impact.
Brief Description of the Project

Depending on the background and interests of the intern, the project will take one of three possible forms:

(1) developing theoretical analytical bounds on the effectiveness of finite-dimensional representations of multivariate random variables;
(2) developing new algorithms that exploit the ability of uncertainty-tracking computing platforms;
(3) developing new variants of variational quantum algorithms that exploit uncertainty-tracking computer hardware architectures.

References [1] Vasileios Tsoutsouras, Orestis Kaparounakis, Bilgesu Bilgin, Chatura Samarakoon, James Meech, Jan Heck, and Phillip Stanley-Marbell. 2021. The Laplace Microarchitecture for Tracking Data Uncertainty and Its Implementation in a RISC-V Processor. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '21). Association for Computing Machinery, New York, NY, USA, 1254-1269.
[2] To get a first-hand experience of using Signaloid's computing platform, try it out for free by going to https://get.signaloid.io
Prerequisite Skills Statistics;Probability/Markov Chains
Other Skills Used in the Project Statistics;Probability/Markov Chains
Programming Languages No Preference

 

Defining the optimal domain size for geomagnetic table corrections 

Project Title Defining the optimal domain size for geomagnetic table corrections
Keywords Meteorology Geomagnetic Modelling
Contact Name Dr Edmund Stone
Contact Email ed.stone@metoffice.gov.uk
Company/Lab/Department Met Office
Address Met Office, Fitzroy Road, Exeter, EX1 3PB
Period of the Project 8 weeks, we can be flexible on start/end and duration
Work Environment We're currently hybrid working, and the balance can be decided by the student. We're happy with 100% home working, or a period visiting the office (e.g. 2 weeks in the middle), 100% in the office (rules permitting) or a mix. During COVID we've 'hosted' students remotely and it's worked. The team is around 7 people, a mix of scientists and engineers working on similar problems to measure the atmosphere.
Project Open to Undergraduates; Master's (Part III) students
Background Information Accurate and well understood measurements of the atmosphere are fundamental to numerical weather prediction models. The 'observations' are used to create the starting conditions of the physical models, to nudge them in the direction that the atmosphere is moving and to validate the model results after the forecast validity time has passed. One of the measurements we use at the Met Office is wind derived from aircraft air traffic control messages. To derive it we need to convert a heading referenced to magnetic north into one referenced to true north. For this we use a lookup table (the International Geomagnetic Reference Field [IGRF]), but the aircraft may not be using the same lookup table, which introduces a significant source of error. Fortunately we can correct this 'heading error' by comparing the measurements of wind to the model forecast of wind. If we assume that, on average, the model is correct, we can find the discrepancy between the model and the observations for many thousands of data points for each given aircraft and use this information to create a heading correction. This reduces the measured error in the wind observations by around 50%. We routinely do this for data over the UK collected from our own receivers.
Brief Description of the Project For any given aircraft we effectively have two diverging fields. One we understand and the other we do not. We know that the fields diverge over time: Figure 1 shows the difference between two versions of the IGRF, one released in 1990 and one released in 2015, with 1990 as the chosen date. There is a clear difference in the fields, which also changes with location over the domain chosen. For any given aircraft we can generate a figure showing how the calculated heading correction varies (assuming we have data from the aircraft in the grid square). Figure 2 shows this for a single aircraft in 2015. The pattern of the two difference fields is remarkably similar, suggesting that the heading corrections are related to the changing magnetic lookup tables. It is not possible for us to know what field is being used by any aircraft. Aircraft may be using one of several similar models (including the IGRF and the World Magnetic Model [WMM]), both of which are updated every 5 years, and the model could be applied dynamically, taking the date into account, or as a single static field that was accurate at a single point in time. We have some evidence that for the UK the domain size is sufficiently small that the change in heading correction does not have a significant impact on the quality of the measurement (the variation with the orientation of the aircraft is more significant but still small enough that we are not concerned about it). However, the UK domain size appears to be close to the limit for the variation in correction. Our challenge is that we are now looking at processing a 'global' data set. The divergence of the fields is likely to be more significant globally than over the UK. We need to calculate a local heading correction for a given domain.
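The correction idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Met Office's processing chain: the wind is taken as ground velocity minus air velocity, true heading is taken as magnetic heading plus the table declination (east positive), and a single constant correction per aircraft is found by a brute-force least-squares search against the model wind. The function names, the tuple layout of the observations, and the search range are all hypothetical.

```python
import math


def wind_from_aircraft(gs, track_deg, tas, mag_heading_deg,
                       declination_deg, correction_deg=0.0):
    """Derive the wind (u east, v north) as ground velocity minus air velocity.

    Angles are in degrees clockwise from north. The true heading is assumed
    to be magnetic heading + declination (east positive) + correction.
    """
    true_heading = math.radians(mag_heading_deg + declination_deg + correction_deg)
    track = math.radians(track_deg)
    u = gs * math.sin(track) - tas * math.sin(true_heading)
    v = gs * math.cos(track) - tas * math.cos(true_heading)
    return u, v


def estimate_heading_correction(observations, model_winds,
                                search_deg=5.0, step_deg=0.01):
    """Grid-search the constant heading correction for one aircraft.

    observations : list of (gs, track_deg, tas, mag_heading_deg, declination_deg)
    model_winds  : list of (u, v) model winds matched to the observations
    Returns the correction (degrees) that minimises the summed squared
    difference between derived and model winds.
    """
    best_corr, best_cost = 0.0, float("inf")
    n_steps = int(round(2.0 * search_deg / step_deg))
    for k in range(n_steps + 1):
        corr = -search_deg + k * step_deg
        cost = 0.0
        for obs, (um, vm) in zip(observations, model_winds):
            u, v = wind_from_aircraft(*obs, correction_deg=corr)
            cost += (u - um) ** 2 + (v - vm) ** 2
        if cost < best_cost:
            best_corr, best_cost = corr, cost
    return best_corr
```

The least-squares fit encodes the assumption that the model is correct on average, so it only becomes meaningful with many observations per aircraft. A real implementation would also need to handle quality control and the spatial variation of the correction, which is the subject of the project.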
References
  • Stone, Edmund Keith, and Gary Pearce. 'A network of Mode-S receivers for routine acquisition of aircraft-derived meteorological data.' Journal of Atmospheric and Oceanic Technology 33.4 (2016): 757-768.
  • Stone, E. K. "A comparison of Mode-S Enhanced Surveillance observations with other in situ aircraft observations." Quarterly Journal of the Royal Meteorological Society 144.712 (2018): 695-700.
  • Mirza, Andrew K., et al. "Comparison of aircraft-derived observations with in situ research aircraft measurements." Quarterly Journal of the Royal Meteorological Society 142.701 (2016): 2949-2967.
  • de Haan, Siebren. An improved correction method for high quality wind and temperature observations derived from Mode-S EHS. KNMI, 2013.
Prerequisite Skills Mathematical physics; Python
Other Skills Used in the Project Geometry/Topology; Simulation; Data Visualization
Programming Languages Python; R

 

Leveraging mathematics and physiology to make reliable drug exposure and dose predictions in drug discovery and development 

Project Title Leveraging mathematics and physiology to make reliable drug exposure and dose predictions in drug discovery and development
Keywords Drug Development, Quantitative Clinical Pharmacology, Physiological model
Contact Name Chiara Zecchin
Contact Email chiara.x.zecchin@gsk.com
Company/Lab/Department GlaxoSmithKline, Clinical Pharmacology Modelling and Simulation
Address Gunnels Wood Rd, Stevenage, SG1 2NY
Period of the Project 8 weeks (up to 12)
Work Environment The student will be integrated in the collaborative CPMS department at GSK and can be connected with colleagues in biology, biostatistics and clinical. Flexible work location, including GSK Stevenage campus and/or remote (depends on COVID restrictions and student preference). Flexible working hours (meeting and interactions may be easier to arrange between 9AM and 5PM).
Project Open to Undergraduates; Master's (Part III) students
Background Information Drug-development is a complex multi-disciplinary process and mathematics plays a critical role in leveraging the complex body of physiological and biological knowledge. A drug's effect is often related to its concentration in blood and at the site of action, so it is essential to describe the relationship between dose and route of administration and drug concentration. For monoclonal antibodies, physiologically-based pharmacokinetic (PBPK) modelling is a mathematical technique utilised to predict the drug concentration integrating physiological, pharmacological and experimental information. These models can describe drug disposition across several species (e.g. mouse, rat, monkey, and man) and can be used to project tissue concentrations from blood data alone [1]. In addition to the use of full PBPK models, 'minimal' or 'lumped' PBPK models are of interest. These approaches simplify the model structure to only include key sites of distribution (e.g. tumour). These simplified models present a 'middle ground' between empirical and full PBPK approaches and by including physiological parameters they maintain an adequate level of anatomical relevance without requiring the complexity of full PBPK models [2].
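To make the idea of a reduced model structure concrete, here is a toy Python sketch of a two-compartment system (plasma plus one tissue of interest) integrated with a fourth-order Runge-Kutta scheme. It is a generic illustration under assumed equations, far simpler than the monoclonal antibody models in [1] and [2]: the flow-limited exchange term, the partition coefficient `kp`, the linear plasma clearance, and all parameter names are assumptions chosen for this example only.

```python
def simulate_two_compartment(dose_mg, v_plasma_l, v_tissue_l, q_l_per_h,
                             kp, cl_l_per_h, t_end_h, dt_h=0.01):
    """Simulate plasma and tissue concentrations after an IV bolus dose.

    Assumed toy equations (amounts in mg, concentrations in mg/L):
        dA_p/dt = -CL * C_p - Q * (C_p - C_t / Kp)
        dA_t/dt =  Q * (C_p - C_t / Kp)
    Returns a list of (time_h, c_plasma, c_tissue).
    """
    def deriv(a_p, a_t):
        c_p = a_p / v_plasma_l
        c_t = a_t / v_tissue_l
        flux = q_l_per_h * (c_p - c_t / kp)  # net plasma-to-tissue transfer
        return -cl_l_per_h * c_p - flux, flux

    a_p, a_t = dose_mg, 0.0
    n_steps = int(round(t_end_h / dt_h))
    out = [(0.0, a_p / v_plasma_l, a_t / v_tissue_l)]
    for i in range(1, n_steps + 1):
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(a_p, a_t)
        k2 = deriv(a_p + 0.5 * dt_h * k1[0], a_t + 0.5 * dt_h * k1[1])
        k3 = deriv(a_p + 0.5 * dt_h * k2[0], a_t + 0.5 * dt_h * k2[1])
        k4 = deriv(a_p + dt_h * k3[0], a_t + dt_h * k3[1])
        a_p += dt_h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        a_t += dt_h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        out.append((i * dt_h, a_p / v_plasma_l, a_t / v_tissue_l))
    return out
```

With clearance set to zero, the total drug amount is conserved and the tissue-to-plasma concentration ratio settles at `kp`, which gives a quick sanity check on an implementation before more physiological detail is added.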
Brief Description of the Project GSK Clinical Pharmacology Modelling and Simulation (CPMS) is investing in this exciting area of drug development to support the identification of the right doses to be investigated in clinical practice, including information on the tissue(s) affected by the disease and the amount of drug desired therein. The student will implement full PBPK models and minimal PBPK models, based on literature references. Furthermore, the student is expected to derive 'middle ground' models, for several tissues of interest, based on the mathematical, physiological and biological properties of the system. The student will compare the different approaches by simulating drug exposure in tissues of interest, for several realistic scenarios. The student can be integrated in the collaborative CPMS department at GSK and can gain understanding of drug development by working closely with CPMS colleagues, biologists and project teams.
References [1] Shah DK, Betts AM. Towards a platform PBPK model to characterize the plasma and tissue disposition of monoclonal antibodies in preclinical species and human. J Pharmacokinet Pharmacodyn. 2012 Feb;39(1):67-86.
[2] Glassman PM, Balthasar JP. Physiologically-based modeling of monoclonal antibody pharmacokinetics in drug discovery and development. Drug Metab Pharmacokinet. 2019 Feb;34(1):3-13.
Prerequisite Skills Numerical Analysis; Mathematical Analysis
Other Skills Used in the Project Simulation; Data Visualization; Software R
Programming Languages R
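As an illustration of the 'lumped' idea described in the background above, the following is a minimal sketch of a generic flow-limited two-compartment model (plasma plus one tissue of interest). It is not the antibody-specific structure of [1] or [2]; the function name `simulate_two_compartment`, the parameter names (`V_p`, `V_t`, `Q`, `Kp`, `CL`) and all numerical values are assumptions chosen only for illustration, and Python is used here although the project itself lists R.

```python
import numpy as np

def simulate_two_compartment(dose, V_p, V_t, Q, Kp, CL, t_end, dt=0.01):
    """Integrate a generic flow-limited plasma/tissue model with classical RK4.

    V_p dC_p/dt = -Q (C_p - C_t / Kp) - CL C_p
    V_t dC_t/dt =  Q (C_p - C_t / Kp)
    All symbols are illustrative assumptions, not taken from the references.
    """
    def rhs(c):
        c_p, c_t = c
        exchange = Q * (c_p - c_t / Kp)          # net plasma-to-tissue transfer
        return np.array([(-exchange - CL * c_p) / V_p, exchange / V_t])

    n_steps = int(round(t_end / dt))
    c = np.array([dose / V_p, 0.0])              # bolus dose into plasma at t = 0
    conc = np.empty((n_steps + 1, 2))
    conc[0] = c
    for i in range(n_steps):
        k1 = rhs(c)
        k2 = rhs(c + 0.5 * dt * k1)
        k3 = rhs(c + 0.5 * dt * k2)
        k4 = rhs(c + dt * k3)
        c = c + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        conc[i + 1] = c
    times = np.linspace(0.0, n_steps * dt, n_steps + 1)
    return times, conc

# Hypothetical parameter values, with clearance switched off to check mass balance.
times, conc = simulate_two_compartment(
    dose=100.0, V_p=3.0, V_t=10.0, Q=1.0, Kp=0.5, CL=0.0, t_end=50.0
)
total_mass = 3.0 * conc[:, 0] + 10.0 * conc[:, 1]
```

With clearance set to zero, the total amount of drug should stay constant and the tissue-to-plasma concentration ratio should approach `Kp`, which gives a quick sanity check before comparing a reduced model against a fuller one.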

 

Clinically relevant loss functions for 3D medical image segmentation 

Project Title Clinically relevant loss functions for 3D medical image segmentation
Keywords deep learning, medical imaging, 3D segmentation, topology, persistent homology
Contact Name Adam Klimont
Contact Email adam.klimont@cydar.co.uk
Company/Lab/Department Cydar Medical
Address Bulbeck Mill, Mill Lane, Barrington, CB22 7QY
Period of the Project 8-10 weeks between June and September
Work Environment You will be working as part of the 9-person Science Team. We have diverse backgrounds: computer vision, computer science, maths, biomedical engineering, physics. You will be supervised by 3 machine learning engineers, who will be available for daily discussions and guidance. Most of the work will be done remotely, but we should be able to meet in person at our Barrington offices 3-5 times during the internship, public health restrictions permitting.
Project Open to Undergraduates; Master's (Part III) students
Background Information One of the most successful applications of deep neural networks (DNNs) is image segmentation, i.e. assigning a class to each pixel in an image. Segmentation DNNs are especially attractive in medical imaging, where human annotation requires expert knowledge and is time-consuming. Automated medical segmentation can greatly enhance current workflows for planning treatments, guiding operations, and patient monitoring. The most popular segmentation DNNs are U-Nets. They are most commonly trained with pixel-level loss functions such as cross-entropy or Sørensen-Dice loss. These approaches train efficiently and achieve high pixel-wise overlap metrics; however, they do not enforce any shape constraints. This can lead to unrealistic shapes of segmented objects, as well as gaps in segmentation (false negatives, FNs) and islands of misclassified pixels (false positives, FPs). In a medical context, such false results could lead to misinterpretation and/or incorrect diagnosis. Anatomical features (e.g. organs) often follow a well-defined topology. We can exploit this knowledge to improve the clinically important aspects of the segmentation, e.g. to enforce connectivity and minimise FNs/FPs. There is a growing and diverse body of literature exploring this subject in the form of shape constraints, topology-aware loss functions, conditional random fields, etc. At Cydar we have developed a state-of-the-art system to help surgeons treat aortic aneurysm. We provide AI-enhanced tools for planning, carrying out procedures, and monitoring patients. We curate a database of thousands of expert-labelled CT scans. We would like to explore topology-preserving DNNs to better assist our clinical users and thus improve patient outcomes.
Brief Description of the Project You will explore different methods to incorporate shape constraints and topology into DNN training. Working with our computer vision and clinical experts, you will identify the relevant constraints for organ segmentation. You will then implement those methods in Python and TensorFlow. Programming experience is essential for this placement. Subsequently you will use the novel methods to train DNNs on cutting-edge GPUs in the cloud. We have developed our in-house training pipelines and it should be easy to go from prototype to full-scale training. Finally, models incorporating the new methods will be compared against our baseline segmentation nets. Our experts will assess the results to determine their clinical usefulness.
References
Prerequisite Skills Image processing; Geometry/Topology; programming
Other Skills Used in the Project  
Programming Languages Python
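To make the limitation of pixel-level losses described above concrete, here is a minimal sketch of a soft Sørensen-Dice loss in plain NumPy (not TensorFlow, and not Cydar's implementation). The function name `soft_dice_loss`, the smoothing constant `eps` and the synthetic 16×16×16 volume are assumptions for illustration only.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """1 - Dice overlap between predicted foreground probabilities and a binary mask."""
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(probs * target)
    denominator = np.sum(probs) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Synthetic ground truth: one connected 8x8x8 block inside a 16x16x16 volume.
target = np.zeros((16, 16, 16))
target[4:12, 4:12, 4:12] = 1.0

# Prediction that is perfect except for one isolated false-positive voxel.
with_island = target.copy()
with_island[0, 0, 0] = 1.0

loss_perfect = soft_dice_loss(target, target)
loss_island = soft_dice_loss(with_island, target)
```

The isolated voxel changes the topology of the prediction (two components instead of one) but moves the loss by less than 0.001, which is why an additional shape or topology term is of interest.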

 

Finite-difference approach for Stokes flow with free interfaces on staggered Cartesian grids 

Project Title Finite-difference approach for Stokes flow with free interfaces on staggered Cartesian grids
Keywords Stokes flow, free interface, finite difference methods, numerical linear algebra
Contact Name Vasily Suvorov
Contact Email vasily.suvorov@silvaco.com
Company/Lab/Department Silvaco Europe, Technology Computer-Aided Design (TCAD) Department
Address Compass Point, St Ives, Cambridgeshire, PE27 5JL
Period of the Project 8-10 weeks
Work Environment The student will work on his/her own with the support and guidance from the supervisor.
Project Open to Undergraduates; Master's (Part III) students
Background Information Modern semiconductor technology involves processes in which materials with free interfaces undergo large, slow deformations. Such deformations can often be modelled by the flow of an incompressible liquid in which advective inertial forces are small compared with viscous forces, i.e. by Stokes flow. The project aims to analyse the company's working numerical approach to modelling such flow, with the aim of improving accuracy, stability and convergence.
Brief Description of the Project Silvaco uses finite difference schemes on structured 2D and 3D Cartesian grids to simulate Stokes flow with free interfaces. A particular difficulty in applying such schemes is the approximation of the boundary conditions at the free interfaces and the approximation of the momentum equations near the interfaces. At such irregular points the finite difference stencils do not form regular, orthogonal patterns but have irregular shapes, and the equations are approximated by the method of undetermined coefficients [1]. Although this approach works in practice, it requires further mathematical analysis to improve the accuracy and the stability of the numerical schemes. The student will help to better understand the mathematical properties of the suggested numerical schemes. Examples of such questions are: Given an irregular finite-difference stencil, how can we estimate the accuracy of the approximation based on its geometry? What is the stability (e.g. the condition number) of this approximation? What are the properties of the resulting linear system of equations? The answers to these questions will help to improve the company's software.
References [1] Computational Methods in Partial Differential Equations. A.R.Mitchell, 1969
Prerequisite Skills Numerical Analysis; PDEs; Mathematical Analysis
Other Skills Used in the Project Fluids; Simulation
Programming Languages Python; MATLAB
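The method of undetermined coefficients mentioned in the project description can be sketched in one dimension as follows. This is only an illustrative example, not Silvaco's scheme: the helper names `fd_weights` and `leading_error_coefficient` are hypothetical, and the stencil offsets are measured in arbitrary units from the evaluation point.

```python
import math
import numpy as np

def fd_weights(offsets, deriv_order):
    """Weights w_j with sum_j w_j f(x0 + offsets[j]) ~ f^(deriv_order)(x0).

    Row k of the system demands exactness on the monomial x**k:
    sum_j w_j * offsets[j]**k = k! if k == deriv_order, else 0.
    Returns the weights and the 2-norm condition number of the system.
    """
    offsets = np.asarray(offsets, dtype=float)
    n = offsets.size
    A = np.vander(offsets, N=n, increasing=True).T
    b = np.zeros(n)
    b[deriv_order] = math.factorial(deriv_order)
    return np.linalg.solve(A, b), np.linalg.cond(A)

def leading_error_coefficient(offsets, weights, max_extra=4):
    """First k >= n and coefficient c with a nonzero error term c * f^(k)(x0)."""
    offsets = np.asarray(offsets, dtype=float)
    n = offsets.size
    for k in range(n, n + max_extra):
        coeff = float(np.dot(weights, offsets**k)) / math.factorial(k)
        if abs(coeff) > 1e-12:
            return k, coeff
    return None, 0.0

# Regular three-point stencil for the second derivative: weights 1, -2, 1.
w_reg, cond_reg = fd_weights([-1.0, 0.0, 1.0], 2)
# Irregular stencil, e.g. where an interface cuts the right-hand neighbour short.
w_irr, cond_irr = fd_weights([-1.0, 0.0, 0.5], 2)

k_reg, c_reg = leading_error_coefficient([-1.0, 0.0, 1.0], w_reg)
k_irr, c_irr = leading_error_coefficient([-1.0, 0.0, 0.5], w_irr)
```

For the regular stencil the first nonvanishing error term involves the fourth derivative (second-order accuracy), whereas the irregular stencil already picks up a third-derivative term (first-order accuracy). The returned condition number gives one simple measure of how sensitive the weights are to the stencil geometry.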

 

Implantation model R&D 

Project Title Implantation model R&D
Keywords Mathematical modelling, semiconductors, TCAD, statistics, physics
Contact Name Artem Babayan
Contact Email artem.babayan@silvaco.com
Company/Lab/Department Silvaco Europe
Address Compass Point, St Ives, PE27 5JL
Period of the Project 8 weeks, anytime
Work Environment The project assumes a high degree of independence. The development part is expected to be done in the office (in St Ives, near Cambridge).
Project Open to Undergraduates; Master's (Part III) students
Background Information  
Brief Description of the Project Silvaco is a software engineering company developing tools to assist in the manufacturing of semiconductor devices. In the UK office we mostly work on the 'process simulation' side -- mathematical modelling of the processes used in manufacturing. One such process is implantation -- bombardment of a piece of (typically) Si with ions (dopants) to change the electrical properties of the target in specific areas. The traditional method is to use a directional beam of ions. However, to achieve certain types of doping distribution in the substrate, other techniques are employed. One of them is called "plasma doping" (see reference). For the current project the objective is to find the empirical distribution of ions' directions and energies within the plasma reactor, based on the reactor's parameters and the ion properties. This would include studying the relevant papers and building a simplified model. The output of the project is code that takes the reactor parameters as input and produces the distribution of ions over energies and directions. If successful, this activity is likely to ultimately result in a conference or journal publication.
References "Plasma doping for silicon" Surface and Coatings Technology Volume 85, Issues 1-2, 1 November 1996, Pages 51-55
Prerequisite Skills  
Other Skills Used in the Project  
Programming Languages Python; MATLAB; C++

 

Statistical Correlation Analysis & Toolkit Development 

Project Title Statistical Correlation Analysis & Toolkit Development
Keywords Finance, data, optimisation, meta-analysis, statistics, planning model
Contact Name Amanda-Jayne Hawkins
Contact Email amanda-jayne.hawkins@siemens-healthineers.com
Company/Lab/Department Siemens Healthineers
Address Siemens Healthineers, Northern Road Sudbury CO10 2XQ United Kingdom
Period of the Project 8 - 10 weeks
Work Environment Working with Finance and Planning SMEs to collaborate on the data, business requirements and outcome for the toolkit
Project Open to Undergraduates; Master's (Part III) students
Background Information The project will initially be developed in the context of spares data. A further aim is that the model developed through this project would provide an exemplar to support release of data from other assets for the business.
Brief Description of the Project

Based on multiple sites in Chilton Industrial Estate, Sudbury, we design and assemble blood, urine and diabetes analysis instruments. With our point-of-care testing systems, Siemens Healthineers delivers lab-accurate, actionable, and timely results on the spot. Siemens Healthineers has been Certified™ as a 'great place to work'. We are inspired to transform the way things are done, because we want what is best for our people and our customers, and ultimately to help everyone live longer and healthier lives. The placement would suit an individual with a good analytical background who enjoys problem solving and attention to detail. It offers an opportunity to support a programme of work with clearly defined expectations and to deliver operational solutions. The outputs from this placement will improve the efficiency of planning & inventory, finance & forecast, and operational demand and focus. It will provide an excellent opportunity for an intern to develop skills and knowledge whilst also demonstrating competency through successful project delivery.

The aim of this placement will be to
1) assess the seasonality data of spares and cross check internal vs external actuals vs demand,
2) use the statistical evidence created to establish a forward planning trend and alert system for product,
3) utilise successful analysis to extend out to other needs such as lead times and safety stock.

A combination of the following is required:

  • Structured querying of available database and business data.
  • Analyse and assess trends in data and produce new projection models.
  • Liaise with site-specific leads to discuss expectation of the results.
  • Compare new developed method to current methods and results.
  • Conduct a sensitivity analysis where appropriate. Produce a 'toolkit' programme for multiplying the process out to other data analysis needs of the business.
  • Sharing of technical knowledge with the wider team through documentation and peer-to-peer learning.
  • Supporting other analytical work within the team through findings.
  • Develop visualisations of the data and identification of markers set by trend analysis.
References  
Prerequisite Skills Statistics; Probability/Markov Chains; Predictive Modelling; Data Visualization
Other Skills Used in the Project Statistics; Probability/Markov Chains; Mathematical Analysis; Simulation; Predictive Modelling; Database Queries; Data Visualization; App Building
Programming Languages No Preference
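The first two aims listed above (assessing seasonality and building an alert system) could start from something as simple as the sketch below. It is an assumption-laden illustration rather than the company's method: the function names `seasonal_indices` and `flag_alerts`, the 25% tolerance, and the demand figures are all invented for the example, and Python is used only because the proposal states no language preference.

```python
import numpy as np

def seasonal_indices(demand, period):
    """Average demand in each position of the cycle, relative to the overall mean."""
    demand = np.asarray(demand, dtype=float)
    if demand.size % period != 0:
        raise ValueError("demand must cover a whole number of cycles")
    by_cycle = demand.reshape(-1, period)        # one row per complete cycle
    return by_cycle.mean(axis=0) / demand.mean()

def flag_alerts(actual, baseline, indices, tolerance=0.25):
    """True wherever actual demand deviates from the seasonal expectation by more
    than `tolerance` (as a fraction of the expected value)."""
    actual = np.asarray(actual, dtype=float)
    expected = baseline * np.tile(indices, actual.size // len(indices))
    return np.abs(actual - expected) > tolerance * expected

# Synthetic quarterly spares demand for two years (invented numbers).
history = [50, 100, 150, 100, 50, 100, 150, 100]
indices = seasonal_indices(history, period=4)

# A later two-year window with one unusual quarter (index 5).
recent = [50, 100, 150, 100, 50, 200, 150, 100]
alerts = flag_alerts(recent, baseline=100.0, indices=indices)
```

A real toolkit would need to handle trend, incomplete cycles and noisy data, but the same pattern (estimate a seasonal expectation, then flag deviations from it) extends naturally to lead times and safety stock.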

 

Self-supervised vehicle damage detection in multimodal data 

Project Title Self-supervised vehicle damage detection in multimodal data
Keywords artificial intelligence; machine learning; computer vision; image processing; deep learning
Contact Name Michelle Botes
Contact Email michelle@autofilltech.com
Company/Lab/Department AutoFill Technologies B.V.
Address Marineweg 1, 2241TX, Wassenaar, the Netherlands
Period of the Project 8 weeks starting 20 June 2022
Work Environment Our Head of AI & Development (Sahar Yousefi) will serve as project lead and supervisor with support from our CTO and co-founder (Daan de Cloe). The student(s) will work closely with all team members of our AI/ML engineering team (4 team members). We generally work a 40-hour week, Monday to Friday. Students will work remotely and AutoFill will facilitate and coordinate one week per month at our offices close to Amsterdam to work with the team in our research lab.
Project Open to Master's (Part III) students
Background Information

We are AutoFill Technologies, a high-performance team striving to tackle the toughest challenges in Computer Vision and Machine Learning. AutoFill is a company where Deeptech meets Hardtech to develop the best systems for cutting-edge inspections of objects, powered by Artificial Intelligence. We are proud that we work on the edge of what is possible and bring theoretical research to life. We are confident that our technology will become the worldwide standard for automated object inspections. Our team is at the forefront of the artificial intelligence revolution. If you're seeking to work with experienced professionals who are excited to create impact in multiple industries and if you like Deeptech, hardware and pushing some serious boundaries, then you're ready to join our team. What we do really makes a difference. Amazing opportunities like joining AutoFill just don't come around every day. Be part of something big, from the early days.

At AutoFill, we have developed an automated object inspection system that automatically captures large, high quality multimodal scans from a vehicle in only a few seconds. We use Computer Vision and Machine Learning to optimise the quality and efficiency of the data collected, as well as to process the data into valuable information for our customers. With our multi-sensor solution, we are able to fuse the data from different types of sensors and from different viewing angles. With our systems deployed at customer locations, and our own test setup at our AutoFill Research Lab, we continuously generate large representative datasets that are used for the development and training of new AI models and algorithms.

Brief Description of the Project

With our automated vehicle inspection systems, located at customer locations in Europe, we collect thousands of datasets containing images of vehicles, captured from multiple angles using RGB and polarization sensors. According to recent studies, the polarization modality provides a very rich description of abnormalities in very challenging conditions such as poor illumination and strong reflection (Blin et al.). We have built an in-house data annotation team, which ensures consistently high-standard annotations. However, data annotation for AI applications is labour-intensive. Your main contribution as a researcher is to investigate whether self-supervised learning for vehicle damage detection on multimodal data can reduce the cost of annotation while still performing as well as fully supervised models. The self-supervised learning literature suggests that model distillation has opened a new way of learning that provides a decent representation using labelled and unlabelled data. In a very recent work (Koohpayegani et al.), a self-supervised AlexNet outperformed the supervised one on ImageNet classification.

With this research project, you have the opportunity to work in a high-tech company active in building automated machine vision solutions for vehicle fleets and rail in close collaboration with AutoFill Technologies AI experts. You will also have access to large datasets of RGB and polarization images from different vehicles captured by AutoFill Technologies. Last but not least, you will have access to the Google Cloud environment for developing and testing of AI solutions.

Outline of planned activities:

  • To perform a literature study on existing methods and computational models for self-supervised learning; 
  • To select two methods and implement them in a software environment, using the datasets that are generated by the automated vehicle inspection systems of AutoFill;
  • To validate and benchmark the performance of the newly developed computational models, using the test setup at the AutoFill Research Lab;
  • To write your report and/or publications.

You will need an affinity for programming and familiarity with Python. A strong background and interest in image processing and machine learning is a must. Experience with deep learning libraries (PyTorch or TensorFlow) is preferred.

References
  • Blin, Rachel, et al. "A new multimodal RGB and polarimetric image dataset for road scenes analysis." CVPR 2020. 
  • Koohpayegani, S. A., Tejankar, A., & Pirsiavash, H. (2020). Compress: Self-supervised learning by compressing representations. NeurIPS 2020.
Prerequisite Skills Image processing; Data Visualization; Deep neural network
Other Skills Used in the Project Deep learning & data analysing
Programming Languages Python; C++

 

Semi-supervised semantic segmentation in multimodal data

Project Title Semi-supervised semantic segmentation in multimodal data
Keywords artificial intelligence; machine learning; computer vision; image processing; deep learning
Contact Name Michelle Botes
Contact Email michelle@autofilltech.com
Company/Lab/Department AutoFill Technologies B.V.
Address Marineweg 1, 2241TX, Wassenaar, the Netherlands
Period of the Project 8 weeks starting 1 June 2022
Work Environment Our Head of AI & Development (Sahar Yousefi) will serve as project lead and supervisor with support from our CTO and co-founder (Daan de Cloe). The student(s) will work closely with all team members of our AI/ML engineering team (4 team members). We generally have a 40 hour work week, Monday to Friday. Students will work remotely and AutoFill will facilitate and coordinate one week per month at our offices close to Amsterdam to work with the team in our research lab.
Project Open to Master's (Part III) students
Background Information

We are AutoFill Technologies, a high-performance team striving to tackle the toughest challenges in Computer Vision and Machine Learning. AutoFill is a company where Deeptech meets Hardtech to develop the best systems for cutting-edge inspections of objects, powered by Artificial Intelligence. We are proud that we work on the edge of what is possible and bring theoretical research to life. We are confident that our technology will become the worldwide standard for automated object inspections. Our team is at the forefront of the artificial intelligence revolution. If you're seeking to work with experienced professionals who are excited to create impact in multiple industries and if you like Deeptech, hardware and pushing some serious boundaries, then you're ready to join our team. What we do really makes a difference. Amazing opportunities like joining AutoFill just don't come around every day. Be part of something big, from the early days.

At AutoFill, we have developed an automated object inspection system that automatically captures large, high quality multimodal scans from a vehicle in only a few seconds. We use Computer Vision and Machine Learning to optimise the quality and efficiency of the data collected, as well as to process the data into valuable information for our customers. With our multi-sensor solution, we are able to fuse the data from different types of sensors and from different viewing angles. With our systems deployed at customer locations, and our own test setup at our AutoFill Research Lab, we continuously generate large representative datasets that are used for the development and training of new AI models and algorithms.

Brief Description of the Project

With our automated vehicle inspection systems, located at customer locations in Europe, we collect thousands of datasets containing images of vehicles, captured from multiple angles using RGB and polarization sensors. According to recent studies, the polarization modality provides a very rich description of abnormalities in very challenging conditions such as poor illumination and strong reflection (Blin et al.). We have built an in-house data annotation team, which ensures consistently high-standard annotations. In a very recent work, Xiang et al. presented an Efficient Attention-bridged Fusion Network to exploit complementary information coming from different optical sensors. Specifically, they incorporate polarization sensing to obtain supplementary information, considering its optical characteristics for robust representation of diverse materials. Further, Ouali et al. proposed cross-consistency training, in which invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder. Such approaches help greatly in obtaining highly accurate yet compact models for complex visual recognition tasks.

Your main contribution as a researcher will be implementing a semi-supervised learning approach that leverages various perturbations and exploits supplementary information such as polarization data for a semantic segmentation task. You have the opportunity to work in a high-tech company active in building automated machine vision solutions for vehicle fleets and rail, in close collaboration with AutoFill Technologies' AI experts. You will have access to large datasets of RGB and polarization images from different vehicles captured by AutoFill Technologies. You will also have access to the Google Cloud environment for developing and testing AI solutions.

Outline of planned activities:

  • To perform a literature study on existing deep learning approaches for multi-modal semantic segmentation tasks;
  • To select two methods and implement them in a software environment, using the datasets that are generated by the automated vehicle inspection systems of AutoFill;
  • To validate and benchmark the performance of the newly developed computational models, using the test setup at the AutoFill Technologies Research Lab;
  • To write your report and/or publications.

You will need an affinity for programming and familiarity with Python. A strong background and interest in image processing and machine learning is a must. Experience with deep learning libraries (PyTorch or TensorFlow) is preferred.

References
  • Blin, Rachel, et al. "A new multimodal RGB and polarimetric image dataset for road scenes analysis." CVPR 2020.
  • Xiang, Kaite, Kailun Yang, and Kaiwei Wang. "Polarization-driven semantic segmentation via efficient attention-bridged fusion." Optics Express 29.4 (2021): 4802-4820.
  • Ouali, Yassine, Céline Hudelot, and Myriam Tami. "Semi-supervised semantic segmentation with cross-consistency training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Prerequisite Skills Image processing; Data Visualization; Deep neural networks
Other Skills Used in the Project Deep learning and data analyzing
Programming Languages Python; C++

 

Algorithm development for security applications 

Project Title Algorithm development for security applications
Keywords Security; ML; R&D; Data Science
Contact Name Marcus Quantrill
Contact Email marcus.quantrill@iconal.com
Company/Lab/Department Iconal Technology Ltd.
Address St Johns Innovation Centre, Cowley Road, CB4 0WS
Period of the Project At least 8 weeks, June or earlier start
Work Environment We are a small friendly team of 6 people, all working on a range of interesting diverse projects. The student will be based in our main office (or lab for data gathering) working on one or more projects with us, with a mentor on each project to help with queries, reviewing work and assigning tasks. The amount of in-person contact time may vary depending on the situation with the pandemic at the start of the project. Some amount of remote work is therefore a possibility.
Project Open to Undergraduates; Master's (Part III) students
Background Information We are a Cambridge based consultancy carrying out research and development in new and emerging technologies for homeland security, offering independent, impartial, science-based advice. This will be our fourth year offering CMP placements, and we are looking for keen, innovative, self-motivated individuals who are interested in the practical application of maths to solve real-world problems. You will be working in a small friendly (we like to think) team of scientists and engineers, and contributing directly to the output of current projects.
Brief Description of the Project Right now we do not know exactly what the student project will entail as we work in a rapidly evolving field. This year's projects are likely to be focused around one or more of developing algorithms and machine learning solutions to analyse complex sensor data or helping with tests and trials of technology. Our work is highly varied and interesting and you will likely get stuck in with all aspects of the job!
References https://www.iconal.com/ 
Prerequisite Skills Statistics; Probability/Markov Chains; Data Visualization
Other Skills Used in the Project Statistics; Probability/Markov Chains; Mathematical physics; Numerical Analysis; Image processing; Simulation; Predictive Modelling; Database Queries; Data Visualization; App Building
Programming Languages Python; MATLAB; R; C++ (Python preferred, but other languages can be considered if relevant)

 

Deeply Interacting Learning Systems 

Project Title Deeply Interacting Learning Systems
Keywords Deeply interacting learning systems, machine learning, neural networks
Contact Name Jamie Beacom
Contact Email jamie.beacom@smithinst.co.uk
Company/Lab/Department Smith Institute
Address 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK
Period of the Project 8
Work Environment The successful student will join our team of creative problem solvers. They will have regular interactions with the project supervisor and have the opportunity to discuss ideas and present findings to the rest of the technical staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available.
Project Open to Undergraduates; Master's (Part III) students
Background Information The Smith Institute uses core expertise in mathematical modelling and applied statistics to bring fresh thinking and new solutions to the challenges faced by our clients. We are at the forefront of the industrial and business applications of mathematics, with key client relationships in performance engineering, algorithm design, risk modelling, optimisation and data analytics. To keep at the forefront, we maintain strong links with universities and explore methods, concepts, and tools as they develop. The goal of this project is to collaboratively build our understanding of Deeply Interacting Learning Systems, a recently proposed extension of Deep Neural Networks, and how they can be implemented in the practical constraints of the real-world.
Brief Description of the Project

The analogy between Deep Neural Networks (DNNs) and human brains is traditionally evoked by identifying artificial neurons in the DNN with neurons in the human brain. A Deeply Interacting Learning System (DILS) [1] develops this analogy further. Using Interaction Diagrams (IDs) [2] as a common framework for both DNNs and Interacting Dynamical Systems (IDSs), a DILS is defined as an ID with the dynamic re-wiring of a DNN and the non-trivial internal wiring of an IDS. In the broadest sense, a DILS proposes to move away from the discrete phases of learning, testing, and inference in DNNs towards a system which is continuously online. It also supports a more collaborative approach towards learning than a "shouting" match between weights and biases. In this way, we should be better able to understand the relationships between data as it flows through the learning system. Suppose that we have a supervised learning problem for image classification, where the problem is to classify images of lung tissue as either showing cancerous tissue or not. A basic example of the DILS framework would be to train a DNN with the added structure of OR gates to support classification in this problem. The inclusion of the OR gates in this case already furnishes the learning system with the knowledge that it is solving a classification problem, coming closer to how a human brain would approach this task.

This project will bring the ideas above to life, with the overarching goal of making the definition of DILS more explicit and precise. Along the way, the student should develop an understanding of the limitations and advantages of this novel approach over conventional methods for supervised learning, in their application to real-world problems. The ideas underpinning this proposed approach build on the foundations established in [2] and [3]. The project will start from a literature review. This will involve experimentation with the Python package [4] implemented as part of the work in [3]. Following this, several directions could be taken depending on the interests of the student undertaking this project. This might encompass a more detailed analysis and comparison of the framework in [3] against conventional neural networks, on a wider range of examples, to better understand its advantages and limitations. It could also involve generating explicit descriptions, in the language of Interaction Diagrams, of the simple examples of neural networks which are the basis of experiments in [3] and [4]. At the conclusion of the project, the student should understand how to formulate examples of neural networks in these novel Category Theoretic frameworks, and be able to implement and experiment with these examples, with a focus on comparison against conventional implementations.

References [1] T. Hosgood and D. Spivak, Deep neural networks as nested dynamical systems, November 2021. [Online]. Available: https://arxiv.org/pdf/2111.01297.pdf.
[2] B. Fong, D. Spivak and R. Tuyeras, Backprop as a Functor: A compositional perspective on supervised learning, May 2019. [Online]. Available: https://arxiv.org/pdf/1711.10455.pdf.
[3] G. Cruttwell, B. Gavranović, N. Ghani and F. Zanasi, Categorical Foundations of Gradient-Based Learning, March 2021. [Online]. Available: https://arxiv.org/pdf/2103.01931v1.pdf.
[4] Numeric Optics. [Online]. Available: https://github.com/statusfailed/numeric-optics-python.git. [Accessed 13 Jan 2022].
[5] D. Spivak, D. Vagner and E. Lerman, Algebras of Open Dynamical Systems on the Operad of Wiring Diagrams. [Online]. Available: https://math.mit.edu/~dspivak/informatics/WD-ODE.pdf. [Accessed 14 Jan 2022].
Prerequisite Skills Machine Learning; Linear Algebra; Python coding
Other Skills Used in the Project Familiarity with Category Theory would be helpful, but the essentials for this can be picked up along the way.
Programming Languages Python

 

Quantum computing internship 

Project Title Quantum computing internship
Keywords quantum, computing, algorithms, software
Contact Name Ophelia Crawford
Contact Email ophelia.crawford@riverlane.com
Company/Lab/Department Riverlane
Address St Andrew's House, 59 St Andrew's Street, Cambridge, CB2 3BZ
Period of the Project 10-12 weeks, summer 2022
Work Environment You will join us at our office in Cambridge, UK, for 10 to 12 weeks, where you will have the opportunity to work alongside our team of software and hardware engineers, mathematicians, quantum information theorists, computational chemists and physicists - all experts in their fields. Every intern will have a dedicated supervisor and will work on a project designed to make the best use of their background and skills whilst developing their knowledge of quantum computing.
Project Open to Master's (Part III) students
Background Information Riverlane is the world's first quantum engineering company. We are hardware obsessed, qubit agile and commercially driven. We're a passionate team collaboratively tackling some of humanity's biggest opportunities, from climate change to materials science and new drug discovery. Our full-time summer internships are designed to enable current students in a technical field to translate their skills and expertise into an industrial setting.
Brief Description of the Project

What you will do:

  • Develop, devise and research algorithms and software to enhance Riverlane's capabilities, contributing to one or more projects that are core to Riverlane's goals
  • Discuss ideas with colleagues and communicate work in the form of presentations and reports
  • Develop an understanding of quantum computers and their industrial applications

What we need:

  • A current student studying for a master's degree (including the final year of an integrated undergraduate and master's degree) or PhD in physics, chemistry, mathematics, computer science, electrical engineering, or a related technical field
  • Proven ability in computational and/or theoretical work
  • Experience with at least one programming language
  • Excellent critical thinking and problem-solving ability
  • Strong communication skills, both written and verbal
  • Ability to take initiative and to work well as part of a team
  • An interest in quantum computing (extensive knowledge or experience is not required)

For more information and to apply, please visit our website: https://www.riverlane.com/internships/

References  
Prerequisite Skills  
Other Skills Used in the Project  
Programming Languages  

 

Investigation of Congestion Control Systems using Traffic Flow Models 

Project Title Investigation of Congestion Control Systems using Traffic Flow Models
Keywords Traffic, Modelling, Implementation, Programming, Experimentation
Contact Name Charles Choyce
Contact Email charles.choyce@smithinst.co.uk
Company/Lab/Department Smith Institute
Address 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK
Period of the Project 8 weeks
Work Environment The successful student will join our team of creative problem solvers. They will have regular interactions with the project supervisor and have the opportunity to discuss ideas and present findings to the rest of the technical staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available.
Project Open to Undergraduates; Master's (Part III) students
Background Information The Smith Institute uses core expertise in mathematical modelling and applied statistics to bring fresh thinking and new solutions to the challenges faced by our clients. We are at the forefront of the industrial and business applications of mathematics, with key client relationships in performance engineering, algorithm design, risk modelling, optimisation, and data analytics. To stay at the forefront, we maintain strong links with universities and explore methods, concepts and tools as they develop. The goal of this project is to collaboratively build our understanding of congestion control systems and how they can be implemented within the practical constraints of the real world. Traffic flow and congestion modelling is a well-developed field in mathematics, informing traffic forecasting for infrastructure planning, management, and policy making. While the problem of optimising a transport network remains unsolved, smart traffic control technologies hold significant potential to reduce inner-city gridlock when used in an Intelligent Transport System (ITS). Examples of these technologies include intelligent traffic lights and streetlamps, adaptive speed limit systems, and smart speed cameras. As part of a connected network, these technologies require a control system to communicate sensor data intelligently, manage congestion, and notify operators. Proper design of these control systems will be crucial to reducing congestion.
Brief Description of the Project

This project will investigate the optimal modelling methods to evaluate the effects of congestion control systems on two simple traffic scenarios: a macroscopic traffic flow model based on a fluid dynamics analogy, and a microscopic 4-way junction model. In the former, the student will examine a basic traffic density model representing a motorway section. A set of traffic control rules will be implemented to understand their impact on various congestion metrics. In this case study the traffic density will be controlled using adaptive speed limits that reduce the average traffic speed to optimal levels. The objective is to maximise the outflow of traffic over the simulation's duration using minimal rules. The latter model involves simulating individual cars at a 4-way junction, with traffic lights controlling the flow of traffic in each direction. Cars will enter the junction from one of 4 directions and join a queue. Each will have a destination in one of the remaining 3 lanes, determined at random. The control system will aim to minimise the traffic build-up in the queues adjacent to the traffic lights by governing how long red lights are held.
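As a rough illustration of the macroscopic scenario, the sketch below discretises a motorway section into cells and updates the traffic density with a cell-transmission scheme. Everything specific in it is an assumption made for illustration and is not taken from the proposal: the Greenshields speed-density relation, the parameter values, and the hypothetical rule in `speed_limits` that lowers the limit in a cell when the next cell downstream is dense. It only shows how a control rule can be wired into a density model and compared against the uncontrolled case; it makes no claim about which performs better.

```python
def flux(rho, v_limit, rho_max):
    """Greenshields flow: speed falls linearly from v_limit to 0 as density rises."""
    return rho * v_limit * (1.0 - rho / rho_max)


def demand(rho, v_limit, rho_max):
    """Most traffic a cell can send downstream (capped at capacity)."""
    return flux(min(rho, rho_max / 2.0), v_limit, rho_max)


def supply(rho, v_limit, rho_max):
    """Most traffic a cell can accept from upstream (capped at capacity)."""
    return flux(max(rho, rho_max / 2.0), v_limit, rho_max)


def speed_limits(rho, v_free, v_reduced, threshold):
    """Hypothetical rule: slow a cell down when the next cell downstream is dense."""
    limits = []
    for i in range(len(rho)):
        downstream = rho[i + 1] if i + 1 < len(rho) else 0.0
        limits.append(v_reduced if downstream > threshold else v_free)
    return limits


def simulate(controlled, steps=2000, n_cells=50, dx=0.1, dt=0.0005,
             v_free=100.0, v_reduced=60.0, rho_max=120.0,
             threshold=80.0, inflow=1500.0):
    """Run the section; return (final densities, vehicles in, vehicles out).

    Units (assumed): km, hours, vehicles/km. v_free * dt / dx = 0.5, so the
    explicit update satisfies the usual stability (CFL) condition.
    """
    # Light traffic everywhere, with a jam in cells 30-39.
    rho = [100.0 if 30 <= i < 40 else 20.0 for i in range(n_cells)]
    total_in = total_out = 0.0
    for _ in range(steps):
        if controlled:
            limits = speed_limits(rho, v_free, v_reduced, threshold)
        else:
            limits = [v_free] * n_cells
        # Flow across each of the n_cells + 1 cell boundaries.
        f = [min(inflow, supply(rho[0], limits[0], rho_max))]
        for i in range(1, n_cells):
            f.append(min(demand(rho[i - 1], limits[i - 1], rho_max),
                         supply(rho[i], limits[i], rho_max)))
        f.append(demand(rho[-1], limits[-1], rho_max))  # unobstructed exit
        rho = [rho[i] + dt / dx * (f[i] - f[i + 1]) for i in range(n_cells)]
        total_in += f[0] * dt
        total_out += f[-1] * dt
    return rho, total_in, total_out


if __name__ == "__main__":
    for controlled in (False, True):
        _, _, out = simulate(controlled)
        print(f"controlled={controlled}: outflow = {out:.1f} vehicles")
```

The total outflow returned by `simulate` corresponds to the objective named above; other congestion metrics, such as the peak density, could be read off the returned densities in the same way.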

This project requires game theoretic experimentation to understand the effect on results when modifying the control system's behaviour. Students will be encouraged to decide on suitable congestion and wait-time cost metrics in order to evaluate the control system's performance. This project will most appeal to students interested in control theory, applied fluid dynamics, applied game theory, programming, and experimentation through mathematical modelling.
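The microscopic scenario can be prototyped in a similar spirit. The sketch below is a deliberately simplified assumption, not the proposal's model: time advances in unit steps, each approach receives at most one car per step with a fixed probability, opposite approaches share a green phase, and one car per green approach clears the junction per step. Each car's random destination is recorded but does not affect conflicts here. Mean waiting time and maximum queue length stand in for the cost metrics the student would choose.

```python
import random
from collections import deque

APPROACHES = ("N", "E", "S", "W")


def simulate_junction(green_ns, green_ew, arrival_prob=0.2, steps=2000, seed=0):
    """Simulate fixed-cycle lights; green_ns and green_ew are phase lengths in steps."""
    rng = random.Random(seed)
    queues = {a: deque() for a in APPROACHES}
    waits = []
    arrived = 0
    max_queue = 0
    cycle = green_ns + green_ew
    for t in range(steps):
        # Arrivals: each car records its entry time and a random destination.
        for a in APPROACHES:
            if rng.random() < arrival_prob:
                destination = rng.choice([d for d in APPROACHES if d != a])
                queues[a].append((t, destination))
                arrived += 1
        # Lights: N and S are green for the first part of the cycle, then E and W.
        green = ("N", "S") if t % cycle < green_ns else ("E", "W")
        for a in green:
            if queues[a]:
                entered, _destination = queues[a].popleft()
                waits.append(t - entered)
        max_queue = max(max_queue, max(len(q) for q in queues.values()))
    return {
        "mean_wait": sum(waits) / len(waits) if waits else 0.0,
        "max_queue": max_queue,
        "arrived": arrived,
        "served": len(waits),
        "remaining": sum(len(q) for q in queues.values()),
    }


if __name__ == "__main__":
    # Compare a few red/green splits using the same random arrivals.
    for green_ns, green_ew in ((5, 5), (10, 10), (15, 5)):
        result = simulate_junction(green_ns, green_ew)
        print(green_ns, green_ew, result["mean_wait"], result["max_queue"])
```

Fixing the seed keeps the arrival stream identical across light timings, so differences in the metrics come from the control settings rather than from sampling noise.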

References  
Prerequisite Skills Statistics; PDEs; Simulation
Other Skills Used in the Project Fluids; Numerical Analysis; Data Visualization; Game Theory
Programming Languages Python

 

Diagnosing disease using whole microscope slide images 

Project Title Diagnosing disease using whole microscope slide images
Keywords histopathology, digital image analysis, duodenum, deep learning, multiple instance learning
Contact Name Elizabeth Jane Soilleux
Contact Email ejs17@cam.ac.uk
Company/Lab/Department Lyzeum Ltd/ University of Cambridge
Address Dept of Pathology, University of Cambridge
Period of the Project 8 weeks between late June and September
Work Environment Working as part of a computational research team, this project may be undertaken in person or remotely.
Project Open to Undergraduates; Master's (Part III) students
Background Information Coeliac disease is an autoimmune disorder which manifests itself upon the ingestion of gluten, a series of proteins found in wheat, barley and rye. Diagnosing coeliac disease is largely the remit of pathologists examining duodenal biopsies, but this is time-consuming and agreement on the diagnosis between pathologists is low to moderate. The field of pathology is currently transitioning into a digital era, where biopsies are routinely scanned at high resolution and made available as whole slide images. This digitisation brings great opportunity for the development of automated diagnostic and decision support tools to assist pathologists in reporting slides, and to help mitigate the various drawbacks of human-centred diagnoses.
Brief Description of the Project Currently, we employ deep-learning-based techniques to achieve classification of whole slide images (WSIs). We now wish to build on our success in this area by developing other tools which provide insightful metrics to further inform pathologists (e.g., defining the "most diagnostic" areas of the slide) and to achieve a finer granularity of assessment. In this project, we envision a student will use deep-learning to segment regions and structures of interest in WSIs, and in turn provide useful metrics which feed into a more comprehensive biopsy assessment tool. Successfully developing one or more pilot algorithms to achieve this would bring us one step nearer to making this test a clinical reality.
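To make the idea of metrics derived from a segmentation concrete, here is a minimal sketch under stated assumptions: masks are small binary grids held as nested lists, `dice` is the standard Dice overlap often used to score a predicted mask against a reference, and `area_fraction` is a hypothetical example of a slide-level metric. Neither is claimed to be part of the team's existing pipeline.

```python
def dice(pred, truth):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    overlap = sum(p * t
                  for pred_row, truth_row in zip(pred, truth)
                  for p, t in zip(pred_row, truth_row))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def area_fraction(mask):
    """Fraction of pixels in the mask that belong to the segmented structure."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(dice(predicted, reference), area_fraction(predicted))
```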
References https://www.lyzeumltd.com/home
Prerequisite Skills Statistics; Mathematical Analysis; Data Visualization
Other Skills Used in the Project Numerical Analysis; PDEs; Image processing; Mathematical Analysis; Predictive Modelling; Data Visualization
Programming Languages Python

 

Pattern recognition and correction on biological assay plates 

Project Title Pattern recognition and correction on biological assay plates
Keywords Statistics, Pattern Recognition, Data correction, Data normalization
Contact Name Tianshan Lin
Contact Email tianshan.x.lin@gsk.com
Company/Lab/Department GSK, R&D, Chemoinformatics & Data Science
Address Gunnels Wood Road, Stevenage SG1 2NY
Period of the Project About 10 weeks between late June and 30 September
Work Environment Mixed remote and on site (if possible)
Project Open to Undergraduates; Master's (Part III) students
Background Information Spatial plate pattern correction is often necessary during the analysis of high throughput screens in early drug discovery assays. Patterns on these 16x24 or 32x48 well plates arise due to things like evaporation, contamination, systematic biases of automation robots, or human error. Currently, the method used to resolve these patterns is a simple smoothing algorithm (e.g. Hybrid Median), which is applied to independent plates and readouts. However, new technologies are producing data with multiple complex readouts, and there's an opportunity to develop more sophisticated plate pattern detection and correction methods.
Brief Description of the Project We would like a student to join us over the summer to
1) Pull together a knowledge base of published plate pattern correction methods, internally written methods, and ideas for the design of new methods (see 2 such examples in the links below)
2) Using our vast amount of readily available past data, apply 3-5 of these methods and evaluate results (The "truth" can partially be assumed from follow-up experiments that show if a compound was truly active or not).
3) Provide Python scripts of the best method(s) that our internal developers could use as a basis for a user-friendly platform to perform plate pattern detection and correction.
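As one candidate for the knowledge base in step 1, the sketch below applies Tukey's two-way median polish, the row-and-column fit that underlies the published B-score normalisation. It is a swapped-in illustration, not the Hybrid Median filter mentioned above and not an internal GSK method. The synthetic 16x24 plate, its additive row and column drift, and the single strong well are all assumptions made for the example.

```python
from statistics import median


def median_polish(plate, n_iter=10):
    """Return residuals after repeatedly removing row and column medians."""
    resid = [list(row) for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        # Remove the row effect.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
        # Remove the column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
    return resid


def synthetic_plate(n_rows=16, n_cols=24, hit=(3, 7), hit_size=100.0):
    """Hypothetical plate: additive row and column drift plus one strong well."""
    plate = [[0.5 * i + 0.2 * j for j in range(n_cols)] for i in range(n_rows)]
    plate[hit[0]][hit[1]] += hit_size
    return plate


if __name__ == "__main__":
    residuals = median_polish(synthetic_plate())
    strongest = max(((i, j) for i in range(16) for j in range(24)),
                    key=lambda ij: residuals[ij[0]][ij[1]])
    print("strongest well after correction:", strongest)
```

Because medians are robust to a few extreme wells, the drift is removed without flattening the strong well. Evaluating methods against follow-up activity data, as in step 2, would replace the synthetic plate with real readouts.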
References
Prerequisite Skills Statistics; Mathematical Analysis
Other Skills Used in the Project Predictive Modelling
Programming Languages Python; R

 

Optimization of a random function 

Project Title Optimization of a random function
Keywords Optimization, Randomness, Computation
Contact Name David Allwright and Tim Boxer
Contact Email david.allwright@smithinst.co.uk
Company/Lab/Department Smith Institute for Industrial Mathematics
Address 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK
Period of the Project 8 weeks between late June and September
Work Environment The student will be supervised by David Allwright and Tim Boxer, with weekly progress meetings and informal discussions in between as required. There will also be opportunity to discuss ideas and present findings to other Smith Institute staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available.
Project Open to Undergraduates; Master's (Part III) students
Background Information In practical applications in a wide variety of contexts it is desirable to minimize a cost C(x, y) by choice of x, when the cost also depends on unknown variables y, modelled as random. For instance the “ideal” x may be such as to minimize the expected cost. But if y lies in a high dimensional space, even evaluating the expected cost may be computationally expensive, let alone optimizing it over x.
Brief Description of the Project This project aims to explore alternative computational approaches to approximate the problem, and to bound how sub-optimal they might be in various circumstances. These approaches include both methods for generating possible x-values, and methods for deciding between those values. The project will aim to set up this problem in mathematical detail, and then explore (analytically for tractable cases and computationally otherwise) how the approaches compare, and how this comparison depends on the parameters of the problem and on the computational parameters. This will include thinking about features like a multimodal y-distribution, or a discontinuous cost function.
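One simple baseline for this setup is sample average approximation: draw a finite sample of y, generate candidate x-values, and pick the candidate with the lowest average cost over the sample. The sketch below is only an illustration under assumptions that are not in the proposal: a squared-error cost, a one-dimensional bimodal y, and a fixed grid of candidates. For this particular cost the expected cost is minimised at the mean of y, which gives a known answer to check the approximation against.

```python
import random


def sample_average_cost(cost, x, samples):
    """Estimate the expected cost E[C(x, y)] by averaging over sampled y."""
    return sum(cost(x, y) for y in samples) / len(samples)


def choose_x(cost, candidates, samples):
    """Decide between candidate x-values using the sample-average cost."""
    return min(candidates, key=lambda x: sample_average_cost(cost, x, samples))


def bimodal_samples(n, rng):
    """Hypothetical multimodal y: equal mixture of N(-2, 0.5) and N(2, 0.5)."""
    return [rng.gauss(rng.choice((-2.0, 2.0)), 0.5) for _ in range(n)]


def squared_cost(x, y):
    """Illustrative cost; its expectation is minimised at x = E[y]."""
    return (x - y) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [i / 10 for i in range(-30, 31)]  # grid of possible x-values
    x_hat = choose_x(squared_cost, candidates, bimodal_samples(4000, rng))
    # Re-estimate the cost on fresh samples to gauge how sub-optimal x_hat is.
    fresh_cost = sample_average_cost(squared_cost, x_hat, bimodal_samples(4000, rng))
    print(x_hat, fresh_cost)
```

Swapping in a discontinuous cost, a higher-dimensional y, or a different way of generating candidates changes only the function arguments, which makes this a convenient harness for the comparisons described above.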
References  
Prerequisite Skills Probability/Markov Chains; Mathematical Analysis; Optimization
Other Skills Used in the Project Probability/Markov Chains; Mathematical Analysis; Optimization
Programming Languages Python; MATLAB; No Preference

 

Equity Electronic Trading Internship 

Project Title Equity Electronic Trading Internship
Keywords Equity, trading, quantitative finance, high-frequency
Contact Name Francois Le Dain
Contact Email francois.le-dain@uk.bnpparibas.com
Company/Lab/Department BNP Paribas
Address BNP Paribas, 10 Harewood Ave, London NW1 6AA
Period of the Project 10 weeks, summer 2022 (flexible)
Work Environment You will work in the office in Marylebone for 10 weeks, in a very dynamic and fast-paced environment in the heart of the equity trading floor. You will have the opportunity to work and learn alongside quants, traders & software engineers who are used to helping students develop their skills. 
Project Open to Master’s (Part III) students
Background Information BNP Paribas Global Markets provides cross-asset investment, hedging, financing, research and market intelligence to corporate and institutional clients, as well as private and retail banking networks. Global Markets' sustainable, long-term business model seamlessly connects clients to capital markets throughout 38 markets in EMEA, Asia Pacific and the Americas, offering innovative solutions and digital platforms. Through Global Markets, clients can access a full universe of opportunities in equities and equity derivatives, foreign exchange, commodity derivatives, rates and credit markets and prime solutions and financing.
Brief Description of the Project What you will do:
  • Conduct cutting-edge quantitative research using real-world market datasets to develop models that underlie and enhance electronic execution and market making
  • Work with our quant team globally (Hong Kong, London, New York) and learn about multiple financial markets
References  
Prerequisite Skills Statistics, Mathematical Analysis 
Other Skills Used in the Project  
Programming Languages Python

AI Methods for Video segmentation & decomposition 

Project Title AI Methods for Video segmentation & decomposition
Keywords AI, Video, Segmentation, Deep Learning, Computer Vision, Python
Contact Name Michael Roberts
Contact Email michaelr@ryff.com
Company/Lab/Department Ryff Europe Ltd
Address Nine Hills Road, Cambridge CB2 1GE
Period of the Project 8+ weeks
Work Environment The student will be embedded in the Ryff AI team located at Hills Road in Cambridge
Project Open to Undergraduates; Master's (Part III) students
Background Information Ryff is developing AI and rendering technology to insert objects into existing video content. The goal is to make the inserts realistic and seamless in the footage. We rely on computer vision and AI techniques to achieve this.
Brief Description of the Project Video segmentation is a challenging task. We need a generalised solution for video segmentation that can be used to detect objects and other scene changes which make it difficult to augment existing content with new objects. The project will require the development of cutting-edge AI techniques and solutions that can be applied to the task of video segmentation.
References https://www.ryff.com
Prerequisite Skills Image processing; Python, Machine Learning
Other Skills Used in the Project  
Programming Languages Python

Develop a machine learning tool applied to Veterinary CT scans 

Project Title Develop a machine learning tool applied to Veterinary CT scans
Keywords Machine learning, deep learning, AI, computer science, CNN, diagnostic imaging, CT, veterinary, innovation
Contact Name Julien LABRUYERE
Contact Email julien@vet-ct.com
Company/Lab/Department VetCT
Address Hauser Forum, Broers Building, 21 JJ Thomson Avenue, CB3 0FA, Cambridge
Period of the Project 8 weeks between end June/July to September 2022
Work Environment The student will have the opportunity to be part of our team in our new office in Cambridge (West Campus), in direct contact with veterinary radiologists, the company director and the IT team. There is no fixed expectation for the place of work, and fully remote work is fine if preferred; this will be decided with the successful candidate.
Project Open to Master’s (Part III) students
Background Information About VetCT: Established in 2009 in Cambridge, UK (West Campus, Broers Building), VetCT (vetct.com) provides supportive, educational teleconsulting and teleradiology, and novel educational strategies for veterinary medicine. VetCT's mission is to make the veterinary world a better place by delivering trusted veterinary knowledge, support, and reassurance at the point of need. VetCT works with veterinarians across the entire veterinary ecosystem (B2B), including students and universities, first opinion practitioners, and referral centres. The company has subsidiaries in both the USA and Australia, with over 250 staff globally, including 120 Diploma-holding veterinary radiologists located across the globe. The company is a leader in veterinary teleradiology and has provided the highest quality radiology reports to a very large number of veterinary patients since its inception. Project summary: VetCT is looking to build relationships with Cambridge University and develop research projects to harness the power of its very large database of radiology images. The images consist of CT scans, MRI scans and radiographs of veterinary patients, all digitally archived in a central PACS system. Over the years VetCT has acquired one of the largest animal CT scan databases in the world. Our exhaustive review of current AI applications in veterinary diagnostic imaging shows that AI is largely underdeveloped in this field. To date, only 11 peer-reviewed scientific publications involving the use of AI in veterinary radiology can be found (compared to 2189 peer-reviewed publications in the human radiology field in 2021 alone). There are ample and unique opportunities for research, and a large potential to shape the AI innovations of the future in the veterinary and wider healthcare space.
Brief Description of the Project Phase one: an example of a lower-complexity project is a CT body area recognition AI tool. CT studies sent to VetCT always include multiple body areas (i.e. head, thorax, abdomen). Before a study is sent to a radiologist for reporting, every CT scan must be checked manually for the number of body parts it contains, and this is compared with the request submitted by the veterinarian. This takes up a significant amount of internal staff time. The step could be automated by an intelligent algorithm capable of recognising the different CT body parts.

Data description and specifications:

  • Format: Raw data of CT scans, radiographs and MRI scans in DICOM format, uncompressed or losslessly compressed.
  • Image variety: A wide range of equipment manufacturers; the imaging studies were acquired from 2000 different veterinary sites in multiple countries.
  • Restrictions: None. The database is readily available and under the full control and management of the company. VetCT has full consent from its clients to use the anonymised database for research purposes. Animal DICOM diagnostic images are not subject to GDPR.
  • Archive type: Digital central PACS system located in the cloud.
  • Clinical information: Every imaging study is linked to our digital case management platform, which stores all patient information, patient signalment (species, breed, age, gender), clinical history and symptoms, final radiology diagnosis and the full radiology diagnostic report, including annotated pictures in every report.
  • Species distribution: 80% Canine, 15% Feline, 5% Equine.
  • Number of studies readily available:

              CT       MRI    X-rays     TOTAL
    Canine    138121   28159  123435    289715
    Feline     21827    2752   28521     53100
    Equine      1040    3924    6526     11490
    TOTAL     160988   34835  158482    354305

    Table 1: Number of radiology studies available for research purposes, as of 15th June 2021

  • Radiology report: All reports are stored as .docx and PDF documents in our case management platform and linked to the relevant DICOM images. Reports include a detailed text description of the findings and the radiology diagnosis. All reports include labelled .PNG images of the most relevant pathologic findings. However, the corresponding DICOM images are not labelled.

Goal for the project: Focus on the CT body area machine learning recognition tool. The outcome would be a clear plan for the development of a working machine learning algorithm, ideally with a prototype we could test at the end of the period.
References Company website: www.vetct.com
Boissady, E., de La Comble, A., Zhu, X., & Hespel, A.-M. (2020). Artificial intelligence evaluating primary thoracic lesions has an overall lower error rate compared to veterinarians or veterinarians in conjunction with the artificial intelligence. Veterinary Radiology & Ultrasound, 61(6), 619–627. https://doi.org/10.1111/vru.12912
Boissady, E., De La Comble, A., Zhu, X., Abbott, J., & Adrien-Maxence, H. (2021). Comparison of a Deep Learning Algorithm vs. Humans for Vertebral Heart Scale Measurements in Cats and Dogs Shows a High Degree of Agreement Among Readers. Frontiers in Veterinary Science, 8. https://www.frontiersin.org/article/10.3389/fvets.2021.764570
Fitzke, M., PyTorch. (2021, December 15). RADIOLOGY AI @MARS VETERINARY HEALTH. https://www.youtube.com/watch?v=p11ldyP9aco
Sharma, P., Suehling, M., Flohr, T., & Comaniciu, D. (2020). Artificial Intelligence in Diagnostic Imaging: Status Quo, Challenges, and Future Opportunities. Journal of Thoracic Imaging, 35, S11. https://doi.org/10.1097/RTI.0000000000000499
Prerequisite Skills Machine Learning
Other Skills Used in the Project  
Programming Languages Python; Machine learning language