This is a list of the Industrial CMP project proposals from summer 2022:
- TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
- Keywords: deep learning, graph theory, graph ML, machine learning
- AI-Driven Therapeutic Responder/Non-Responder Prediction using Multi-modality Signature (TheraSign Project)
- Keywords: precision medicine, personalized medicine, companion diagnostics, applied AI/ML, data science
- Precision sub-typing of patient populations using "Super Progressor Phenotypes"
- Keywords: precision medicine, predictive modeling, personalized medicine, biomedical AI, healthcare data science
- Drug Target Discovery using Multi-omics Signatures
- Keywords: Bioinformatics, drug discovery, multi-omics, genomics, proteomics
- Digital Drug (Re)positioning: Data-driven Indication Discovery for New Drugs
- Keywords: drug development, drug discovery, drug repositioning, data science, machine learning, deep learning
- Is overestimation of expected survival more critical than underestimation?
- Keywords: Survival analysis, restricted mean survival time, optimal decisions, clinical trial, oncology
- Probabilistic algorithms on a distribution-tracking computing platform
- Keywords: Uncertainty, computer architecture, quantum computing, probability, statistics, algorithms
- Defining the optimal domain size for geomagnetic table corrections
- Keywords: Meteorology, Geomagnetic Modelling
- Leveraging mathematics and physiology to make reliable drug exposure and dose predictions in drug discovery and development
- Keywords: Drug Development, Quantitative Clinical Pharmacology, Physiological model
- Clinically relevant loss functions for 3D medical image segmentation
- Keywords: deep learning, medical imaging, 3D segmentation, topology, persistent homology
- Finite-difference approach for Stokes flow with free interfaces on staggered Cartesian grids
- Keywords: Stokes flow, free interface, finite difference methods, numerical linear algebra
- Implantation model R&D
- Keywords: Mathematical modelling, semiconductors, TCAD, statistics, physics
- Statistical Correlation Analysis & Toolkit Development
- Keywords: Finance, data, optimisation, meta-analysis, statistics, planning model
- Self-supervised vehicle damage detection in multimodal data
- Keywords: artificial intelligence, machine learning, computer vision, image processing, deep learning
- Semi-supervised semantic segmentation in multimodal data
- Keywords: artificial intelligence, machine learning, computer vision, image processing, deep learning
- Algorithm development for security applications
- Keywords: Security, ML, R&D, Data Science
- Deeply Interacting Learning Systems
- Keywords: deeply interacting learning systems, machine learning, neural networks
- Quantum computing internship
- Keywords: quantum, computing, algorithms, software
- Investigation of Congestion Control Systems using Traffic Flow Models
- Keywords: Traffic, Modelling, Implementation, Programming, Experimentation
- Diagnosing disease using whole microscope slide images
- Keywords: histopathology, digital image analysis, duodenum, deep learning, multiple instance learning
- Pattern recognition and correction on biological assay plates
- Keywords: Statistics, Pattern Recognition, Data correction, Data normalization
- Optimization of a random function
- Keywords: Optimization, Randomness, Computation
- Equity Electronic Trading internship
- Keywords: Equity, trading, quantitative finance, high-frequency
- AI Methods for Video segmentation & decomposition
- Keywords: AI, Video, Segmentation, Deep Learning, Computer Vision, Python
- Develop a machine learning tool applied to Veterinary CT scans
- Keywords: Machine learning, deep learning, AI, computer science, CNN, diagnostic imaging, CT, veterinary, innovation
TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials
Project Title | TrialGraph: Machine Intelligence Enabled Insight from Graph Modeling of Clinical Trials |
Keywords | deep learning, graph theory, graph ML, machine learning |
Contact Name | Dr. Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | Data Science & Artificial Intelligence, AstraZeneca |
Address | Remote or On-Campus (AstraZeneca PLC 1 Francis Crick Avenue Cambridge Biomedical Campus Cambridge CB2 0AA) |
Period of the Project | 8-12 weeks |
Work Environment | Remote or on-campus; the student will be embedded within a team of 10+ members |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | One of the major impediments to successful drug development is the complexity, cost and scale of clinical trials, particularly large Phase III trials. Despite a wealth of historical data, clinical trial sponsors typically struggle to fully leverage historical trial data to drive insight into optimal clinical trial design and to reduce trial cost and scale. Many barriers exist to leveraging this data, including drift in clinical terms and procedures over time, differences in trial structure and differences in data sampled. Recent advances in machine learning in areas such as Natural Language Processing (NLP) and graph modeling of complex data have enabled rapid progress in a number of domains. The TrialGraph project seeks to apply these methodologies to clinical trial data, creating a unified graph model to represent clinical trials across phases and therapeutic areas. Such a data modeling approach would enable novel and powerful analytics that drive efficiencies in drug development and benefit our patients. Multiple graph modeling initiatives are running in parallel, and this project will leverage their infrastructure, graph modeling of external clinical and biomedical data, and expertise. In collaboration with this wider community, the TrialGraph project will leverage these resources while developing novel graph representations of historical AZ trials, methodologies to analyze these representations for meaningful insight, and experiments with other machine learning methodologies that could yield both novel discoveries and operational efficiencies. |
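To make the graph-representation idea concrete, here is a minimal sketch (our own assumption about what such a representation could look like, not the TrialGraph schema) that links hypothetical trials to the conditions they study and the interventions they test using networkx, and computes a couple of simple graph features a downstream model could consume.

```python
# Minimal sketch (not the TrialGraph schema): a heterogeneous graph linking
# hypothetical clinical trials to the conditions they study and the
# interventions they test, as a starting point for graph ML features.
import networkx as nx

G = nx.Graph()

# Hypothetical trial records; in practice these would come from historical trial data.
trials = {
    "NCT0001": {"phase": "III", "condition": "NASH", "intervention": "drug_A"},
    "NCT0002": {"phase": "II",  "condition": "COPD", "intervention": "drug_B"},
    "NCT0003": {"phase": "III", "condition": "NASH", "intervention": "drug_B"},
}

for trial_id, rec in trials.items():
    G.add_node(trial_id, node_type="trial", phase=rec["phase"])
    G.add_node(rec["condition"], node_type="condition")
    G.add_node(rec["intervention"], node_type="intervention")
    G.add_edge(trial_id, rec["condition"], edge_type="studies")
    G.add_edge(trial_id, rec["intervention"], edge_type="tests")

# Simple graph-level features that a downstream model could consume.
print(nx.degree_centrality(G))
print(list(nx.common_neighbors(G, "NCT0001", "NCT0003")))  # entities shared by two trials
```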
Brief Description of the Project |
|
References | TrialGraph v1 Manuscript: https://arxiv.org/abs/2112.08211 |
Prerequisite Skills | Geometry/Topology; Predictive Modelling; Data Visualization
Other Skills Used in the Project | Statistics; Probability/Markov Chains
Programming Languages | Python; R; No Preference; GraphML tools and libraries; SQL
AI-Driven Therapeutic Responder/Non-Responder Prediction using Multi-modality Signature (TheraSign Project)
Project Title | AI-Driven Therapeutic Responder/Non-Responder Prediction using Multi-modality Signature (TheraSign Project) |
Keywords | precision medicine, personalized medicine, companion diagnostics, applied AI/ML, data science |
Contact Name | Dr. Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca / Machine Learning Research / Data Science & Artificial Intelligence |
Address | shameer.khader@astrazeneca.com |
Period of the Project | 8-12 weeks |
Work Environment | Student will be embedded in a team with 10+ members |
Project Open to | Undergraduates; Master's (Part III) students
Background Information | A precise understanding of responder and non-responder populations from real-world data remains a critical aspect of drug development. In the current era of value-based contracting, understanding precise sub-populations within a disease stratum remains a high-value research question. In this project, the incoming student will expand our internally developed, AI-driven therapeutic responder/non-responder discovery platform. The platform, TheraSign, designed to capture the heterogeneous signatures driving therapeutic responses, includes three key modules: digital phenotyping, responder/non-responder definitions and predictive modelling. The incoming student will help improve a module of their choice or help expand the application of TheraSign to AZ assets. As part of the precision medicine programme, the development of TheraSign and its application is an important milestone, and the incoming student will be trained to use and apply modern data integration and machine learning approaches. |
Brief Description of the Project |
|
References |
|
Prerequisite Skills | Statistics; Probability/Markov Chains; Image processing; Geometry/Topology; Predictive Modelling; Database Queries; Data Visualization; App Building
Other Skills Used in the Project | Fluids; Simulation
Programming Languages | Python; R; No Preference; SQL
Precision sub-typing of patient populations using "Super Progressor Phenotypes"
Project Title | Precision sub-typing of patient populations using "Super Progressor Phenotypes" |
Keywords | precision medicine, predictive modeling, personalized medicine, biomedical AI, healthcare data science |
Contact Name | Dr. Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca / Machine Learning Research/ Data Science & Artificial Intelligence |
Address | Remote or onsite at one of the UK campuses |
Period of the Project | 8-12 weeks |
Work Environment | Student will be part of a 10+ member team |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Understanding disease progression, patient-specific clinical trajectories, and the factors associated with how patients progress through the disease course is of emerging interest for developing precise and effective therapies. Recently, we have combined real-world data and machine learning approaches to develop an algorithm capable of identifying Super Progressors in Non-Alcoholic Steatohepatitis (NASH). NASH is a poorly characterized disease with a global epidemiologic footprint, associated with significant morbidity and mortality rates. Identifying a patient subset that shows a rapid rate of disease acceleration with a more severe phenotype as 'NASH super progressors' is a critical need. This population is of particular interest for AZ Clinical Trials, as characterized subpopulation(s) of super progressors will likely have altered Benefit-Risk profiles and could help define novel endpoints. In this project, we propose to explore clinical and real-world evidence datasets (GEO, VIVLI, IBM MarketScan, Optum) containing insurance claims data, lab values, and medical diagnoses and procedures, focused on different diseases of interest (type-2 diabetes, obesity, and chronic kidney disease), to develop machine learning models and a methodology to characterize subpopulations of NASH patients. There are also opportunities to collaborate with external academic partners on EHR data and to pursue joint research with leading clinical research centres in the UK and the US. This work will contribute to an understanding of the factors driving super progressors and has the potential to benefit AstraZeneca trials in trial design and patient stratification. |
Brief Description of the Project |
AIMS AND EXPECTATIONS
|
References |
|
Prerequisite Skills | Statistics; Probability/Markov Chains; Predictive Modelling; Database Queries; Data Visualization; App Building
Other Skills Used in the Project | |
Programming Languages | Python; R; No Preference; SQL
Drug Target Discovery using Multi-omics Signatures
Project Title | Drug Target Discovery using Multi-omics Signatures |
Keywords | Bioinformatics, drug discovery, multi-omics, genomics, proteomics |
Contact Name | Dr. Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca/Machine Learning Research/Data Science & Artificial Intelligence |
Address | AstraZeneca PLC 1 Francis Crick Avenue Cambridge Biomedical Campus Cambridge CB2 0AA |
Period of the Project | 8-12 weeks |
Work Environment | Incoming student will be part of a team with 10+ members |
Project Open to | Undergraduates; Master's (Part III) students
Background Information | Metagenomic sequencing of clinical samples has improved our understanding of how dysbiosis of microbial flora influences various human diseases. Emerging studies have shown that several microbial signatures were explicitly altered in the setting of immunological, cardiovascular, or gastrointestinal disorders, etc. Microbiome signatures, identified in the context of clinical phenotypes, offer unique challenges to understanding the specific functional pathways and metabolic reactions mediated by host-pathogen interactions. AstraZeneca is investing in this exciting and vital area to generate unique data sets and to interpret complex data to develop novel therapies. Currently, several projects are in progress to integrate microbiome data with heterogeneous data sets (imaging, multi-omics, clinical, in-vivo disease models, etc.) using bioinformatics and data science approaches. We are also developing novel tools and translational bioinformatics workflows to accelerate multi-omics-driven discovery. Collectively, such an approach could lead to new targets and unique precision medicine approaches. The collective study of altered microbial taxa/species and corresponding clinical phenotypes by compiling a large and diverse data set will be an essential step toward understanding the role of microbes in disease comorbidities. To achieve this goal, we are collaborating with Microbial Sciences across a portfolio of projects that span multiple disease modalities. The incoming student will develop multi-scale models capable of integrating multi-omics data with clinical and imaging data using modern machine intelligence methods. Currently, we are developing multiple translational bioinformatics resources; the incoming student could use these tools or other leading translational bioinformatics tools and analyze data pertaining to one of the focus diseases: NASH, COPD, Parkinson's Disease, IBD, etc. |
Brief Description of the Project |
|
References | |
Prerequisite Skills | Statistics; Probability/Markov Chains; Image processing; Geometry/Topology; Predictive Modelling; Database Queries; Data Visualization; App Building
Other Skills Used in the Project | |
Programming Languages | Python; R; No Preference; SQL
Digital Drug (Re)positioning: Data-driven Indication Discovery for New Drugs
Project Title | Digital Drug (Re)positioning: Data-driven Indication Discovery for New Drugs |
Keywords | drug development, drug discovery, drug repositioning, data science, machine learning, deep learning |
Contact Name | Shameer Khader |
Contact Email | shameer.khader@astrazeneca.com |
Company/Lab/Department | AstraZeneca / Machine Learning Research / Data Science & Artificial Intelligence |
Address | shameer.khader@astrazeneca.com |
Period of the Project | 8-12 weeks |
Work Environment | Student will be part of a team with 10+ members |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Drug repositioning is defined as a systematic or targeted evaluation of pharmaceuticals to identify new uses for existing drugs. AstraZeneca has been interested in drug positioning/repositioning technologies for many years, and several of our drugs have a high potential for repositioning. Initially considered a niche area, the field has seen many biopharma companies ramp up their presence in recent years, as several COVID-19-related therapies were identified using drug repositioning approaches. With the aid of innovative emerging digital technologies, including data science and artificial intelligence, we are interested in digital drug positioning: we take drugs at various stages of development and identify new indications. Such an approach is a powerful drug development strategy that helps to delineate complex associations between diseases and the drugs that mediate biological functions, including pleiotropy. We are building a suite of centralized tools, databases, and methods to augment drug repositioning efforts. Prediction results from computational approaches will be used for downstream experimental validation and functional test experiments that would lead to an investment decision to launch a new clinical program. Collectively, we are leveraging recent advances in machine intelligence, including deep learning, to develop new ways to enhance drug repositioning investigations. We are also developing systematic drug repositioning and positioning capabilities using internal and external data assets. |
Brief Description of the Project |
|
References |
|
Prerequisite Skills | Statistics; Probability/Markov Chains; Predictive Modelling; Database Queries; Data Visualization
Other Skills Used in the Project | |
Programming Languages | Python; R; No Preference; SQL
Is overestimation of expected survival more critical than underestimation?
Project Title | Is overestimation of expected survival more critical than underestimation? |
Keywords | Survival analysis, restricted mean survival time, optimal decisions, clinical trial, oncology |
Contact Name | Dr Fabio Rigat |
Contact Email | frigat@its.jnj.com |
Company/Lab/Department | Johnson & Johnson |
Address | 50-100 Holmers Farm Way, High Wycombe HP12 4EG |
Period of the Project | 8 weeks between June and September 2022 |
Work Environment | The student will benefit from input and mentorship from Dr Rigat, as well as from day-to-day support from other qualified members of the Janssen UK Statistics and Decision Science community. It is foreseeable that face-to-face time will be limited in the current pandemic. To mitigate this possibility, initial bi-weekly meetings will be organised, followed by weekly meetings if mentors and student feel comfortable with the progress of the project. Visits to the Janssen office in High Wycombe are certainly possible, as are face-to-face interactions in Cambridge within University premises. Should clinical trial data be needed to illustrate properties of survival estimators, anonymised historical datasets will be made available upon commitment to appropriate data confidentiality. Working hours and day-to-day work location are entirely flexible. |
Project Open to | Undergraduates; Master's (Part III) students
Background Information | Clinical trials designed to establish whether investigational treatments can improve patient survival traditionally rely on the assumption of proportional hazards, and the trial outcome is defined with reference to the statistical significance of the hazard ratio estimate. Much has been written about the advantages and limitations of this practice, with the main arguments emerging in relation to the clinical development of immune response modulators in oncology when compared to chemotherapy. The difference in restricted mean survival time (dRMST) has been proposed as an alternative to the hazard ratio, mostly on grounds of its interpretability: it is the difference in expected survival time conditional on the trial follow-up. From a mathematical perspective, since the RMST estimates the expected survival time conditional on follow-up, it is simple to prove that it is the optimal survival estimate under symmetric (quadratic) loss. However, a symmetric loss on survival time is questionable in practice, and it is desirable to derive optimal estimators of the survival time under asymmetric losses placing emphasis on preventing overestimation. The simplest asymmetric loss function here is the 0-1 loss, penalizing overestimation. Less drastic asymmetric losses have been examined by [1] and [2], among others. |
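For concreteness, the following block writes down, in our own notation, the RMST and one standard asymmetric (piecewise-linear) loss whose optimal predictor is a quantile rather than a mean; this is an illustrative aside, not material from the proposal or its references.

```latex
% T is the survival time, S(t) its survival function, tau the trial follow-up.
\[
  \mathrm{RMST}(\tau) = \mathbb{E}\!\left[\min(T,\tau)\right] = \int_0^{\tau} S(t)\,\mathrm{d}t,
  \qquad
  \mathrm{dRMST}(\tau) = \mathrm{RMST}_{\mathrm{trt}}(\tau) - \mathrm{RMST}_{\mathrm{ctrl}}(\tau).
\]
% Under quadratic loss L(a,t) = (a - t)^2 the optimal predictor of min(T, tau)
% is its mean, i.e. the RMST. Under the asymmetric piecewise-linear loss
\[
  L(a,t) = c_o\,(a - t)_+ + c_u\,(t - a)_+ ,
\]
% which charges c_o per unit of overestimation and c_u per unit of
% underestimation, the optimal predictor is the c_u/(c_o + c_u) quantile of
% min(T, tau); when c_o > c_u this sits below the median, formalising the
% preference for avoiding overestimation.
```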
Brief Description of the Project |
The objectives of this internship project are:
The intern will be supported by industrial mentors as appropriate to ensure that the project objectives can be reached within the limitations of a summer project. Numerical methods and illustrative analyses can be implemented in any popular programming language, including R, SAS, Matlab, Python. Project deliverables will include a final report, potentially providing material of appropriate quality for publication with the student as first author, a final presentation to the industrial and academic supervisors, and computer code developed to implement estimators and data analysis. |
References |
[1] Bayesian approach to life testing and reliability estimation using asymmetric loss function, Journal of Statistical Planning and Inference, Volume 29, Issues 1-2, September-October 1991, Pages 21-31 |
Prerequisite Skills | Statistics; Mathematical Analysis; Simulation; Predictive Modelling
Other Skills Used in the Project | |
Programming Languages | Python; MATLAB; R
Probabilistic algorithms on a distribution-tracking computing platform
Project Title | Probabilistic algorithms on a distribution-tracking computing platform |
Keywords | Uncertainty, computer architecture, quantum computing, probability, statistics, algorithms. |
Contact Name | Phillip Stanley-Marbell |
Contact Email | phillip@signaloid.com |
Company/Lab/Department | Signaloid |
Address | https://signaloid.ai |
Period of the Project | 8 weeks (June 2022 to September 2022) |
Work Environment | Remotely, as part of a team. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Signaloid is developing a new kind of microprocessor that can track the probability distributions associated with values in programs. This novel hardware architecture enables new ways of solving many challenging problems on empirical data. The project will give the student the opportunity to use this new computing platform to tackle exciting fundamental problems that also offer the opportunity for significant societal impact. |
Brief Description of the Project |
Depending on the background and interests of the intern, the project will take one of three possible forms: (1) developing theoretical analytical bounds on the effectiveness of finite-dimensional representations of multivariate random variables; |
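As a purely conceptual illustration of what "tracking a distribution through a computation" means, the sketch below uses plain Monte Carlo in NumPy; it is deliberately not the Signaloid platform or its API, and the input distributions are made up.

```python
# Conceptual illustration only -- plain Monte Carlo in NumPy, not the Signaloid
# platform or its API. It shows what "tracking the distribution of a value
# through a program" means: feed samples of the uncertain inputs through the
# same arithmetic the program performs and inspect the output distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uncertain empirical inputs (assumed distributions, for illustration).
resistance = rng.normal(loc=100.0, scale=2.0, size=n)   # ohms
current = rng.normal(loc=0.5, scale=0.01, size=n)        # amps

# The "program" is just Ohm's law plus one more multiplication; each output
# carries a full distribution rather than a single point value.
voltage = resistance * current
power = voltage * current

print(f"power: mean={power.mean():.3f} W, std={power.std():.3f} W")
print("5th/95th percentiles:", np.percentile(power, [5, 95]))
```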
References | [1] Vasileios Tsoutsouras, Orestis Kaparounakis, Bilgesu Bilgin, Chatura Samarakoon, James Meech, Jan Heck, and Phillip Stanley-Marbell. 2021. The Laplace Microarchitecture for Tracking Data Uncertainty and Its Implementation in a RISC-V Processor. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '21). Association for Computing Machinery, New York, NY, USA, 1254-1269. [2] To get a first-hand experience of using Signaloid's computing platform, try it out for free by going to https://get.signaloid.io |
Prerequisite Skills | Statistics; Probability/Markov Chains
Other Skills Used in the Project | Statistics; Probability/Markov Chains
Programming Languages | No Preference |
Defining the optimal domain size for geomagnetic table corrections
Project Title | Defining the optimal domain size for geomagnetic table corrections |
Keywords | Meteorology Geomagnetic Modelling |
Contact Name | Dr Edmund Stone |
Contact Email | ed.stone@metoffice.gov.uk |
Company/Lab/Department | Met Office |
Address | Met Office, Fitzroy Road, Exeter, EX1 3PB |
Period of the Project | 8 weeks; we can be flexible on start/end dates and duration |
Work Environment | We're currently hybrid working, and the balance can be decided by the student. We're happy with 100% home working, or a period visiting the office (e.g. 2 weeks in the middle), 100% in the office (rules permitting) or a mix. During COVID we've 'hosted' students remotely and it's worked. The team is around 7 people, a mix of scientists and engineers working on similar problems to measure the atmosphere. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Accurate and well understood measurements of the atmosphere are fundamental to numerical weather prediction models. The 'observations' are used to create the starting conditions of the physical models, nudge them in the direction that the atmosphere is moving, and validate the model results after the forecast validity time has passed. One of the measurements we use at the Met Office is winds derived from aircraft air traffic control messages. To do this we need to convert from a heading referenced to magnetic north to one referenced to true north. For this we use a lookup table (the International Geomagnetic Reference Field [IGRF]), but the aircraft may not be using the same lookup table, which introduces a significant source of error. Fortunately we can correct this 'heading error' by comparing the measurements of wind to the model forecast of wind. If we assume that, on average, the model is correct, we can find the discrepancy between the model and the observations for many thousands of data points for each given aircraft and use this information to create a heading correction. This reduces the measured error in the wind observations by around 50%. We routinely do this for data over the UK collected from our own receivers. |
Brief Description of the Project | For any given aircraft we effectively have two diverging fields: one we understand and the other we do not. We know that the fields diverge over time. Figure 1 shows the difference between two different versions of the IGRF, one released in 1990 and one released in 2015, evaluated for 1990. There is a clear difference in the fields, which also changes with location over the domain chosen. For any given aircraft we can generate a figure showing how the calculated heading correction varies (assuming we have data from the aircraft in the grid square); Figure 2 shows this for a single aircraft in 2015. The pattern of the two difference fields is remarkably similar, suggesting that the heading corrections are related to the changing magnetic lookup tables. It is not possible for us to know which field is being used by any aircraft: they may be using one of several similar models (including the IGRF and World Magnetic Model [WMM]), both of which are updated every 5 years, and these could be applied dynamically, taking the date into account, or as a single static field that was accurate at a single point in time. We have some evidence that for the UK the domain size is sufficiently small that the change in heading correction does not have a significant impact on the quality of the measurement (the variation with the orientation of the aircraft is more significant but still small enough that we are not concerned about it), although it appears that the UK domain size is close to the limit for the variation in correction. Our challenge is that we are now looking at processing a 'global' data set. The divergence of the fields is likely to be more significant globally than over the UK, so we need to calculate a local heading correction for a given domain. |
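One simplified way to frame the per-aircraft correction is as a one-parameter fit: find the heading offset that best reconciles the winds derived from the aircraft reports with the model winds. The sketch below does this on synthetic data, assuming the usual wind derivation (ground vector minus the true-airspeed vector along the corrected heading); it is an illustrative toy, not the Met Office processing chain.

```python
# Simplified sketch (not the Met Office processing chain): estimate a single
# per-aircraft heading correction by finding the heading offset that best
# reconciles winds derived from the aircraft data with the model winds.
import numpy as np
from scipy.optimize import minimize_scalar

def unit_vector(heading_deg):
    """Unit vector along a heading measured clockwise from north (east, north)."""
    h = np.deg2rad(heading_deg)
    return np.stack([np.sin(h), np.cos(h)], axis=-1)

def heading_correction(heading_deg, tas, v_ground, w_model):
    """Heading offset (degrees) minimising the misfit to the model wind."""
    def misfit(delta):
        w_obs = v_ground - tas[:, None] * unit_vector(heading_deg + delta)
        return np.mean(np.sum((w_obs - w_model) ** 2, axis=1))
    return minimize_scalar(misfit, bounds=(-10.0, 10.0), method="bounded").x

# Synthetic example: reports from one aircraft with a true 2-degree heading error.
rng = np.random.default_rng(1)
n = 500
true_heading = rng.uniform(0, 360, n)
tas = rng.uniform(180, 250, n)                          # true airspeed, m/s
w_true = np.array([10.0, -5.0]) + rng.normal(0, 1, (n, 2))
v_ground = tas[:, None] * unit_vector(true_heading) + w_true
reported_heading = true_heading - 2.0                   # aircraft reports a biased heading

print(heading_correction(reported_heading, tas, v_ground, w_true))  # ~ +2.0
```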
References |
|
Prerequisite Skills | Mathematical physics; Python |
Other Skills Used in the Project | Geometry/Topology; Simulation; Data Visualization |
Programming Languages | Python; R |
Leveraging mathematics and physiology to make reliable drug exposure and dose predictions in drug discovery and development
Project Title | Leveraging mathematics and physiology to make reliable drug exposure and dose predictions in drug discovery and development |
Keywords | Drug Development, Quantitative Clinical Pharmacology, Physiological model |
Contact Name | Chiara Zecchin |
Contact Email | chiara.x.zecchin@gsk.com |
Company/Lab/Department | GlaxoSmithKline, Clinical Pharmacology Modelling and Simulation |
Address | Gunnels Wood Rd, Stevenage, SG1 2NY |
Period of the Project | 8 weeks (up to 12) |
Work Environment | The student will be integrated into the collaborative CPMS department at GSK and can be connected with colleagues in biology, biostatistics and clinical teams. Flexible work location, including the GSK Stevenage campus and/or remote (depending on COVID restrictions and student preference). Flexible working hours (meetings and interactions may be easier to arrange between 9AM and 5PM). |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Drug development is a complex multi-disciplinary process, and mathematics plays a critical role in leveraging the complex body of physiological and biological knowledge. A drug's effect is often related to its concentration in blood and at the site of action, so it is essential to describe the relationship between dose and route of administration and drug concentration. For monoclonal antibodies, physiologically-based pharmacokinetic (PBPK) modelling is a mathematical technique utilised to predict drug concentration by integrating physiological, pharmacological and experimental information. These models can describe drug disposition across several species (e.g. mouse, rat, monkey, and man) and can be used to project tissue concentrations from blood data alone [1]. In addition to the use of full PBPK models, 'minimal' or 'lumped' PBPK models are of interest. These approaches simplify the model structure to only include key sites of distribution (e.g. tumour). These simplified models present a 'middle ground' between empirical and full PBPK approaches and, by including physiological parameters, they maintain an adequate level of anatomical relevance without requiring the complexity of full PBPK models [2]. |
Brief Description of the Project | GSK Clinical Pharmacology Modelling and Simulation (CPMS) is investing in this exciting area of drug development to support the identification of the right doses to be investigated in clinical practice, including information on the tissue(s) affected by the disease and the amount of drug desired therein. The student will implement full PBPK models and minimal PBPK models, based on literature references. Furthermore, the student is expected to derive 'middle ground' models, for several tissues of interest, based on the mathematical, physiological and biological properties of the system. The student will compare the different approaches by simulating drug exposure in tissues of interest, for several realistic scenarios. The student can be integrated in the collaborative CPMS department at GSK and can gain understanding of drug development by working closely with CPMS colleagues, biologists and project teams. |
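As a flavour of what a 'lumped' model looks like in practice, here is a minimal two-compartment (plasma plus one tissue) sketch integrated with SciPy. The structure and parameter values are illustrative placeholders, not the models or parameters of references [1] and [2], and the project itself lists R as the working language.

```python
# Minimal sketch of a "lumped" PBPK-style model: plasma plus a single tissue
# (e.g. tumour) compartment linked by plasma flow Q and a partition coefficient
# Kp, with linear clearance CL from plasma. Structure and parameter values are
# illustrative placeholders, not the models or parameters of references [1]-[2].
import numpy as np
from scipy.integrate import solve_ivp

V_p, V_t = 3.0, 0.5         # plasma and tissue volumes (L)
Q, Kp, CL = 0.2, 0.8, 0.01  # flow (L/h), tissue:plasma partition, clearance (L/h)

def rhs(t, y):
    C_p, C_t = y  # concentrations in plasma and tissue (mg/L)
    dC_p = (Q * (C_t / Kp - C_p) - CL * C_p) / V_p
    dC_t = Q * (C_p - C_t / Kp) / V_t
    return [dC_p, dC_t]

dose_mg = 100.0
t_end = 21 * 24.0  # three weeks, in hours
sol = solve_ivp(rhs, t_span=(0.0, t_end), y0=[dose_mg / V_p, 0.0],
                t_eval=np.linspace(0.0, t_end, 200), rtol=1e-8)

C_p, C_t = sol.y
print(f"plasma C at day 21: {C_p[-1]:.3f} mg/L, tissue C: {C_t[-1]:.3f} mg/L")
```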
References | [1] Shah DK, Betts AM Towards a platform PBPK model to characterize the plasma and tissue disposition of monoclonal antibodies in preclinical species and human. J Pharmacokinet Pharmacodyn. 2012 Feb; 39(1):67-86. [2] Glassman PM, Balthasar JP. Physiologically-based modeling of monoclonal antibody pharmacokinetics in drug discovery and development. Drug Metab Pharmacokinet. 2019 Feb;34(1):3-13. |
Prerequisite Skills | Numerical Analysis; Mathematical Analysis |
Other Skills Used in the Project | Simulation; Data Visualization; Software R |
Programming Languages | R |
Clinically relevant loss functions for 3D medical image segmentation
Project Title | Clinically relevant loss functions for 3D medical image segmentation |
Keywords | deep learning, medical imaging, 3D segmentation, topology, persistent homology |
Contact Name | Adam Klimont |
Contact Email | adam.klimont@cydar.co.uk |
Company/Lab/Department | Cydar Medical |
Address | Bulbeck Mill, Mill Lane, Barrington, CB22 7QY |
Period of the Project | 8-10 weeks between June and September |
Work Environment | You will be working as part of the 9-person Science Team. We have diverse backgrounds: computer vision, computer science, maths, biomedical engineering, physics. You will be supervised by 3 machine learning engineers, who will be available for daily discussions and guidance. Most of the work will be done remotely, but we should be able to meet in person at our Barrington offices 3-5 times during the internship, public health restrictions permitting. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | One of the most successful applications of deep neural networks (DNNs) is image segmentation, i.e. assigning a class to each pixel in an image. Segmentation DNNs are especially attractive in medical imaging, where human annotation requires expert knowledge and is time-consuming. Automated medical segmentation can greatly enhance current workflows for planning treatments, guiding operations, and patient monitoring. The most popular segmentation DNNs are U-Nets. They are most commonly trained with pixel-level loss functions such as cross-entropy or Sørensen-Dice loss. These approaches have been very efficient at training and achieving high pixel-wise overlap metrics; however, they do not enforce any shape constraints. This can lead to unrealistic shapes of segmented objects, as well as gaps in segmentation (false negatives, FNs), and islands of misclassified pixels (false positives, FPs). In a medical context, such false results could lead to misinterpretation and/or incorrect diagnosis. Anatomical features (e.g. organs) often follow a well-defined topology. We can exploit this knowledge to improve the clinically important aspects of the segmentation, e.g. to enforce connectivity, and minimise FNs/FPs. There has been a growing and diverse body of literature exploring this subject in the form of shape constraints, topology-aware loss functions, conditional random fields, etc. At Cydar we have developed a state-of-the-art system to help surgeons treat aortic aneurysms. We provide AI-enhanced tools for planning, carrying out procedures, and monitoring patients. We curate a database of thousands of expert-labelled CT scans. We would like to explore topology-preserving DNNs to better assist our clinical users and thus improve patient outcomes. |
Brief Description of the Project | You will explore different methods to incorporate shape constraints and topology into DNN training. Working with our computer vision and clinical experts, you will identify the relevant constraints for organ segmentation. You will then implement those methods in Python and TensorFlow. Programming experience is essential for this placement. Subsequently you will use the novel methods to train DNNs on cutting-edge GPUs in the cloud. We have developed our in-house training pipelines and it should be easy to go from prototype to full-scale training. Finally, models incorporating the new methods will be compared against our baseline segmentation nets. Our experts will assess the results to determine their clinical usefulness. |
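For reference, the pixel-level baseline mentioned above can be written in a few lines. The sketch below is a generic soft Sørensen-Dice loss in TensorFlow (the assumed tensor layout is batch, depth, height, width, classes), not Cydar's in-house implementation; a topology-aware term would typically be added on top of it.

```python
# Minimal sketch of the pixel-level baseline discussed above: a soft
# Sørensen-Dice loss for 3D segmentation, written for probability maps in
# [0, 1] with shape (batch, depth, height, width, classes). This is a generic
# formulation, not Cydar's in-house loss.
import tensorflow as tf

def soft_dice_loss(y_true, y_pred, eps=1e-6):
    axes = [1, 2, 3]  # sum over the spatial dimensions of each volume
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    denominator = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - tf.reduce_mean(dice)   # average over batch and classes

# Topology-aware training typically keeps such a pixel-level term and adds a
# shape or connectivity penalty on top, e.g. total = dice + lambda * topo_term.
y_true = tf.random.uniform((2, 16, 32, 32, 1))
y_pred = tf.random.uniform((2, 16, 32, 32, 1))
print(float(soft_dice_loss(y_true, y_pred)))
```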
References |
|
Prerequisite Skills | Image processing; Geometry/Topology; programming |
Other Skills Used in the Project | |
Programming Languages | Python |
Finite-difference approach for Stokes flow with free interfaces on staggered Cartesian grids
Project Title | Finite-difference approach for Stokes flow with free interfaces on staggered Cartesian grids |
Keywords | Stokes flow, free interface, finite difference methods, numerical linear algebra |
Contact Name | Vasily Suvorov |
Contact Email | vasily.suvorov@silvaco.com |
Company/Lab/Department | Silvaco Europe, Technology Computer-Aided Design (TCAD) Department |
Address | Compass Point, St Ives, Cambridgeshire, PE27 5JL |
Period of the Project | 8-10 weeks |
Work Environment | The student will work on their own with support and guidance from the supervisor. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Modern semiconductor technology involves processes where materials with free interfaces undergo large and slow deformations. Such deformations can often be modelled by the flow of an incompressible liquid in which advective inertial forces are small compared with viscous forces, i.e. by Stokes flow. The project aims to analyse the company's working numerical approach to modelling such flows, with the goal of improving accuracy, stability and convergence. |
Brief Description of the Project | Silvaco uses finite difference schemes on structured 2D and 3D Cartesian grids to simulate Stokes flow with free interfaces. A particular difficulty of applying such schemes is the approximation of the boundary conditions at the free interfaces and the approximation of the momentum equations near the interfaces. At such irregular points the finite difference stencils do not form regular, orthogonal patterns but have irregular shapes, where the equations are approximated by the method of undetermined coefficients [1]. Although this approach works in practice, it requires further mathematical analysis to improve the accuracy and the stability of the numerical schemes. The student will help to better understand the mathematical properties of the suggested numerical schemes. Examples of such questions are: Given an irregular finite-difference stencil, how can we estimate the accuracy of the approximation based on its geometry? What is the stability (e.g. a condition number) of this approximation? What are the properties of the resulting linear system of equations? The answers to these questions will help to improve the company's software. |
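To illustrate the method of undetermined coefficients on an irregular stencil, and the accuracy/conditioning questions listed above, here is a small one-dimensional sketch: the weights are obtained by requiring exactness on monomials, which reduces to a Vandermonde-type linear solve whose condition number is one crude stability indicator. This is a generic textbook construction, not Silvaco's scheme.

```python
# Method of undetermined coefficients on an irregular 1D stencil: choose
# weights w_i so that sum_i w_i f(x_i) reproduces the m-th derivative of f at
# x0 exactly for all polynomials up to the stencil size. The condition number
# of the underlying Vandermonde-type matrix is one crude stability measure.
import numpy as np
from math import factorial

def fd_weights(x, x0, m):
    """Weights approximating the m-th derivative at x0 from nodes x."""
    n = len(x)
    # Row k enforces exactness for the monomial (x - x0)^k.
    A = np.array([[(xi - x0) ** k for xi in x] for k in range(n)])
    b = np.zeros(n)
    b[m] = factorial(m)
    return np.linalg.solve(A, b), np.linalg.cond(A)

# Irregular stencil, e.g. near a free interface cutting a Cartesian grid cell.
x = np.array([-1.0, -0.3, 0.45, 1.0])
w, cond = fd_weights(x, x0=0.0, m=2)

f = np.sin
approx = w @ f(x)                    # approximation to f''(0) = -sin(0) = 0
print(f"f''(0) ~ {approx:.2e}, condition number ~ {cond:.1e}")
```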
References | [1] Computational Methods in Partial Differential Equations. A.R.Mitchell, 1969 |
Prerequisite Skills | Numerical Analysis; PDEs; Mathematical Analysis |
Other Skills Used in the Project | Fluids; Simulation |
Programming Languages | Python; MATLAB |
Implantation model R&D
Project Title | Implantation model R&D |
Keywords | Mathematical modelling, semiconductors, TCAD, statistics, physics |
Contact Name | Artem Babayan |
Contact Email | artem.babayan@silvaco.com |
Company/Lab/Department | Silvaco Europe |
Address | Compass Point, PE275JL, St Ives |
Period of the Project | 8 weeks, anytime |
Work Environment | The project assumes a high degree of independence. The development part is expected to be done in the office (in St Ives, near Cambridge). |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | |
Brief Description of the Project | Silvaco is a software engineering company developing tools to assist in the manufacturing of semiconductor devices. In the UK office we mostly work on the 'process simulation' side -- mathematical modelling of the processes used in manufacturing. One such process is implantation -- bombardment of a piece of (typically) Si with ions (dopants) to change the electrical properties of the target in specific areas. The traditional method is to use a directional beam of ions. However, to achieve certain types of doping distribution in the substrate, other techniques are employed. One of these is called "Plasma doping" (see reference). For the current project the objective is to find the empirical distribution of ions' directions and energies within the plasma reactor, based on the reactor's parameters and ion properties. This will include studying the relevant papers and building a simplified model. The output of the project is code which takes the reactor parameters as input and produces the ion distribution according to their energies and directions. If successful, this activity is likely to ultimately result in a conference or journal publication. |
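Purely as a placeholder for the kind of interface described above (reactor parameters in, sampled ion energy/angle distribution out), here is a toy sketch; the distributions used are illustrative assumptions, not a validated plasma-doping model.

```python
# Toy sketch only: a placeholder for the interface the project asks for -- a
# function mapping reactor parameters to a sampled joint distribution of ion
# energies and incidence angles. The distributions (a bias-voltage-centred
# energy spread and a near-normal angular spread) are illustrative assumptions,
# not a validated plasma-doping model.
import numpy as np

def sample_ion_distribution(bias_voltage_eV, energy_spread_eV, angular_spread_deg,
                            n_ions=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed: ion energies cluster around the sheath/bias voltage.
    energies = rng.normal(bias_voltage_eV, energy_spread_eV, n_ions).clip(min=0.0)
    # Assumed: incidence angles spread around the surface normal (0 degrees).
    angles = np.abs(rng.normal(0.0, angular_spread_deg, n_ions))
    return energies, angles

energies, angles = sample_ion_distribution(bias_voltage_eV=5_000.0,
                                           energy_spread_eV=500.0,
                                           angular_spread_deg=5.0)
print(f"mean energy {energies.mean():.0f} eV, mean angle {angles.mean():.2f} deg")
```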
References | "Plasma doping for silicon" Surface and Coatings Technology Volume 85, Issues 1-2, 1 November 1996, Pages 51-55 |
Prerequisite Skills | |
Other Skills Used in the Project | |
Programming Languages | Python; MATLAB; C++ |
Statistical Correlation Analysis & Toolkit Development
Project Title | Statistical Correlation Analysis & Toolkit Development |
Keywords | Finance, data, optimisation, meta-analysis, statistics, planning model |
Contact Name | Amanda-Jayne Hawkins |
Contact Email | amanda-jayne.hawkins@siemens-healthineers.com |
Company/Lab/Department | Siemens Healthineers |
Address | Siemens Healthineers, Northern Road Sudbury CO10 2XQ United Kingdom |
Period of the Project | 8 - 10 weeks |
Work Environment | Working with Finance and Planning SMEs to collaborate on the data, business requirements and outcome for the toolkit |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | The initial focus of this will be developed in the context of spares data; a further aim is that the model developed through this project would provide an exemplar to support the release of data from other assets for the business. |
Brief Description of the Project |
Based on multiple sites in Chilton Industrial Estate, Sudbury, we design and assemble blood, urine and diabetes analysis instruments. With our point-of-care testing systems, Siemens Healthineers delivers lab-accurate, actionable, and timely results on the spot. Siemens Healthineers has been Certified™ as a 'great place to work'. We are inspired to transform the way things are done, because we want what is best for our people, our customers, and ultimately to help everyone live longer and healthier lives. The initial focus of this work will be developed in the context of spares data; a further aim is that the model developed through this project would provide an exemplar to support the release of data from other assets for the business. The placement would suit an individual with a good analytical background who enjoys problem solving and attention to detail. It offers an opportunity to support a programme of work with clearly defined expectations while delivering operational solutions. The outputs from this placement will improve the efficiency of planning & inventory, finance & forecast, and operational demand and focus. It will provide an excellent opportunity for an intern to develop skills and knowledge whilst also demonstrating competency through successful project delivery.
The aim of this placement will be to:
A combination of the following is required:
|
References | |
Prerequisite Skills | Statistics; Probability/Markov Chains; Predictive Modelling; Data Visualization
Other Skills Used in the Project | Statistics; Probability/Markov Chains; Mathematical Analysis; Simulation; Predictive Modelling; Database Queries; Data Visualization; App Building |
Programming Languages | No Preference |
Self-supervised vehicle damage detection in multimodal data
Project Title | Self-supervised vehicle damage detection in multimodal data |
Keywords | artificial intelligence; machine learning; computer vision; image processing; deep learning |
Contact Name | Michelle Botes |
Contact Email | michelle@autofilltech.com |
Company/Lab/Department | AutoFill Technologies B.V. |
Address | Marineweg 1, 2241TX, Wassenaar, the Netherlands |
Period of the Project | 8 weeks starting 20 June 2022 |
Work Environment | Our Head of AI & Development (Sahar Yousefi) will serve as project lead and supervisor with support from our CTO and co-founder (Daan de Cloe). The student(s) will work closely with all team members of our AI/ML engineering team (4 team members). We generally work 40 hour work weeks, Monday to Friday. Students will work remotely and AutoFill will facilitate and coordinate one week per month at our offices close to Amsterdam to work with the team in our research lab. |
Project Open to | Master's (Part III) students |
Background Information |
We are AutoFill Technologies, a high-performance team striving to tackle the toughest challenges in Computer Vision and Machine Learning. AutoFill is a company where Deeptech meets Hardtech to develop the best systems for cutting-edge inspections of objects, powered by Artificial Intelligence. We are proud that we work on the edge of what is possible and bring theoretical research to life. We are confident that our technology will become the worldwide standard for automated object inspections. Our team is at the forefront of the artificial intelligence revolution. If you're seeking to work with experienced professionals who are excited to create impact in multiple industries, and if you like Deeptech, hardware and pushing some serious boundaries, then you're ready to join our team. What we do really makes a difference. Amazing opportunities like joining AutoFill just don't come around every day. Be part of something big, from the early days. At AutoFill, we have developed an automated object inspection system that automatically captures large, high-quality multimodal scans from a vehicle in only a few seconds. We use Computer Vision and Machine Learning to optimise the quality and efficiency of the data collected, as well as to process the data into valuable information for our customers. With our multi-sensor solution, we are able to fuse data from different types of sensors and from different viewing angles. With our systems deployed at customer locations, and our own test setup at our AutoFill Research Lab, we continuously generate large representative datasets that are used for the development and training of new AI models and algorithms. |
Brief Description of the Project |
With our automated vehicle inspection systems, located at customer locations in Europe, we collect thousands of datasets containing images of vehicles, captured from multiple angles using RGB and polarization sensors. According to recent studies, the polarization modality provides a very rich description of abnormalities in very challenging conditions such as poor illumination and strong reflection (Blin et al.). We have built an in-house data annotation team which ensures consistently high-standard annotations. However, data annotation for AI applications requires a large amount of manual labour. Your main contribution as a researcher is to investigate whether self-supervised learning in the domain of vehicle damage detection on multimodal data can reduce the cost of annotation while still performing as well as fully-supervised models. The self-supervised learning literature has shown that model distillation opens a new way of learning which provides good representations using labelled and unlabelled data (a minimal illustrative sketch of the student-teacher idea follows this entry). In a very recent work (Koohpayegani et al.), a self-supervised AlexNet outperformed its supervised counterpart on ImageNet classification. With this research project, you have the opportunity to work in a high-tech company active in building automated machine vision solutions for vehicle fleets and rail, in close collaboration with AutoFill Technologies' AI experts. You will also have access to large datasets of RGB and polarization images from different vehicles captured by AutoFill Technologies. Last but not least, you will have access to the Google Cloud environment for developing and testing AI solutions. Outline of planned activities:
You will need to have an affinity for programming and familiarity with Python. A strong background and interest in image processing and machine learning is a must. Experience with deep learning libraries (PyTorch or TensorFlow) is preferred. |
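The sketch below illustrates the student-teacher distillation idea referenced above in generic PyTorch: an EMA teacher, two augmented views, and a cosine-similarity objective. It is a toy illustration under our own assumptions, not AutoFill's pipeline nor an exact reproduction of the cited methods.

```python
# Generic sketch of one self-supervised distillation step (student/teacher with
# an EMA teacher and a cosine-similarity objective on two augmented views).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 128))
student = encoder
teacher = copy.deepcopy(encoder)          # EMA copy, never updated by gradients
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_step(view_a, view_b, momentum=0.99):
    z_s = F.normalize(student(view_a), dim=1)
    with torch.no_grad():
        z_t = F.normalize(teacher(view_b), dim=1)
    loss = (2 - 2 * (z_s * z_t).sum(dim=1)).mean()   # equivalent to 2 - 2*cosine
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                            # EMA update of the teacher
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(momentum).add_((1 - momentum) * p_s)
    return loss.item()

# Two "augmented views" of the same unlabelled batch (random tensors stand in for images).
views = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
print(train_step(*views))
```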
References |
|
Prerequisite Skills | Image processing; Data Visualization; Deep neural network |
Other Skills Used in the Project | Deep learning & data analysing |
Programming Languages | Python; C++ |
Semi-supervised semantic segmentation in multimodal data
Project Title | Semi-supervised semantic segmentation in multimodal data |
Keywords | artificial intelligence; machine learning; computer vision; image processing; deep learning |
Contact Name | Michelle Botes |
Contact Email | michelle@autofilltech.com |
Company/Lab/Department | AutoFill Technologies B.V. |
Address | Marineweg 1, 2241TX, Wassenaar, the Netherlands |
Period of the Project | 8 weeks starting 1 June 2022 |
Work Environment | Our Head of AI & Development (Sahar Yousefi) will serve as project lead and supervisor with support from our CTO and co-founder (Daan de Cloe). The student(s) will work closely with all team members of our AI/ML engineering team (4 team members). We generally have a 40 hour work week, Monday to Friday. Students will work remotely and AutoFill will facilitate and coordinate one week per month at our offices close to Amsterdam to work with the team in our research lab. |
Project Open to | Master's (Part III) students |
Background Information |
We are AutoFill Technologies, a high-performance team striving to tackle the toughest challenges in Computer Vision and Machine Learning. AutoFill is a company where Deeptech meets Hardtech to develop the best systems for cutting-edge inspections of objects, powered by Artificial Intelligence. We are proud that we work on the edge of what is possible and bring theoretical research to life. We are confident that our technology will become the worldwide standard for automated object inspections. Our team is at the forefront of the artificial intelligence revolution. If you're seeking to work with experienced professionals who are excited to create impact in multiple industries, and if you like Deeptech, hardware and pushing some serious boundaries, then you're ready to join our team. What we do really makes a difference. Amazing opportunities like joining AutoFill just don't come around every day. Be part of something big, from the early days. At AutoFill, we have developed an automated object inspection system that automatically captures large, high-quality multimodal scans from a vehicle in only a few seconds. We use Computer Vision and Machine Learning to optimise the quality and efficiency of the data collected, as well as to process the data into valuable information for our customers. With our multi-sensor solution, we are able to fuse data from different types of sensors and from different viewing angles. With our systems deployed at customer locations, and our own test setup at our AutoFill Research Lab, we continuously generate large representative datasets that are used for the development and training of new AI models and algorithms. |
Brief Description of the Project |
With our automated vehicle inspection systems, located at customer locations in Europe, we collect thousands of datasets containing images of vehicles, captured from multiple angles using RGB and polarization sensors. According to recent studies, the polarization modality provides a very rich description of abnormalities in very challenging conditions such as poor illumination and strong reflection (Blin et al.). We have built an in-house data annotation team which ensures consistently high-standard annotations. In a very recent work, Xiang et al. presented an Efficient Attention-bridged Fusion Network to exploit complementary information coming from different optical sensors. Specifically, they incorporate polarization sensing to obtain supplementary information, considering its optical characteristics for robust representation of diverse materials. Further, Ouali et al. proposed cross-consistency training, where invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder (a minimal illustrative sketch follows this entry). Such approaches help greatly in obtaining highly accurate yet compact models for complex visual recognition tasks. Your main contribution as a researcher will be implementing a semi-supervised learning approach by leveraging various perturbations along with exploiting supplementary information such as polarization data for a semantic segmentation task. You will have the opportunity to work in a high-tech company active in building automated machine vision solutions for vehicle fleets and rail, in close collaboration with AutoFill Technologies' AI experts. You will have access to large datasets of RGB and polarization images from different vehicles captured by AutoFill Technologies, and to the Google Cloud environment for developing and testing AI solutions. Outline of planned activities:
You will need to have an affinity for programming and familiarity with Python. A strong background and interest in image processing and machine learning is a must. Experience with deep learning libraries (PyTorch or TensorFlow) is preferred. |
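The sketch below illustrates the cross-consistency idea in generic PyTorch: a supervised cross-entropy term on labelled images plus a consistency term that asks an auxiliary decoder, fed perturbed encoder features, to match the main decoder on unlabelled images. Modules and shapes are toy stand-ins, not the architecture of Ouali et al. or AutoFill's models.

```python
# Minimal sketch of cross-consistency training: supervised loss on labelled
# images plus a consistency loss between the main decoder and an auxiliary
# decoder that sees perturbed encoder features on unlabelled images.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 3
encoder = nn.Conv2d(3, 16, kernel_size=3, padding=1)
main_decoder = nn.Conv2d(16, n_classes, kernel_size=1)
aux_decoder = nn.Conv2d(16, n_classes, kernel_size=1)

def cct_loss(x_labelled, y_labelled, x_unlabelled, noise_std=0.1, weight=0.5):
    # Supervised branch: plain cross-entropy on the labelled batch.
    logits_sup = main_decoder(encoder(x_labelled))
    loss_sup = F.cross_entropy(logits_sup, y_labelled)

    # Unsupervised branch: the auxiliary decoder sees perturbed features and
    # must match the main decoder's (detached) prediction.
    feats = encoder(x_unlabelled)
    with torch.no_grad():
        target = F.softmax(main_decoder(feats), dim=1)
    pred_aux = F.softmax(aux_decoder(feats + noise_std * torch.randn_like(feats)), dim=1)
    loss_cons = F.mse_loss(pred_aux, target)

    return loss_sup + weight * loss_cons

x_l = torch.rand(2, 3, 32, 32)
y_l = torch.randint(0, n_classes, (2, 32, 32))
x_u = torch.rand(4, 3, 32, 32)
print(cct_loss(x_l, y_l, x_u).item())
```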
References |
|
Prerequisite Skills | Image processing; Data Visualization; Deep neural networks |
Other Skills Used in the Project | Deep learning and data analyzing |
Programming Languages | Python; C++ |
Algorithm development for security applications
Project Title | Algorithm development for security applications |
Keywords | Security; ML; R&D; Data Science |
Contact Name | Marcus Quantrill |
Contact Email | marcus.quantrill@iconal.com |
Company/Lab/Department | Iconal Technology Ltd. |
Address | St Johns Innovation Centre, Cowley Road, CB4 0WS |
Period of the Project | At least 8 weeks, June or earlier start |
Work Environment | We are a small friendly team of 6 people, all working on a range of interesting diverse projects. The student will be based in our main office (or lab for data gathering) working on one or more projects with us, with a mentor on each project to help with queries, reviewing work and assigning tasks. The amount of in-person contact time may vary depending on the situation with the pandemic at the start of the project. Some amount of remote work is therefore a possibility. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | We are a Cambridge based consultancy carrying out research and development in new and emerging technologies for homeland security, offering independent, impartial, science-based advice. This will be our fourth year offering CMP placements, and we are looking for keen, innovative, self-motivated individuals who are interested in the practical application of maths to solve real-world problems. You will be working in a small friendly (we like to think) team of scientists and engineers, and contributing directly to the output of current projects. |
Brief Description of the Project | Right now we do not know exactly what the student project will entail, as we work in a rapidly evolving field. This year's projects are likely to be focused on one or more of: developing algorithms and machine learning solutions to analyse complex sensor data, or helping with tests and trials of technology. Our work is highly varied and interesting and you will likely get stuck in with all aspects of the job! |
References | https://www.iconal.com/ |
Prerequisite Skills | Statistics; Probability/Markov Chains; Data Visualization |
Other Skills Used in the Project | Statistics; Probability/Markov Chains; Mathematical physics; Numerical Analysis; Image processing; Simulation; Predictive Modelling; Database Queries; Data Visualization; App Building |
Programming Languages | Python; MATLAB; R; C++ (Python preferred, but other languages can be considered if relevant) |
Deeply Interacting Learning Systems
Project Title | Deeply Interacting Learning Systems |
Keywords | deeply interacting learning systems, machine learning, neural networks |
Contact Name | Jamie Beacom |
Contact Email | jamie.beacom@smithinst.co.uk |
Company/Lab/Department | Smith Institute |
Address | 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK |
Period of the Project | 8 weeks |
Work Environment | The successful student will join our team of creative problem solvers. They will have regular interactions with the project supervisor and have the opportunity to discuss ideas and present findings to the rest of the technical staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | The Smith Institute uses core expertise in mathematical modelling and applied statistics to bring fresh thinking and new solutions to the challenges faced by our clients. We are at the forefront of the industrial and business applications of mathematics, with key client relationships in performance engineering, algorithm design, risk modelling, optimisation and data analytics. To keep at the forefront, we maintain strong links with universities and explore methods, concepts, and tools as they develop. The goal of this project is to collaboratively build our understanding of Deeply Interacting Learning Systems, a recently proposed extension of Deep Neural Networks, and how they can be implemented within the practical constraints of the real world. |
Brief Description of the Project |
The analogy between Deep Neural Networks (DNNs) and human brains is traditionally evoked through the identification of artificial neurons in the DNN with neurons in the human brain. A Deeply Interacting Learning System (DILS) [1] develops this analogy further. Using Interaction Diagrams (IDs) [2] as a common framework for both DNNs and Interacting Dynamical Systems (IDS), a DILS is defined as an ID with the dynamic re-wiring of a DNN and the non-trivial internal wiring of an IDS. In the broadest sense, a DILS proposes to move away from the discrete phases of learning, testing, and inference in DNNs towards a system which is continuously online. It also supports a more collaborative approach towards learning than a "shouting" match between weights and biases. In this way, we should be better able to understand the relationships between data as it flows through the learning system. Suppose that we have a supervised learning problem for image classification, where the problem is to classify images of lung tissue as either showing cancerous tissue or not. A basic example of the DILS framework would be to train a DNN with the added structure of OR gates to support classification in this problem. The inclusion of the OR gates in this case already furnishes the learning system with the knowledge that it is solving a classification problem, coming closer to how a human brain would approach this task. This project will bring the ideas above to life, with the overarching goal of making the definition of DILS more explicit and precise. Along the way, the student should develop an understanding of the limitations and advantages of this novel approach over conventional methods for supervised learning, in their application to real-world problems. The ideas underpinning this proposed approach build on the foundations established in [2] and [3]. The project will start from a literature review. This will involve experimentation with the Python package [4] implemented as part of the work in [3]. Following this, several directions could be taken depending on the interests of the student undertaking this project. This might encompass a more detailed analysis and comparison of the framework in [3] against conventional neural networks and on a wider range of examples to better understand its advantages and limitations. It could also involve generating explicit descriptions of the simple examples of neural networks which are the basis of experiments in [3] and [4] in the language of Interaction Diagrams. At the conclusion of the project, the student should understand how to formulate examples of neural networks in these novel Category Theoretic frameworks, and be able to implement and experiment with these examples, with a focus on comparison against conventional implementations. |
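As a toy illustration of the compositional "forward/backward pair" view underlying [2] and [3] (and deliberately not the API of the numeric-optics package in [4]), the sketch below composes two such pairs and checks that the composite backward pass reproduces the chain rule.

```python
# Toy illustration: the compositional view of [2]-[3] treats a differentiable
# layer as a pair of maps, a forward pass and a backward pass, and builds
# networks by composing such pairs. Here a "lens" carries (forward, backward)
# and >> chains them sequentially.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Lens:
    forward: Callable[[float], float]                  # x -> y
    backward: Callable[[Tuple[float, float]], float]   # (x, dy) -> dx

    def __rshift__(self, other: "Lens") -> "Lens":
        """Sequential composition self ; other, chaining forward and backward passes."""
        def fwd(x):
            return other.forward(self.forward(x))
        def bwd(x_dy):
            x, dy = x_dy
            y = self.forward(x)
            return self.backward((x, other.backward((y, dy))))
        return Lens(fwd, bwd)

# Two simple layers: scaling by 3 and squaring, with their reverse derivatives.
scale = Lens(lambda x: 3.0 * x, lambda x_dy: 3.0 * x_dy[1])
square = Lens(lambda x: x * x, lambda x_dy: 2.0 * x_dy[0] * x_dy[1])

net = scale >> square            # f(x) = (3x)^2
print(net.forward(2.0))          # 36.0
print(net.backward((2.0, 1.0)))  # df/dx at x=2: 18x = 36.0, via the chain rule
```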
References | [1] T. Hosgood and D. Spivak, Deep neural networks as nested dynamical systems, November 2021. [Online]. Available: https://arxiv.org/pdf/2111.01297.pdf. [2] B. Fong, D. Spivak and R. Tuyeras, Backprop as a Functor: A compositional perspective on supervised learning, May 2019. [Online]. Available: https://arxiv.org/pdf/1711.10455.pdf. [3] G. Cruttwell, B. Gavranović, N. Ghani and F. Zanasi, Categorical Foundations of Gradient-Based Learning, March 2021. [Online]. Available: https://arxiv.org/pdf/2103.01931v1.pdf. [4] Numeric Optics. [Online]. Available: https://github.com/statusfailed/numeric-optics-python.git. [Accessed 13 Jan 2022]. [5] D. Spivak, D. Vagner and E. Lerman, Algebras of Open Dynamical Systems on the Operad of Wiring Diagrams. [Online]. Available: https://math.mit.edu/~dspivak/informatics/WD-ODE.pdf. [Accessed 14 Jan 2022]. |
Prerequisite Skills | Machine Learning; Linear Algebra; Python coding |
Other Skills Used in the Project | Familiarity with Category Theory would be helpful, but the essentials for this can be picked up along the way. |
Programming Languages | Python |
Quantum computing internship
Project Title | Quantum computing internship |
Keywords | quantum, computing, algorithms, software |
Contact Name | Ophelia Crawford |
Contact Email | ophelia.crawford@riverlane.com |
Company/Lab/Department | Riverlane |
Address | St Andrew's House, 59 St Andrew's Street, Cambridge, CB2 3BZ |
Period of the Project | 10-12 weeks, summer 2022 |
Work Environment | You will join us at our office in Cambridge, UK, for 10 to 12 weeks, where you will have the opportunity to work alongside our team of software and hardware engineers, mathematicians, quantum information theorists, computational chemists and physicists - all experts in their fields. Every intern will have a dedicated supervisor and will work on a project designed to make the best use of their background and skills whilst developing their knowledge of quantum computing. |
Project Open to | Master's (Part III) students |
Background Information | Riverlane is the world's first quantum engineering company. We are hardware obsessed, qubit agile and commercially driven. We're a passionate team collaboratively tackling some of humanity's biggest opportunities, from climate change to materials science and new drug discovery. Our full-time summer internships are designed to enable current students in a technical field to translate their skills and expertise into an industrial setting. |
Brief Description of the Project |
What you will do:
What we need:
For more information and to apply, please visit our website: https://www.riverlane.com/internships/ |
References | |
Prerequisite Skills | |
Other Skills Used in the Project | |
Programming Languages |
Investigation of Congestion Control Systems using Traffic Flow Models
Project Title | Investigation of Congestion Control Systems using Traffic Flow Models |
Keywords | Traffic, Modelling, Implementation, Programming, Experimentation |
Contact Name | Charles Choyce |
Contact Email | charles.choyce@smithinst.co.uk |
Company/Lab/Department | Smith Institute |
Address | 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK |
Period of the Project | 8 weeks |
Work Environment | The successful student will join our team of creative problem solvers. They will have regular interactions with the project supervisor and have the opportunity to discuss ideas and present findings to the rest of the technical staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | The Smith Institute uses core expertise in mathematical modelling and applied statistics to bring fresh thinking and new solutions to the challenges faced by our clients. We are at the forefront of the industrial and business applications of mathematics, with key client relationships in performance engineering, algorithm design, risk modelling, optimisation, and data analytics. To keep at the forefront, we maintain strong links with universities and explore methods, concepts and tools as they develop. The goal of this project is to collaboratively build our understanding of congestion control systems and how they can be implemented within the practical constraints of the real world. Traffic flow and congestion modelling is a well-developed field in mathematics, informing traffic forecasting for infrastructure planning, management, and policy making. While the general problem of optimising a transport network remains unsolved, smart traffic control technologies hold significant potential to reduce inner-city gridlock when used in an Intelligent Transport System (ITS). Examples of these technologies include intelligent traffic lights and streetlamps, adaptive speed limit systems, and smart speed cameras. As part of a connected network these technologies require a control system to communicate sensor data intelligently, manage congestion, and notify operators. Proper design of these control systems will be crucial to reducing congestion. |
Brief Description of the Project |
This project will investigate the optimal modelling methods for evaluating the effects of congestion control systems on two simple traffic scenarios: a macroscopic traffic-flow model based on a fluid dynamics analogy, and a microscopic 4-way junction model. In the former, the student will examine a basic traffic density model representing a motorway section. A set of rules for traffic control will be implemented to understand their impact on various congestion metrics; in this case study the traffic density will be controlled using adaptive speed limits to suppress the average traffic speed to optimal levels. The objective is to maximise the outflow of traffic over the simulation's duration using minimal rules. The latter model involves the implementation of individual cars on a 4-way junction, with traffic lights controlling the flow of traffic in each direction. Cars will enter the junction from one of four directions and join a queue; each will have a destination in one of the remaining three lanes, determined at random. The control system will have the goal of minimising the traffic build-up in queues adjacent to traffic lights by governing the durations for which red lights are held. This project requires game-theoretic experimentation to understand the effect on results of modifying the control system's behaviour. Students will be encouraged to decide on suitable congestion and wait-time cost metrics in order to evaluate the control system's performance. This project will most appeal to students interested in control theory, applied fluid dynamics, applied game theory, programming, and experimentation through mathematical modelling. |
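To give a flavour of the macroscopic case study, the sketch below simulates a single motorway section with a Greenshields-type density-speed relation, a Lax-Friedrichs discretisation, and a single adaptive speed-limit rule that activates when mean density is high. All of these modelling choices and parameter values are illustrative assumptions; the project itself is free to choose different flux functions, numerical schemes, and control rules.

# Minimal sketch (assumptions: Greenshields flux, Lax-Friedrichs discretisation,
# and an illustrative rule that lowers the speed limit when density is high;
# none of this is prescribed by the project brief).
import numpy as np

L, N = 10_000.0, 200          # motorway length (m), number of cells
dx = L / N
v_free, rho_max = 33.0, 0.12  # free-flow speed (m/s), jam density (veh/m)
dt = 0.5 * dx / v_free        # CFL-safe time step

rho = np.full(N, 0.03)        # initial density (veh/m)
rho[80:120] = 0.10            # a congested patch

def flux(r, v_limit):
    # Greenshields relation: v = min(v_limit, v_free) * (1 - rho / rho_max)
    return r * min(v_limit, v_free) * (1.0 - r / rho_max)

outflow = 0.0
for step in range(2000):
    # Adaptive speed limit: suppress speed when mean density is high.
    v_limit = 22.0 if rho.mean() > 0.06 else v_free
    f = flux(rho, v_limit)
    # Lax-Friedrichs update on interior cells.
    rho_new = rho.copy()
    rho_new[1:-1] = 0.5 * (rho[2:] + rho[:-2]) - dt / (2 * dx) * (f[2:] - f[:-2])
    rho_new[0] = 0.04                            # constant inflow density
    rho_new[-1] = rho_new[-2]                    # free outflow boundary
    outflow += flux(rho_new[-1], v_limit) * dt   # vehicles leaving this step
    rho = rho_new

print(f"total outflow over simulation: {outflow:.1f} vehicles")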
References | |
Prerequisite Skills | Statistics; PDEs; Simulation |
Other Skills Used in the Project | Fluids; Numerical Analysis; Data Visualization; Game Theory |
Programming Languages | Python |
Diagnosing disease using whole microscope slide images
Project Title | Diagnosing disease using whole microscope slide images |
Keywords | histopathology, digital image analysis, duodenum, deep learning, multiple instance learning |
Contact Name | Elizabeth Jane Soilleux |
Contact Email | ejs17@cam.ac.uk |
Company/Lab/Department | Lyzeum Ltd/ University of Cambridge |
Address | Dept of Pathology, University of Cambridge |
Period of the Project | 8 weeks between late June and September |
Work Environment | Working as part of a computational research team, this project may be undertaken in person or remotely. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Coeliac disease is an autoimmune disorder which manifests itself upon the ingestion of gluten, a group of proteins found in wheat, barley and rye. Diagnosing coeliac disease is largely the remit of pathologists examining duodenal biopsies, but this is time-consuming and agreement on the diagnosis between pathologists is only low to moderate. The field of pathology is currently transitioning into a digital era, where biopsies are routinely scanned at high resolution and made available as whole slide images. With this digitisation comes a great opportunity for the development of automated diagnostic and decision support tools to assist pathologists in reporting slides and to help mitigate the various drawbacks of human-centred diagnosis. |
Brief Description of the Project | Currently, we employ deep-learning-based techniques to achieve classification of whole slide images (WSIs). We now wish to build on our success in this area by developing other tools which provide insightful metrics to further inform pathologists (e.g., defining the "most diagnostic" areas of the slide) and to achieve a finer granularity of assessment. In this project, we envision that a student will use deep learning to segment regions and structures of interest in WSIs, and in turn provide useful metrics which feed into a more comprehensive biopsy assessment tool. Successfully developing one or more pilot algorithms to achieve this would bring us one step nearer to making this test a clinical reality. |
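As a rough indication of the kind of tooling involved, the sketch below tiles a whole slide image into patches and scores each tile with a hypothetical trained model, so that the highest-scoring regions could be highlighted for pathologists. It assumes the openslide-python library, a placeholder `patch_model`, and a crude brightness threshold for skipping background; it is not the group's actual pipeline.

# Minimal sketch (assumptions: openslide-python for reading the slide, a
# hypothetical pre-trained `patch_model` returning a per-patch score, and a
# fixed 512-pixel tile size at level 0).
import numpy as np
import openslide

def score_patches(slide_path, patch_model, tile=512, level=0):
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.level_dimensions[level]
    scores = {}
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            region = slide.read_region((x, y), level, (tile, tile)).convert("RGB")
            patch = np.asarray(region, dtype=np.float32) / 255.0
            # Skip mostly-background (near-white) tiles before running the model.
            if patch.mean() > 0.9:
                continue
            scores[(x, y)] = float(patch_model(patch))   # e.g. P(diagnostic region)
    return scores  # map the "most diagnostic" tiles back onto the slide

The returned dictionary of tile coordinates and scores could then be rendered as a heat map over the WSI, which is one way of surfacing the "most diagnostic" areas mentioned above.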
References | https://www.lyzeumltd.com/home |
Prerequisite Skills | Statistics; Mathematical Analysis; Data Visualization |
Other Skills Used in the Project | Numerical Analysis; PDEs; Image processing; Mathematical Analysis; Predictive Modelling; Data Visualization |
Programming Languages | Python |
Pattern recognition and correction on biological assay plates
Project Title | Pattern recognition and correction on biological assay plates |
Keywords | Statistics, Pattern Recognition, Data correction, Data normalization |
Contact Name | Tianshan Lin |
Contact Email | tianshan.x.lin@gsk.com |
Company/Lab/Department | GSK, R&D, Chemoinformatics & Data Science |
Address | Gunnels Wood Road, Stevenage SG1 2NY |
Period of the Project | About 10 weeks between late June and 30 September |
Work Environment | Mixed remote and on site (if possible) |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Spatial plate pattern correction is often necessary during the analysis of high throughput screens in early drug discovery assays. Patterns on these 16x24 or 32x48 well plates arise due to things like evaporation, contamination, systematic biases of automation robots, or human error. Currently, the method used to resolve these patterns is a simple smoothing algorithm (e.g. Hybrid Median), which is applied to independent plates and readouts. However, new technologies are producing data with multiple complex readouts, and there's an opportunity to develop more sophisticated plate pattern detection and correction methods. |
Brief Description of the Project | We would like a student to join us over the summer to: 1) pull together a knowledge base of published plate pattern correction methods, internally written methods, and ideas for the design of new methods (see two such examples in the links below); 2) using our vast amount of readily available past data, apply 3-5 of these methods and evaluate the results (the "truth" can partially be assumed from follow-up experiments that show whether a compound was truly active or not); and 3) provide Python scripts of the best method(s) that our internal developers could use as a basis for a user-friendly platform to perform plate pattern detection and correction. |
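For orientation, the sketch below implements one simple smoothing-based correction of the kind currently used: a 3x3 hybrid median filter applied to a 16x24 plate, with the smoothed surface subtracted and the plate re-centred. The window size, the simulated artefact, and the correction rule are illustrative assumptions, not GSK's internal method.

# Minimal sketch (assumptions: a 3x3 hybrid median as a stand-in for the current
# in-house smoother, applied to a 16x24 plate of raw readouts; parameter choices
# are illustrative only).
import numpy as np

def hybrid_median_smooth(plate):
    """Return a smoothed estimate of the spatial background of a plate."""
    padded = np.pad(plate, 1, mode="edge")
    smoothed = np.empty_like(plate, dtype=float)
    rows, cols = plate.shape
    for i in range(rows):
        for j in range(cols):
            w = padded[i:i + 3, j:j + 3]
            # Median of the "+"-shaped neighbours, the "x"-shaped neighbours,
            # and the centre well, then the median of those three values.
            cross = np.median([w[0, 1], w[1, 0], w[1, 1], w[1, 2], w[2, 1]])
            diag = np.median([w[0, 0], w[0, 2], w[1, 1], w[2, 0], w[2, 2]])
            smoothed[i, j] = np.median([cross, diag, w[1, 1]])
    return smoothed

rng = np.random.default_rng(1)
plate = rng.normal(100, 5, size=(16, 24))   # simulated 384-well readout
plate[:, -1] += 30                          # simulated edge/evaporation artefact
corrected = plate - hybrid_median_smooth(plate) + plate.mean()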
References | |
Prerequisite Skills | Statistics; Mathematical Analysis |
Other Skills Used in the Project | Predictive Modelling |
Programming Languages | Python; R |
Optimization of a random function
Project Title | Optimization of a random function |
Keywords | Optimization Randomness Computation |
Contact Name | David Allwright and Tim Boxer |
Contact Email | david.allwright@smithinst.co.uk |
Company/Lab/Department | Smith Institute for Industrial Mathematics |
Address | 3rd Floor, Willow Court, West Way, Oxford, OX2 0JB, UK |
Period of the Project | 8 weeks between late June and September |
Work Environment | The student will be supervised by David Allwright and Tim Boxer, with weekly progress meetings and informal discussions in between as required. There will also be the opportunity to discuss ideas and present findings to other Smith Institute staff. The student is welcome to work from our offices in Oxford, or remotely, or some combination of the two. We will agree the most appropriate working pattern with the successful applicant. The student will be expected to work a 37.5 hour week, and flexibility is available. |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | In practical applications across a wide variety of contexts it is desirable to minimize a cost C(x, y) by choice of x, when the cost also depends on unknown variables y, modelled as random. For instance, the "ideal" x may be the one that minimizes the expected cost. But if y lies in a high-dimensional space, even evaluating the expected cost may be computationally expensive, let alone optimizing it over x. |
Brief Description of the Project | This project aims to explore alternative computational approaches to approximate the problem, and to bound how sub-optimal they might be in various circumstances. These approaches include both methods for generating candidate x-values and methods for deciding between those values. The project will aim to set up this problem in mathematical detail, and then explore, analytically for tractable cases and computationally otherwise, how the approaches compare, how this comparison depends on the parameters of the problem, and how it depends on the computational parameters. This will include thinking about features such as a multimodal distribution for y, or a discontinuous cost function. |
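As a hedged illustration of the kind of approximation involved, the sketch below uses a sample-average (Monte Carlo) estimate of the expected cost and a simple random search over candidate x-values. The cost function, the bimodal distribution of y, and all parameter values are placeholders chosen only to make the example run; they are not part of the project specification.

# Minimal sketch (assumptions: a placeholder cost C(x, y) and a bimodal,
# high-dimensional distribution for y; sample-average approximation with random
# search over x, purely to illustrate the kind of approximation to be studied).
import numpy as np

rng = np.random.default_rng(0)

def cost(x, y):
    # Placeholder cost, discontinuous in x to mimic the "hard" features mentioned.
    return (x - y.sum()) ** 2 + 5.0 * (x > 2.0)

def sample_y(n, dim=20):
    # Bimodal y: mixture of two Gaussian clusters in a 20-dimensional space.
    modes = rng.choice([-0.1, 0.1], size=(n, 1))
    return modes + rng.normal(scale=0.05, size=(n, dim))

def estimated_expected_cost(x, n_samples=500):
    ys = sample_y(n_samples)
    return np.mean([cost(x, y) for y in ys])

# Random search over candidate x-values, ranked by the Monte Carlo estimate.
candidates = rng.uniform(-5, 5, size=50)
estimates = [estimated_expected_cost(x) for x in candidates]
best = candidates[int(np.argmin(estimates))]
print(f"approximately optimal x: {best:.3f}")

How far such an estimate can be from the true minimiser, and how that gap depends on the number of samples, the number of candidates, and the shape of the cost, is exactly the kind of question the project would formalise.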
References | |
Prerequisite Skills | Probability/Markov Chains; Mathematical Analysis; Optimization |
Other Skills Used in the Project | Probability/Markov Chains; Mathematical Analysis; Optimization |
Programming Languages | Python; MATLAB; No Preference |
Equity Electronic Trading Internship
Project Title | Equity Electronic Trading Internship |
Keywords | Equity, trading, quantitative finance, high-frequency |
Contact Name | Francois Le Dain |
Contact Email | francois.le-dain@uk.bnpparibas.com |
Company/Lab/Department | BNP Paribas |
Address | BNP Paribas, 10 Harewood Ave, London NW1 6AA |
Period of the Project | 10 weeks, summer 2022 (flexible) |
Work Environment | You will work in the office in Marylebone for 10 weeks, in a very dynamic and fast-paced environment in the heart of the equity trading floor. You will have the opportunity to work and learn alongside quants, traders & software engineers who are used to helping students develop their skills. |
Project Open to | Master’s (Part III) students |
Background Information | BNP Paribas Global Markets provides cross-asset investment, hedging, financing, research and market intelligence to corporate and institutional clients, as well as private and retail banking networks. Global Markets' sustainable, long-term business model seamlessly connects clients to capital markets throughout 38 markets in EMEA, Asia Pacific and the Americas, offering innovative solutions and digital platforms. Through Global Markets, clients can access a full universe of opportunities in equities and equity derivatives, foreign exchange, commodity derivatives, rates and credit markets and prime solutions and financing. |
Brief Description of the Project | What you will do:
|
References | |
Prerequisite Skills | Statistics, Mathematical Analysis |
Other Skills Used in the Project | |
Programming Languages | Python |
AI Methods for Video segmentation & decomposition
Project Title | AI Methods for Video segmentation & decomposition |
Keywords | AI, Video, Segmentation, Deep Learning, Computer Vision, Python |
Contact Name | Michael Roberts |
Contact Email | michaelr@ryff.com |
Company/Lab/Department | Ryff Europe Ltd |
Address | Nine Hills Road, Cambridge CB2 1GE |
Period of the Project | 8+ weeks |
Work Environment | The student will be embedded in the Ryff AI team located at Hills Road in Cambridge |
Project Open to | Undergraduates; Master's (Part III) students |
Background Information | Ryff is developing AI and rendering technology to insert objects into existing video content. The goal is to make these insertions realistic and seamless in the footage. We rely on computer vision and AI techniques to achieve this. |
Brief Description of the Project | Video segmentation is a challenging task. We need a generalised solution for video segmentation that can be used to detect objects and other changes in scenes which cause difficulties when augmenting existing content with new objects. The project will require the development of cutting-edge AI techniques and solutions that can be applied to the task of video segmentation. |
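As an illustration of a possible starting point, the sketch below applies an off-the-shelf, pre-trained segmentation network to sampled video frames. It assumes OpenCV for frame extraction and torchvision's DeepLabV3 model; a production solution would need models adapted to the scenes and objects Ryff works with, and temporal consistency across frames is not addressed here.

# Minimal sketch (assumptions: OpenCV for frame extraction and torchvision's
# pre-trained DeepLabV3 as a generic, off-the-shelf per-frame segmenter).
import cv2
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def segment_video(path, stride=10):
    """Yield (frame_index, per-pixel class map) for every `stride`-th frame."""
    capture = cv2.VideoCapture(path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            batch = preprocess(rgb).unsqueeze(0)
            with torch.no_grad():
                logits = model(batch)["out"]          # (1, classes, H, W)
            yield index, logits.argmax(dim=1)[0].numpy()
        index += 1
    capture.release()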
References | https://www.ryff.com |
Prerequisite Skills | Image processing; Python; Machine Learning |
Other Skills Used in the Project | |
Programming Languages | Python |
Develop a machine learning tool applied to Veterinary CT scans
Project Title | Develop a machine learning tool applied to Veterinary CT scans |
Keywords | Machine learning, deep learning, AI, computer science, CNN, diagnostic imaging, CT, veterinary, innovation |
Contact Name | Julien LABRUYERE |
Contact Email | julien@vet-ct.com |
Company/Lab/Department | VetCT |
Address | Hauser Forum, Broers Building, 21 JJ Thomson Avenue, CB3 0FA, Cambridge |
Period of the Project | 8 weeks between end of June/July and September 2022 |
Work Environment | The student will have the opportunity to be part of our team in our new office in Cambridge (West Campus), in direct contact with veterinary radiologists, the company director and the IT team. There is no fixed expectation for the place of work, and fully remote working is fine if preferred; this will be decided with the successful candidate. |
Project Open to | Master’s (Part III) students |
Background Information | About VetCT: Established in 2009 in Cambridge UK (West Campus, Broers Building), VetCT (vetct.com) provides supportive, educational teleconsulting and teleradiology, and novel educational strategies for veterinary medicine. VetCT's mission is to make the veterinary world a better place by delivering trusted veterinary knowledge, support, and reassurance at the point of need. VetCT works with veterinarians across the entire veterinary ecosystem (B2B), including students and universities, first-opinion practitioners, and referral centres. The company has subsidiaries in both the USA and Australia, with over 250 staff globally, including 120 Diploma-holding veterinary radiologists located across the globe. The company is a leader in veterinary teleradiology and has provided high-quality radiology reports for a very large number of veterinary patients since its inception. Project summary: VetCT is looking to build relationships with Cambridge University and develop research projects to harness the power of its very large database of radiology images. The images consist of CT scans, MRI scans and radiographs of veterinary patients, all digitally archived in a central PACS system. Over the years VetCT has acquired one of the largest animal CT scan databases in the world. Our exhaustive review of current AI applications in veterinary diagnostic imaging demonstrates that AI is largely underdeveloped in this field: to date, only 11 peer-reviewed publications involving the use of AI in veterinary radiology can be found (compared to 2,189 peer-reviewed publications in the human radiology field in 2021 alone). There are ample and unique opportunities for research, and a large potential to shape the AI innovations of the future in the veterinary and wider healthcare space. |
Brief Description of the Project | Phase one (an example project of lower complexity): a CT body-area recognition AI tool. CT studies sent to VetCT always include multiple body areas (i.e. head, thorax, abdomen). Before a study is sent to a radiologist for reporting, every CT scan must be manually checked for the number of body parts it contains and compared against the request submitted by the veterinarian. This mobilises a significant amount of internal human resource. This step could be eliminated and automated by an intelligent algorithm capable of recognising the different CT body parts.
Data description and specifications:
• Format: raw CT scans, radiographs and MRI scans in DICOM format, uncompressed or losslessly compressed.
• Image variety: a large variety of equipment manufacturers; the imaging studies have been acquired from 2,000 different veterinary sites in multiple countries.
• Restrictions: none. The database is readily available and under the full control and management of the company. VetCT has full consent from its clients to use the anonymised database for research purposes. Animal DICOM diagnostic images are not subject to GDPR.
• Archive type: digital central PACS system located in the cloud.
• Clinical information: every imaging study is linked to our digital case management platform, which stores all patient information, patient signalment (species, breed, age, gender), clinical history and symptoms, the final radiology diagnosis and the full radiology diagnostic report, including annotated pictures in every report.
• Species distribution: 80% canine, 15% feline, 5% equine.
• Number of studies readily available (Table 1: number of radiology studies available for research purposes, as of 15 June 2021):
            CT        MRI      X-rays    Total
  Canine    138,121   28,159   123,435   289,715
  Feline    21,827    2,752    28,521    53,100
  Equine    1,040     3,924    6,526     11,490
  Total     160,988   34,835   158,482   354,305
• Radiology reports: all reports are stored as .docx and PDF documents in our case management platform and linked to the relevant DICOM images. Reports include a detailed text description of the findings and the radiology diagnosis. All reports include labelled .PNG images of the most relevant pathological findings; however, the corresponding DICOM images are not labelled.
Goal for the project: focus on the CT body-area machine learning recognition tool. The outcome would be a clear plan towards the development of a working machine learning algorithm, ideally with a prototype we could test at the end of the period. |
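To make the first phase concrete, the sketch below shows how a single CT slice could be loaded and classified into a body area. It is a minimal sketch only: it assumes pydicom for reading slices, a torchvision ResNet-18 with its final layer replaced for three illustrative labels (head, thorax, abdomen), and labelled training slices prepared separately; it is not VetCT's pipeline.

# Minimal sketch (assumptions: pydicom for reading slices, a torchvision
# ResNet-18 to be fine-tuned on three illustrative body-area labels, and
# labelled training data prepared separately).
import numpy as np
import pydicom
import torch
from torch import nn
from torchvision.models import resnet18

LABELS = ["head", "thorax", "abdomen"]

def slice_to_tensor(dicom_path):
    """Load one CT slice and scale it to the 3-channel tensor the CNN expects."""
    pixels = pydicom.dcmread(dicom_path).pixel_array.astype(np.float32)
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-6)
    tensor = torch.from_numpy(pixels).unsqueeze(0).repeat(3, 1, 1)  # grey -> RGB
    return nn.functional.interpolate(
        tensor.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False
    )

model = resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, len(LABELS))  # 3-way body-area head
# (the new head must be fine-tuned on labelled slices before predictions are useful)

def predict_body_area(dicom_path):
    model.eval()
    with torch.no_grad():
        logits = model(slice_to_tensor(dicom_path))
    return LABELS[int(logits.argmax())]

Applied slice by slice over a study, the predicted labels could then be aggregated to report which body areas the scan contains, which is the manual check the phase-one tool aims to automate.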
References | Company website: www.vetct.com Boissady, E., de La Comble, A., Zhu, X., & Hespel, A.-M. (2020). Artificial intelligence evaluating primary thoracic lesions has an overall lower error rate compared to veterinarians or veterinarians in conjunction with the artificial intelligence. Veterinary Radiology & Ultrasound, 61(6), 619–627. https://doi.org/10.1111/vru.12912 Boissady, E., De La Comble, A., Zhu, X., Abbott, J., & Adrien-Maxence, H. (2021). Comparison of a Deep Learning Algorithm vs. Humans for Vertebral Heart Scale Measurements in Cats and Dogs Shows a High Degree of Agreement Among Readers. Frontiers in Veterinary Science, 8. https://www.frontiersin.org/article/10.3389/fvets.2021.764570 Fitzke, M., PyTorch. (2021, December 15). RADIOLOGY AI @MARS VETERINARY HEALTH |. https://www.youtube.com/watch?v=p11ldyP9aco Sharma, P., Suehling, M., Flohr, T., & Comaniciu, D. (2020). Artificial Intelligence in Diagnostic Imaging: Status Quo, Challenges, and Future Opportunities. Journal of Thoracic Imaging, 35, S11. https://doi.org/10.1097/RTI.0000000000000499 |
Prerequisite Skills | Machine Learning |
Other Skills Used in the Project | |
Programming Languages | Python; Machine learning language |