 
This is a list of industrial and governmental project proposals from Summer 2018.

AI for Pump Application Identification

Contact Name Kasper Christensen
Contact Email kkchristensen@grundfos.com
Company Name Grundfos, Data & Analytics
Address Poul Due Jensens Vej, 8850 Bjerringbro, Denmark
Period of the Project Flexible (1st of June - 30th of September)
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3rd of March
Brief Description of the Project Grundfos' products are distributed all over the world. More and more of them are connected to the internet and can send data about their performance and their use to the cloud. However, Grundfos knows very little about what its products are used for, because a water pump can serve many different applications: it may be used for heating, for drinking water or for air conditioning, among many other possibilities. Information about how Grundfos' products are used is crucial in the development and improvement of future products, but Grundfos does not currently have data on the exact application of the pumps it has sold through its worldwide network of distributors. Grundfos does, however, possess data on how a given pump is being used. This data may be useful in developing an analytics application that can determine the application of a given pump. The question we raise is: to what degree can information from Grundfos pumps be used to classify the pumps into distinct groups of applications?
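
As a rough illustration of the classification task (not Grundfos' data or method), the sketch below trains a standard classifier on hypothetical per-pump operating features against hypothetical application labels; the feature names, labels and random data are placeholders, since the real cloud data and its schema are not described here.

```python
# Minimal sketch: classify pumps into application groups from operating data.
# Feature names, labels and the random data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_pumps = 500
# Hypothetical summary features per pump: mean flow, mean head, mean power,
# duty cycle, day/night usage ratio.
X = rng.normal(size=(n_pumps, 5))
# Hypothetical labels: 0 = heating, 1 = drinking water, 2 = air conditioning.
y = rng.integers(0, 3, size=n_pumps)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```
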
Skills Required Analytics, Machine Learning, Data Mining, Fluid Mechanics
Skills Desired Analytics, Machine Learning, Data Mining, Fluid Mechanics

 

How much data is enough?

Contact Name Keith Hermiston
Contact Email kjhermiston@dstl.gov.uk
Company Name Defence Science and Technology Laboratory
Address Room 246, Building 005, Dstl Porton Down, Salisbury, Wilts, SP4 0JQ
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March 2018
Brief Description of the Project When presented with an analytical task over data, we may ask, “how much data (and of what modality) do we need to collect to complete our understanding of an event?” We may imagine a Latin square (of order n) to contain the full set of discrete, labelled information items required to understand an event. If we are presented with a partially populated Latin square, again of order n, what are the upper and lower bounds (denoted lcs(n) and scs(n) respectively) for the number of populated cells that we require to complete the Latin square uniquely? Under the Latin square condition, the remaining, empty cells of the Latin square are forced in their symbol assignment, thereby completing the Latin square to a unique solution. The Latin square is a rich combinatorial construct that finds equivalent representation as an orthogonal array, a strongly regular graph, and the Cayley table of a quasigroup. A recent paper [1] has offered interesting results for defining sets of Latin squares using graph colouring and provides a summary retrospective of past work. However, the previous Nelder conjecture of ⌊n^2/4⌋ remains unproven but empirically matches known scs(n) values for n less than or equal to 8. Can new, innovative approaches to this easily understood problem advance towards a general proof of the conjecture for all n? [1] N. J. Cavenagh and R. Ramadurai, “On the distances between latin squares and the smallest defining set size,” Electronic Notes in Discrete Mathematics, vol. 54, pp. 15-20, 2016.
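
As a concrete (and purely brute-force) starting point, the sketch below counts the completions of a small partial Latin square by backtracking; a partial square is a defining set exactly when the count is one. This is only an illustrative baseline feasible for small n, not one of the innovative approaches the project asks for, and the example square is arbitrary.

```python
# Minimal sketch: count completions of a partial Latin square of order n by
# backtracking (cells marked 0 are empty; filled cells are assumed consistent).
# The count is capped at `limit`, so a return value of 1 means the partial
# square is a defining set, and 2 means at least two completions exist.
def count_completions(square, limit=2):
    n = len(square)
    for r in range(n):
        for c in range(n):
            if square[r][c] == 0:
                count = 0
                used = set(square[r]) | {square[i][c] for i in range(n)}
                for s in range(1, n + 1):
                    if s not in used:
                        square[r][c] = s
                        count += count_completions(square, limit - count)
                        square[r][c] = 0
                        if count >= limit:
                            return count  # early exit: uniqueness already refuted
                return count
    return 1  # no empty cells: exactly one (complete) Latin square

partial = [[1, 0, 0, 0],
           [0, 0, 0, 3],
           [0, 0, 2, 0],
           [0, 0, 0, 0]]
print(count_completions(partial))  # prints 2: three clues cannot be a defining set
```
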
Skills Required Discrete mathematics
Skills Desired Interest in combinatorics, group theory, graph theory and some programming experience in R and/or Python

 

Uncertainty aggregation

Contact Name Keith Hermiston
Contact Email kjhermiston@dstl.gov.uk
Company Name Defence Science and Technology Laboratory
Address Room 246, Building 005, Dstl Porton Down, Salisbury, Wilts, SP4 0JQ
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March 2018
Brief Description of the Project We may be presented with an analytical task of an event that draws from many modalities of data (both numerical and categorical in differing proportions). Associated with each data modality may be uncertainty, also measured in both numerical and categorical (nominal and ordinal) frameworks. Nominal uncertainty could be imagined to be an ambiguous word in a sentence while ordinal uncertainty may be expressed as a 'low', 'medium' or 'high' estimate. What theoretical constructs offer an appropriate structure for aggregating confidences of this mixed nature? Through a case study, drawing on open Web data, can we demonstrate that the proposed framework aggregates uncertainty in a consistent and meaningful manner?
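
Purely as an illustration of the question, and emphatically not as a proposed framework, the sketch below maps ordinal confidence labels onto assumed numeric means and spreads and combines them with numerical confidences by a precision-weighted average; the mapping and the fixed spreads are arbitrary assumptions for demonstration.

```python
# Illustrative sketch only: one naive way to put ordinal and numerical
# confidences on a common scale before aggregation. The ordinal-to-numeric
# mapping and the fixed spreads are assumptions made for this example.
ORDINAL_MAP = {"low": (0.2, 0.15), "medium": (0.5, 0.15), "high": (0.8, 0.15)}

def to_mean_sd(confidence):
    """Return (mean, sd) whether the input is an ordinal label or a number."""
    if isinstance(confidence, str):
        return ORDINAL_MAP[confidence.lower()]
    return float(confidence), 0.05  # assume a fixed sd for numeric inputs

def aggregate(confidences):
    """Precision-weighted average of heterogeneous confidence reports."""
    pairs = [to_mean_sd(c) for c in confidences]
    weights = [1.0 / sd ** 2 for _, sd in pairs]
    return sum(w * m for w, (m, _) in zip(weights, pairs)) / sum(weights)

print(aggregate(["high", 0.65, "medium"]))
```
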
Skills Required Statistics, Mathematics
Skills Desired Some experience of programming in R and/or Python

 

Image analysis for security applications

Contact Name Sam Pollock
Contact Email sam.pollock@iconal.com
Company Name Iconal Technology Ltd.
Address St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS
Period of the Project At least 8 weeks, June or earlier start
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 2 April.
Brief Description of the Project We are looking at application of new and novel techniques to analyse X-ray images for security applications, to automate the detection of forbidden or dangerous items. This may include exploration of deep learning techniques.
Skills Required Good familiarity with Python, Matlab or similar. Comfortable with automated analysis of large volumes of data.
Skills Desired Image analysis techniques. Neural networks. Understanding of physics behind X-ray imaging.

 

Developing a single-cell transcriptomic data analysis pipeline

Contact Name Amit Grover / Denise Vlachou
Contact Email denise.f.vlachou@gsk.com
Company Name GlaxoSmithKline
Address Stevenage, Hertfordshire, SG1 2NY
Period of the Project July-September
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest  
Brief Description of the Project Recent technological advances in single-cell transcriptomics have the potential to have a huge impact in the field of drug discovery. However, pipelines that we could use for target identification and validation are still in their infancy. The data produced from single-cell RNA-seq experiments is huge, complex, and highly stochastic, so computational methods and mathematical expertise are critical. In this project, the student will help us to develop comprehensive and robust pipeline(s) to analyse complex datasets generated through multiple readouts for single cells. As a starting point, the student will use already established methods on single-cell RNA-seq data (and bulk RNA-seq data) to generate gene expression values. Using this, the student will produce an analysis (via clear visualisations) of single-cell trajectories using appropriate methods of normalisation, dimensionality reduction (e.g. PCA, SNE), cluster identification, etc. An ambitious student will then incorporate additional phenotypic datasets (e.g. the results of machine learning analysis on images of the cells) into this pipeline(s), requiring the development of novel methods to analyse multiple data types, ultimately resulting in a master pipeline.
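
A minimal sketch of the normalisation, dimensionality-reduction and clustering step is given below, using scikit-learn on a random count matrix that stands in for real single-cell data; in practice a dedicated toolkit (e.g. Scanpy) would handle QC, normalisation and graph-based clustering.

```python
# Minimal sketch of the normalisation -> PCA -> clustering step of a
# single-cell pipeline. The Poisson count matrix is a placeholder for real
# scRNA-seq data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(1000, 2000))        # cells x genes

# Library-size normalisation and log transform.
cpm = counts / counts.sum(axis=1, keepdims=True) * 1e4
logged = np.log1p(cpm)

embedding = PCA(n_components=20).fit_transform(logged)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(clusters))   # cells per cluster
```
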
Skills Required A familiarity with coding. Basic knowledge of normalisation methods and machine learning. An interest in biology and genomics. Excellent communication skills (will be working in a multidisciplinary team).
Skills Desired Proficient coder in R/Python/MATLAB. Bioinformatics experience/biology knowledge. Big data handling experience.

 

Financial data anomaly detection using machine learning

Contact Name Charlotte Grant
Contact Email charlotte.grant@oxam.com
Company Name Oxford Asset Management
Address OxAM House, 6 George Street, Oxford, OX12BW
Period of the Project 8 weeks - ideally starting 2nd July 2018
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest  
Brief Description of the Project

In algorithmic trading, bad data can creep in at any stage of the data pipeline. Perhaps a data vendor has made an error inputting the earnings number for a company, so that it is inconsistent with the sales number and with the previous earnings number. Perhaps there is an error in an internal process which makes its output contain a greater proportion of null values than expected. Whatever the cause, we must make decisions about data validity in real time and flag up likely anomalous data for manual inspection. The goal of this project is to compare statistical (e.g. machine learning) and rule-based (e.g. expert system) techniques for anomaly detection for a particular set of financial time series. For example, are there machine learning techniques (such as modelling normalcy with RNNs) which materially outperform existing rules? Are there machine learning techniques which perform well for certain classes of time series without hand-tuning? We suggest the following structure for the project:

• Establish a baseline using simple univariate statistical techniques, modelling data using e.g. a Gaussian or t-distribution, or using non-parametric approaches (a minimal baseline sketch follows the references below).

• Fit a robust autoregressive model to detect unusual jumps in the univariate time series, e.g. following the approach of (Bianco, 2001).

• Explore multivariate and modern deep learning-based approaches for modelling the univariate or multivariate dynamics, e.g. inspired by (Wong, 2015; Shipmon, 2017).

Students are also encouraged to explore their own ideas or try to implement other algorithms proposed in the literature - a good survey article covering the problem can be found in (Chandola, 2009).

References:

(Bianco, 2001) A. M. Bianco et al. (2001), "Outlier Detection in Regression Models with ARIMA Errors using Robust Estimates." Journal of Forecasting 20(8), pp. 565-579.

(Chandola, 2009) V. Chandola et al. (2009), "Anomaly detection: A survey." ACM Computing Surveys 41(3), p. 15.

(Shipmon, 2017) D. T. Shipmon et al. (2017), "Time Series Anomaly Detection; Detection of anomalous drops with limited features and sparse examples in noisy highly periodic data." https://arxiv.org/abs/1708.03665

(Wong, 2015) J. Wong et al. (2015), "RAD - Outlier detection on big data." https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6...
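
The sketch below illustrates the first bullet: a simple rolling-Gaussian baseline that flags points whose z-score against recent history exceeds a threshold. The synthetic random-walk series and the injected jump are placeholders for real data, and the window and threshold are arbitrary choices.

```python
# Minimal sketch of a univariate baseline: flag observations whose one-step
# change has a z-score above a threshold, relative to a rolling Gaussian fit
# on past data only. Synthetic data with one injected anomaly.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(0, 1, 500)))
series.iloc[250] += 15                     # inject one artificial anomaly

window = 50
returns = series.diff()
mean = returns.rolling(window).mean().shift(1)   # use only past data
std = returns.rolling(window).std().shift(1)
zscore = (returns - mean) / std
anomalies = zscore.abs() > 4
print(series[anomalies].index.tolist())    # expect indices around 250 and 251
```
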

Video with project description: https://drive.google.com/open?id=1gJuijhu7Y8FbQfD0pIeV8Hv9_egrTo8i

Skills Required 1. Experience with at least one scientific or statistical scripting language such as Python/NumPy/Scipy, or GNU R 2. Statistics on the level of e.g. Cambridge Part IB Statistics and Part II Statistical Modelling 3. Some experience with time series models (recommended online textbook: http://otexts.org/fpp2/)
Skills Desired Some experience with machine learning techniques (recommended resources: C. Bishop, "Pattern Recognition and Machine Learning", or this online class: https://www.coursera.org/learn/machine-learning)

 

Simulation of financial asset returns for strategic asset allocation

Contact Name Dr Joo Hee Lee
Contact Email joohee.lee@invesco.com
Company Name Invesco
Address Invesco, An der Welle 5, 60322 Frankfurt, Germany
Period of the Project 8 weeks or longer
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest  
Brief Description of the Project The simulation of stochastic differential equations for Geometric Brownian Motion by Monte Carlo methods is a well-established technique in pricing options and many other derivatives (see e.g. Paul Glasserman, 2004, Monte Carlo Methods in Financial Engineering (Springer)). This project is concerned with extending this framework to the field of strategic asset allocation (i.e. long-term investment decisions, typically with more than one financial asset in the portfolio). The aim of this project is to design a financial-returns simulator for several correlated assets, which is to generate a substantial number (>10,000) of statistically equivalent returns series in a multivariate fashion. Using this setup, we aim to create a daily returns database that will exhibit properties obtained from the corresponding historical time-series data at different frequencies, e.g. monthly or yearly. In the investment management industry, it is often observed that financial returns are expressed interchangeably in price ratios, i.e. simple returns, and in log price differences, i.e. compounded returns, between two evaluation points in time, i.e. investment horizons. While they are close enough to each other when the investment horizon is short, say 1 day, the differences grow significantly with longer horizons. In addition, these two measures have different properties in aggregation through time and across assets according to their mathematical properties (A. Meucci, Apr 2010, GARP Risk Professional, pp. 49-51). As the simulated data will be compounded returns through discretization of a stochastic differential equation but will be aggregated across different assets, we will adopt the solution Meucci (2010) put forward, while making sure that the statistical properties of historical data and simulated data remain comparable at different data frequencies. The student is expected to have some background knowledge of option pricing and numerical analysis, and programming skills, desirably in R. The student will then learn more than the basic aspects of: (1) strategic asset allocation; (2) financial modelling; (3) connecting statistical and empirical data through mathematics. Progress permitting, an opportunity to apply the outcome to a real-world case could also be possible. This project is to be carried out in Frankfurt as part of a large quantitative strategies team at Invesco consisting of more than 10 PhDs, including the supervisor, who is a Cambridge alumna.
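
As a starting-point illustration of the simulator (in Python rather than R, and not the project's final design), the sketch below draws 10,000 correlated daily geometric Brownian motion paths for three assets using a Cholesky factor of the correlation matrix; the drifts, volatilities and correlations are illustrative placeholders, and matching historical properties across frequencies (per Meucci, 2010) is left to the project.

```python
# Minimal sketch: simulate correlated daily log-returns for several assets
# under geometric Brownian motion, using a Cholesky factor of an assumed
# correlation matrix. All parameters are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_paths = 252, 10_000
mu = np.array([0.05, 0.03, 0.07])          # annual drifts
sigma = np.array([0.15, 0.10, 0.20])       # annual volatilities
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
dt = 1.0 / 252
chol = np.linalg.cholesky(corr)

# Correlated standard normals, shape (paths, days, assets).
z = rng.standard_normal((n_paths, n_days, 3)) @ chol.T
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
prices = 100.0 * np.exp(np.cumsum(log_returns, axis=1))
print(prices[:, -1, :].mean(axis=0))       # mean terminal price per asset
```
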
Skills Required stochastic calculus and programming numerical algorithms
Skills Desired Some background knowledge of option pricing and numerical analysis; programming skills, desirably in R

 

Multivariate exponential decomposition for NMR signal processing

Contact Name Evren Yarman
Contact Email cyarman@slb.com
Company Name Schlumberger Cambridge Research
Address Schlumberger, High Cross, Madingley Road, Cambridge CB3 0EL
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest  
Brief Description of the Project Nuclear magnetic resonance (NMR) instruments directly measure the quantity of hydrogen atoms in a sample, which provides a method for determining porosity and permeability in porous materials. NMR logging tools are used to determine these properties in a geological formation for assessing petroleum reservoir quality [Akkurt et al 2009]. These downhole measurements are supported by more versatile laboratory analysis of extracted core samples. There persists an interpretation challenge for heterogeneous rock samples containing a mixture of fluids (oil, water, gas), as the archetypal NMR relaxation times (exponential time constants $T_2$ and $T_1$, or molecular diffusion coefficient $D$) are sensitive to fluid type (viscosity) and pore geometry. It is usual to numerically invert the measured decaying signals to a continuous distribution and use this result to infer sample properties from the moments of the distributed parameter. Compression algorithms are utilized to improve transmission of either raw or processed signals, or to reduce the size of the inverse problem for efficient calculation, which is the focus of this project. In addition to their practical importance, NMR data exhibit a beautiful mathematical structure. The data are modeled in terms of a sum of exponentials. This structure was utilized to develop an inversion method giving a sparse representation for $T_2$ and $T_1$ distributions [Yarman et al 2013]. However, this method did not take full advantage of the multivariate nature of the exponential representation. Recent developments in multivariate exponential approximation of functions, which use spectral properties of block Hankel matrices [Andersson, Carlsson 2017], motivate a fresh look at NMR inversion methods. The data is modeled by
\[
M(n,k)=\sum_{m=1}^{M} a_m \, e^{-k T_E/T_{2,m}} \left(1-e^{-n T_W/T_{1,m}}\right) + \epsilon(n,k), \quad n,k \in \mathbb{Z}^+,
\]
where $(a_m, T_{1,m}, T_{2,m})$ are the unknowns and $(T_E, T_W)$ are the given acquisition parameters. Here $T_{2,m}$ are the $T_2$ relaxation times, $a_m$ are the $T_2$ amplitudes (which are the partial porosities of the pores), $T_{1,m}$ are the corresponding $T_1$ relaxation times (associated with the size of the pores), $T_W$ is the wait-time and $T_E$ is the time sample between consecutive echoes, also referred to as the echo-spacing, and $\epsilon(n,k)$ is referred to as the noise, which is usually assumed to be zero-mean Gaussian white noise. This project focuses on the inversion problem, which aims to recover $(a_m, T_{1,m}, T_{2,m})$ from the given data $M(n,k)$ and $(T_E, T_W)$. This is an ill-posed problem, and it has been tackled using linear or non-linear methods. Depending on the method, the non-uniqueness of the solution has been approached by fixing specific relaxation time values $(T_{1,m}, T_{2,m})$, by imposing artificial bounds on them, or by regularization factors that impose smoothness on the solution. While regularized linear inversion methods can be computed efficiently, the results depend heavily on the regularization. On the other hand, non-linear inversion methods may be computationally demanding. By taking advantage of the exponential nature of the data model, one may use the method presented in [Andersson, Carlsson 2017], which depends on spectral analysis of Hankel systems, to determine the small number of terms $M$ required to represent all echo trains within a desired error bound. Once an estimate for $(a_m, T_{1,m}, T_{2,m})$ is obtained, it may be further refined using variational methods.
By its implicit nature, the inversion method aims to perform inversion efficiently while trading the previous regularization methods for a sparse representation. The aim will be to develop a new, semi-analytic inversion method and corresponding code for nuclear magnetic resonance (NMR) log measurements to obtain amplitudes and sparse $T_1$ and $T_2$ distributions.
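
To make the data model concrete, the sketch below generates a synthetic set of echo trains from the stated model with illustrative parameter values; it could serve as a test harness for any inversion code developed during the project.

```python
# Minimal sketch: generate synthetic NMR data from the stated model
#   M(n,k) = sum_m a_m * exp(-k*T_E/T_{2,m}) * (1 - exp(-n*T_W/T_{1,m})) + noise
# with illustrative parameter values (not real acquisition settings).
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.6, 0.4])                 # partial porosities a_m
T2 = np.array([0.05, 0.5])               # T_2 relaxation times (s)
T1 = np.array([0.1, 1.0])                # T_1 relaxation times (s)
T_E, T_W = 1e-3, 0.2                     # echo spacing and wait-time (s)

n = np.arange(1, 31)[:, None, None]      # wait-time index
k = np.arange(1, 1001)[None, :, None]    # echo index
kernel = np.exp(-k * T_E / T2) * (1 - np.exp(-n * T_W / T1))
M = (a * kernel).sum(axis=-1) + 0.01 * rng.standard_normal((30, 1000))
print(M.shape)                           # (wait-times, echoes)
```
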
Skills Required Knowledge or will to learn and code in Python Interest in approximation theory, variational methods and optimization
Skills Desired  

 

Investigation into appropriate statistical models for the analysis and visualisation of data captured in clinical trials using wearable sensors

Contact Name Dr Luis Garcia-Gancedo
Contact Email luis.x.garcia-gancedo@gsk.com
Company Name GSK, Clinical Projects & Quantitative Sciences
Address GSK, Gunnels Wood Road, Stevenage SG1 2NY
Period of the Project 8-10 weeks (flexible)
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March
Brief Description of the Project The current rapid evolution of wearable sensors and devices for the collection of health-related data is laying the foundation for the next revolution in clinical trial operations. Wearable health monitors offer capabilities to collect semi-continuous, accurate health data in near-real time. This emerging digital research platform has the potential not only to increase data accuracy and timeliness but, most importantly, enables the collection of ‘real-world’ data, providing insights into the effect of therapies on patients’ daily lives, ultimately allowing pharmaceutical companies to explain the value of their medications beyond traditional efficacy measurements. At GSK we are investigating the use of wearables in our clinical studies, with specific focus on actigraphy (remote monitoring of physical activity through inertial sensors). A wide range of diseases - such as Rheumatoid Arthritis (RA) and Chronic Obstructive Pulmonary Disease (COPD) - have a negative effect on physical activity, affecting the amount, type and way that patients perform certain activities and manoeuvres. Using wearable physical activity monitors in clinical trials enables us to monitor patients' physical activity and rest cycles regularly between clinical visits. However, extracting meaningful clinical information (and interpreting it) is a major challenge: the high-frequency time-series nature of the data, together with the vast volume provided by wearables (and inertial sensors in particular), makes this type of data completely different from any other clinical data generated in clinical studies, and for that reason the most appropriate statistical and mathematical methodologies and techniques for maximising the information extracted from the data are still to be determined. Additionally, early investigational studies have shown that the variability of the data due to patients’ different behaviours and lifestyles is significantly greater than for other clinical data, and therefore appropriate statistical models for data analysis and visualisation need to be further investigated. Through this project, we would like to investigate suitable statistical models for the analysis and visualisation of clinical data from wearable devices, with particular focus on actigraphy data. The end goal is to assess the impact of a therapeutic intervention on patients. This is a broad, open-ended project in which the student would be required to work closely with colleagues from two different departments: ‘Clinical Innovation & Digital Platforms’, whose remit is to modernise GSK’s clinical studies by enabling the introduction of novel digital technologies, and ‘Statistical, Programming and Data Strategy’, which underpins GSK R&D’s ability to make high-quality quantitative decisions across the medicine development lifecycle.
Skills Required • Highly-motivated candidate with a mathematical background. • Interest in statistics/data sciences and programming abilities (such as Matlab, R, Python, C, Java, etc). • Interest in solving real-world problems. • Effective interpersonal and communication skills.
Skills Desired • Background on stochastic processes. • Imaginative and inquisitive mind, eager to learn new skills and to develop solutions for interesting and complex challenges. • Evidence of successfully working independently as well as a part of a team

 

Index of Suspicion: Predicting Cancer from Prescriptions

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address CPC4, Capital Park, Fulbourn, Cambridge.
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

Health Data Insight has worked with Public Health England and the NHS Business Services Authority to create a database of England’s primary care prescriptions data. This has been linked to the Cancer Analysis System (CAS), a national database of all cancer diagnoses and treatment in England. The aim of the Index of Suspicion project is to use machine learning to identify patterns in medication prescribed prior to the diagnosis of cancer to derive an “index of suspicion” that will predict when a patient is at increased likelihood of developing subsequent cancer. The exact direction of the internship is dependent on the intern’s interests, but possible areas include:

  • The development and/or refinement of machine learning or statistical methods.
  • The development and implementation of statistical testing of the validity of conclusions. How significant are the results? What are the best ways to communicate this, and what are the possible impacts on patients?
  • Human interpretability of models.

 More information about the internship programme is available here.

Skills Required This project involves analysing datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. A mathematics/computer-science background, particularly experience in machine learning or statistics, would be highly useful.
Skills Desired Experience using MATLAB, R, or NumPy/pandas would be beneficial. Experience with SQL.

 

Applied Machine Learning in Finance

Contact Name David West
Contact Email david.west@lbbwuk.com
Company Name LBBW
Address 7th Floor, 201 Bishopsgate, London, EC2M 3UN
Period of the Project 8-12 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project You will be presented with a selection of problems across different business areas such as market making and customer analysis. Working in a small group you will come up with ideas and implement them using R. You will have the opportunity to speak to people across the bank and get an insight into different business units and how the bank works.
Skills Required  
Skills Desired Interest in Problem Solving.

 

Assessment of data completeness in the National Cancer Registry and the impact on the production of Cancer Survival Statistics

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address 5 St Philip's Place, Birmingham
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

We are seeking a new perspective in the assessment of data completeness in the Cancer Analysis System (CAS), which holds the data from the National Cancer Registry, and how this impacts on the production and development of cancer survival statistics, on which we (PHE) collaborate with the Office for National Statistics (ONS). These statistics are released on an annual basis for the Department of Health (DH) and other external stakeholders. In this internship, there will be elements of learning/training which will be provided by the supervisors. The project will involve a combination of the following, depending on the intern’s interests, or an alternative project if proposed by the intern and mutually agreed:

• Structured querying of national cancer database.

• Analyse and produce a report on the patterns of missing data.

• Develop algorithm(s) using multiple imputation methodology to deal with issues of missing data (a minimal illustrative sketch follows this list).

• Produce Cancer Survival Statistics using the methodology developed for the Official Statistics publications in conjunction with the implementation of the newly developed algorithm(s) to deal with missing data.

• Conduct a sensitivity analysis of the impact of missing data on the Cancer Survival Statistics.

• Produce a peer-reviewed publication or internal report on the impact of missing data on Cancer Survival Statistics.
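
As a loose illustration of the multiple-imputation bullet above (not the methodology used for the Official Statistics), the sketch below imputes a synthetic dataset several times with scikit-learn's IterativeImputer and looks at the spread of an estimate across the imputed copies; real work would use dedicated MI tooling (e.g. mice in R) and Rubin's rules for pooling.

```python
# Minimal sketch of a multiple-imputation-style workflow: impute a dataset
# several times with different random seeds and examine how an estimate
# varies across the imputed copies. The data here is a synthetic placeholder.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan       # 20% missing completely at random

estimates = []
for seed in range(5):                        # five imputed datasets
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = imputer.fit_transform(X)
    estimates.append(completed[:, 0].mean())  # e.g. mean of one variable

print(np.mean(estimates), np.std(estimates))  # pooled estimate and its spread
```
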

More information about the internship programme is available here.

Skills Required The project involves analysing datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. Some knowledge of mathematics, statistics, and probability are required.
Skills Desired An interest in statistical theory and, in particular, experience with multiple imputation (MI). Experience using SQL or statistical software (such as Stata) would be highly beneficial for this internship.

 

What can we learn about cancer by modelling the data on it?

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address CPC4, Capital Park, Fulbourn, Cambridge.
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

The National Cancer Registration and Analysis Service holds data on all diagnoses and treatments of cancer in England. The Simulacrum project has created models for these datasets, which have been used to generate synthetic data to allow researchers to explore individual-level cancer data without threatening the privacy of individual patients. You would use statistics and machine learning, together with the data we hold, to look at questions like:

  • What do these models tell us about cancer directly?
  • Can these models identify data quality problems with the underlying data?
  • How can we improve the modelling methodology?
  • Can we be more data-driven while respecting privacy restrictions?
  • What interesting questions might such models suggest people ask?

 More information about the internship programme is available here.

Skills Required This project involves datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected.
Skills Desired An interest in statistical theory, probability, and machine learning. Experience using Matlab, SQL, or statistical software. An interest in working with real-world cancer data.

 

Develop a tool for inferring symptoms from prescriptions histories for cancer patients

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address CPC4, Capital Park, Fulbourn, Cambridge.
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

Prescriptions filled in the community setting in England are recorded in the Prescriptions Dataset. These prescriptions are made by healthcare professionals who select pharmaceuticals with indications appropriate for each patient's illness. Is it possible to reverse engineer this process for cancer patients? This project will look at leveraging the Prescriptions data, the Cancer Analysis System, Healthcare Episode Statistics, and the British National Formulary with supervised and unsupervised machine learning techniques, in combination with rule-based systems, to attempt to develop a tool for inferring the symptoms of cancer patients.

More information about the internship programme is available here.

Skills Required SQL. R.
Skills Desired Exposure to machine learning techniques and/or statistics.

 

Developing an optimisation algorithm to supervise active learning in drug discovery

Contact Name Dr David Marcus
Contact Email david.x.marcus@gsk.com
Company Name GlaxoSmithKline
Address Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project One of the challenges in drug discovery is to optimise the chemical composition of an initial set of promising molecules to improve multiple properties that influence how the drug behaves in the human body, such as potency, bioavailability and possible adverse effects. Computer-aided drug design facilitates this process by applying machine learning models that predict these properties and can virtually consider a huge number of possible molecules, focusing the search on molecules that will later be tested in the lab. However, some of these models might under-perform with a limited amount of data, which may result in suggesting molecules with lower activity but high similarity to the initial set. Active learning is a relatively new approach in drug discovery to assist the selection of potential molecules by suggesting those that could also improve these models in an iterative/feedback manner over several cycles, and to suggest structurally novel molecules. GSK has been implementing this approach, which could potentially reduce costs and development time by increasing our ability to build better-performing models and suggest molecules with improved multiple properties. We are currently evaluating several optimisation algorithms to adapt to each iteration and suggest the amount of structural modification needed with respect to multiple properties. This well-defined project seeks an enthusiastic individual who will investigate which optimisation algorithm performs best by evaluating and comparing their potential to supervise an active learning process. To do that, we will also need to develop the necessary metrics to control these decisions, preferably based on multiple parameter optimisation. The outcome of this short study will help our scientists evaluate this approach in future drug discovery processes.
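
For orientation, the sketch below shows a generic uncertainty-sampling active-learning loop with a standard classifier on a random "descriptor" matrix; the data, the acquisition rule and the batch size are illustrative assumptions, not GSK's implementation.

```python
# Minimal sketch of an uncertainty-sampling active-learning loop. The random
# "descriptor" matrix and labels stand in for real molecular data and assay
# results; least-confident selection is just one possible acquisition rule.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))                    # molecular descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # hidden "activity" label

labelled = list(range(20))                         # initial labelled set
pool = [i for i in range(len(X)) if i not in labelled]

for cycle in range(5):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)            # least-confident sampling
    picks = np.argsort(uncertainty)[-10:]          # "send 10 molecules to assay"
    labelled += [pool[i] for i in picks]
    pool = [i for i in pool if i not in set(labelled)]
    print(f"cycle {cycle}: training set size {len(labelled)}")
```
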
Skills Required Python or R basic programming skills
Skills Desired Knowledge of optimisation algorithms and machine learning methods

 

Translating novel cancer datasets into innovative visualisations

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address London.
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

The National Cancer Registration and Analysis Service (NCRAS) produces a wide range of statistical analysis and datasets used by many different audiences, including the public. Prominent productions include the Routes to Diagnosis dataset which examines the events in a patient’s pathway that lead to diagnosis, and the Get Data Out project which makes granular data more widely available while preserving individual patients’ privacy. Such datasets drive research and transparency, as well as contributing to the improvement of patient outcomes. This project aims to create a wide range of accessible visualisations from these large datasets. There will be three main strands as part of this:

• To translate the main stories for a dataset into an accurate and accessible visualisation.

• To enable elements of interactivity in the visualisations, with the ability to adapt the display to audience needs (e.g. to toggle confidence intervals on a display).

• To produce visualisations that are scalable and adaptable - that can be easily updated as the datasets expand to include extra years or new cancer sites for example.

These visualisations will be used on websites, published outputs, for presentations, and at conferences to promote cancer analysis and help grant increased insight into otherwise complex statistics.

More information about the internship programme is available here.

Skills Required This project involves datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. The ability to interpret complex data and translate it into a graphical display will be essential.
Skills Desired An interest in public health and communication of data and health information, data journalism, or statistical exploration of data. Experience using R, JavaScript, or other programming languages, experience with visual analytic packages such as Tableau or SAS VA, and an interest in use of graphics. Use of D3 a bonus.

 

Mapping laboratory reports for molecular genetic testing to the National Cancer Registration and Analysis Service (NCRAS)

Contact Name Josephine French
Contact Email josephine.french@phe.gov.uk
Company Name Health Data Insight
Address 5 St Philip's Place, Birmingham
Period of the Project 2-3 months between June and September.
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Midnight on Friday 16th February 2018.
Brief Description of the Project

Background: Many tumours undergo molecular genetic and cytogenetic testing in NHS specialist laboratories in order to define which mutations or rearrangements underlie the malignant behaviour of the cells. These molecular tests are a key part of diagnosis and subtyping for many tumour types (e.g. brain tumours, sarcomas, paediatric cancers, and haematological malignancies), and some molecular aberrations can provide key information on the patient’s likely prognosis. Furthermore, in an increasing number of tumours (e.g. lung, colorectal, melanoma, gastrointestinal stromal tumours), molecular tests are used to identify mutations or rearrangements which can predict a clinical response to targeted therapies. This enables treatment to be personalised to the patient and to the specific biology of their tumour.

The Project: The National Cancer Registration and Analysis Service (NCRAS) is collecting molecular data from tumour testing directly from pathology and regional molecular genetics laboratories across England. Data from each laboratory arrives in a different format, with little consistency between laboratories, and different labs carry out different tests. Most of the labs performing these tests have been identified; however, not all are yet supplying data to NCRAS.

All source data from the labs needs to be mapped to a specific format, contained within three genetics tables in the NCRAS database. Data is processed by a combination of computational mapping and registration by hand; however computational mapping is the preferred route where possible.

The project is likely to involve a combination of creating mapping documents to show how source data can be mapped and transformed to the unified structure, and scripting code, into which these rules are embedded, using Yet Another Mark-up Language (YAML). There may also be an element of liaising with source laboratories to clarify any ambiguities within the data, and to improve the data quality where necessary.
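
As an illustration of the rule-embedding idea, the sketch below stores hypothetical regex-based mapping rules in YAML and applies them to a free-text result string in Python; the rules, field names and output codes are invented for illustration and do not reflect the NCRAS genetics tables.

```python
# Minimal sketch: apply regex-based mapping rules, stored in YAML, to a
# free-text result field from a hypothetical lab report. Rules and target
# fields are illustrative assumptions only.
import re
import yaml  # PyYAML

RULES_YAML = """
rules:
  - pattern: 'EGFR.*(p\\.L858R|exon\\s*21)'
    gene: EGFR
    result: L858R detected
  - pattern: 'EGFR.*not detected'
    gene: EGFR
    result: no mutation detected
"""

rules = yaml.safe_load(RULES_YAML)["rules"]

def map_report(free_text):
    """Return the first matching mapped record, or None for manual registration."""
    for rule in rules:
        if re.search(rule["pattern"], free_text, flags=re.IGNORECASE):
            return {"gene": rule["gene"], "result": rule["result"]}
    return None

print(map_report("EGFR exon 21 (p.L858R) mutation identified"))
```
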

Outputs: The exact outputs will be agreed with the intern at the outset of the project, and will depend upon overall progress with the genetics data programme and the specific skills and interests that the intern can contribute to the work. The project will make a tangible difference to ongoing work looking at equity of access to molecular tumour testing within the NHS, and will be of interest to CRUK, NHS providers and commissioners.

More information about the internship programme is available here.

Skills Required We would like an enthusiastic and creative intern, with an understanding of tumour genetics, and specific awareness of the different types of molecular and cytogenetic aberrations that underlie tumourigenesis. The candidate must demonstrate confidence and accuracy with interpreting and processing large datasets. The intern must have the ability to spot patterns and inconsistencies, and to pick out the scientific details of the data without losing the overall context of the clinical report.
Skills Desired Experience in bioinformatics and data mapping would be an advantage, as would familiarity with mark-up languages – specifically scripting rules based on Regular Expressions.

 

Leveraging the imaging power of the Beacon platform

Contact Name Maciej Hermanowicz
Contact Email maciej.x.hermanowicz@gsk.com
Company Name GlaxoSmithKline
Address GSK Medicines Research Centre Gunnels Wood Road Stevenage Herts SG1 2NY
Period of the Project 8-12 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March
Brief Description of the Project The Beacon is a state-of-the-art labware platform allowing for physical manipulation of single cells using illumination (https://www.youtube.com/watch?v=6gTGJhja0oI). As the successful applicant, you will seek out and apply novel image analysis and machine learning solutions to fully leverage the power of this platform. Together with the data scientists and biology experts within GSK, you will improve the process of generating antibody-producing cell lines. The initial stages of the project will be to tackle well-defined challenges, with some freedom in the choice and implementation of the solutions. Once this work is completed, you will be able to influence the direction of your remaining work.
Skills Required • Familiarity with a high-level programming language • Interest in image analytics / computer vision • Understanding of basic machine learning techniques
Skills Desired • Familiarity with Python • Familiarity with image analysis / computer vision techniques • Machine learning experience

 

Atmospheric Structure Revealed by Refraction of Routine Radio Transmissions from Civil Aircraft.

Contact Name Malcolm Kitchen
Contact Email malcolm.kitchen@metoffice.gov.uk
Company Name Met Office
Address Met Office, FitzRoy Road, Exeter EX1 3PB
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 27 February
Brief Description of the Project A new method is envisaged for obtaining information on the structure of temperature and humidity in the atmosphere. Radio waves transmitted by aircraft suffer refraction (bending), especially when the aircraft is distant and close to the horizon. It should be possible to measure the angle of arrival (AoA) of the transmissions using an interferometer installed on a tower. Knowledge of the exact location of the aircraft then enables the bending due to the atmosphere to be calculated. Before any trial of the technique can go ahead, we need to model the data to understand the sensitivity to changes in the atmospheric structure and the required accuracy for the AoA measurement. The initial data modelling would involve ray-tracing using synthetic or idealised data for aircraft locations and atmospheric structure.
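
To give a flavour of the ray-tracing step, the sketch below traces a near-horizon ray through a simple exponential refractivity profile and accumulates the bending angle; the flat-earth geometry, Euler stepping and profile constants are simplifying assumptions, whereas a real study would use spherical geometry and profiles from numerical weather prediction.

```python
# Minimal sketch: trace a near-horizon ray through an exponential refractivity
# profile N(h) = N0 * exp(-h/H) and accumulate the bending angle, using the
# stratified-medium Snell invariant n(h) * cos(theta) = const (flat earth).
import numpy as np

N0, H = 315.0, 7350.0            # surface refractivity (N-units), scale height (m)

def n_and_gradient(h):
    N = N0 * np.exp(-h / H)
    return 1.0 + N * 1e-6, -N * 1e-6 / H   # n(h), dn/dh

def bending_angle(elevation_deg, receiver_height=50.0, max_range=300e3, ds=50.0):
    theta = np.radians(elevation_deg)      # local elevation angle of the ray
    h, s, bend = receiver_height, 0.0, 0.0
    while s < max_range and h >= 0.0:
        n, dn_dh = n_and_gradient(h)
        dtheta = np.cos(theta) * dn_dh / n * ds   # change in ray direction
        theta += dtheta
        bend -= dtheta                     # positive bending = deflection towards earth
        h += np.sin(theta) * ds
        s += np.cos(theta) * ds
    return np.degrees(bend)

print(bending_angle(0.5))   # bending (degrees) accumulated along the path
```
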
Skills Required Some experience of programming is essential.
Skills Desired Knowledge of Python and Linux would be helpful. The ideal candidate would have an interest in and some previous experience in data modelling or remote sensing.

 

Simulating Electricity Prices: negative prices and auto-correlation

Contact Name Adrien Grange-Cabane
Contact Email To apply online: bp.com/uk/quantanalytics; For information contact adrien.grange-cabane@uk.bp.com or ISTQuantitativeAnalytics@bp.com
Company Name BP Supply and Trading - IST Quantitative Analytics
Address BP Plc, 20 Canada Square, London, E14 5NJ
Period of the Project 8 to 10 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March 2018
Brief Description of the Project BP Supply & Trading has a significant power trading business in North America and is seeking to build a similar commercial activity in continental Europe. Much of this new business will be customer focussed, with a strong emphasis on providing tailored commercial solutions to physical market participants, ranging from vanilla instruments to more complex structures, for example spark-spread options, swing contracts and virtual power plants. The generally non-storable nature of electricity imposes considerable valuation challenges from a derivatives pricing perspective. For example, the increase in the use of renewables has led to the appearance of negative spot prices. As the electricity production of renewables (wind or solar) is hard to predict, supply on the grid can exceed demand by a large amount. Consequently, the settlement price of power becomes negative to incentivise non-renewable (e.g. gas or coal) generators to switch off. Price models such as the log-normal Black-Scholes model do not capture these stylised facts and need to be extended for commodity markets. Another feature of commodity markets is that the price dynamics of power is heavily dependent on the dynamics of the fuels used for its production, especially gas. This dependence has recently been modelled in the context of co-integration, which assumes the existence of a long-term stationary process that drives both the power and gas prices. In the short term, the power and gas prices can diverge, but they ultimately converge to their long-term equilibrium. This research project will explore recent modelling approaches that have been put forward to model negative power prices and/or the combined dynamics of power and gas. Both are very active research topics for BP Supply & Trading. First, power derivatives will be significantly affected by the presence of negative prices. Second, traditional modelling of the joint dynamics of power and gas through a correlation coefficient leads to overestimating the value of power generation assets; the use of co-integration is paramount to price these correctly. References: [1] Benth, Fred & Koekebakker, Steen (2015). Pricing of forwards and other derivatives in cointegrated commodity markets. Energy Economics. [2] Enzo Fanone, Andrea Gamba, Marcel Prokopczuk, The case of negative day-ahead electricity prices, Energy Economics, Volume 35, 2013, Pages 22-34.
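
As a simple illustration of one way to admit negative prices (and not one of the referenced models), the sketch below simulates a mean-reverting Ornstein-Uhlenbeck process on the price level itself rather than on its logarithm, so negative values are possible; all parameters are illustrative only.

```python
# Minimal sketch: simulate an arithmetic (non-log) mean-reverting spot price.
# Unlike a log-normal model, an OU process on the price itself can go
# negative. Parameters are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_days, kappa, mu, sigma, dt = 365, 5.0, 40.0, 60.0, 1.0 / 365

prices = np.empty(n_days)
prices[0] = mu
for t in range(1, n_days):
    drift = kappa * (mu - prices[t - 1]) * dt       # mean reversion to mu
    shock = sigma * np.sqrt(dt) * rng.standard_normal()
    prices[t] = prices[t - 1] + drift + shock

print(prices.min(), (prices < 0).mean())   # minimum price, share of negative days
```
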
Skills Required Calculus, Probabilities, Problem Solving Skills, Communication Skills
Skills Desired Stochastic Calculus (optional), Financial Mathematics (optional), Programming Experience (optional)

 

Understanding and Estimating Physical Parameters in Electric Motors using Mathematical Modelling

Contact Name Geoff Walker
Contact Email geoff.walker@artesis.co.uk
Company Name Artesis LLP
Address geoff.walker@artesis.co.uk
Period of the Project 8 weeks, flexible start date
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 31 March
Brief Description of the Project Rotating electrical machinery such as pumps, fans, and compressors underpins much of our modern-day infrastructure; such machines are used in the water industry, throughout manufacturing, and in turbines for power generation. These devices are generally powered by three-phase induction motors, and the reliable performance of these motors (and the driven equipment) is critical in many sections of industry. Artesis is a small company that provides remote monitoring of this equipment, looking only at the voltage and current drawn by the motor to identify (mostly mechanical) faults in the motor and in the driven equipment. To do this, we use a black-box modelling algorithm which approximates the motor as a linear system: from a sample of voltage and current data, the algorithm extracts the best-fit linear relationship between voltage and current, and the leftover ‘residual’ current is then further analysed by examining its Fourier spectrum. The model consists of a number of parameters which do not themselves have any physical meaning. However, these parameters should correspond to physical parameters of the motor itself, such as winding resistances, impedances, and slip (a parameter related to the output torque), and the first aim of this project is to extract this meaningful physical data from the parameters produced by the model. Time permitting, we would then like the student to speculate (using both physical insight and a bit of guesswork) how faults on the equipment, as well as operational changes, might influence these physical parameters and therefore the parameters in the mathematical model; some of these hypotheses can be tested using a test rig. This project is relatively open ended, though it builds on work done in a previous PMP project from 2016. There should be some time to visit either a real customer site or take a quick tour of the engineering department, and we hope this project will provide an opportunity for the student to see how mathematics interacts with the real world - and hopefully for them to develop a taste for solving problems that are important in industry.
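
To make the black-box idea concrete, the sketch below fits a least-squares linear map from a synthetic supply voltage (and its quadrature) to a measured current and inspects the Fourier spectrum of the residual, where an injected "fault" sideband shows up; the signals, frequencies and fault component are invented placeholders, not Artesis's algorithm.

```python
# Minimal sketch: fit a linear voltage-to-current relationship by least
# squares, then examine the Fourier spectrum of the residual current.
# Synthetic 50 Hz signals with an injected sideband at 57 Hz.
import numpy as np

fs, f0 = 5000.0, 50.0                      # sample rate and supply frequency (Hz)
t = np.arange(0, 2.0, 1 / fs)
voltage = 230 * np.sin(2 * np.pi * f0 * t)
current = 10 * np.sin(2 * np.pi * f0 * t - 0.3)           # linear response
current += 0.2 * np.sin(2 * np.pi * (f0 + 7) * t)         # "fault" sideband
current += 0.05 * np.random.default_rng(0).standard_normal(t.size)

# Regress current on the voltage and its 90-degree shifted copy, which
# captures any amplitude- and phase-shifted linear relationship at 50 Hz.
quadrature = 230 * np.cos(2 * np.pi * f0 * t)
A = np.column_stack([voltage, quadrature])
coef, *_ = np.linalg.lstsq(A, current, rcond=None)
residual = current - A @ coef

spectrum = np.abs(np.fft.rfft(residual))
freqs = np.fft.rfftfreq(residual.size, 1 / fs)
print(freqs[spectrum.argmax()])            # peak of the residual spectrum (~57 Hz)
```
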
Skills Required Mathematician! This project will not necessarily require particularly complex mathematics beyond an understanding of some linear algebra, differential equations and simple signal processing techniques such as the fast Fourier transform, along with some statistics. However, it will probably require some inventive use of these techniques, combined with some insight into the physics involved in the problem. An understanding of linear algebra and statistics would be useful, as well as good communication skills.
Skills Desired Some software coding experience, or an enthusiasm to learn. Python or Scilab/ Matlab experience is preferred, and some experience with SQL would be useful. Any experience with state space identification algorithms such as MOESP or CVA would be very helpful, though for those interested there are plenty of useful resources: the Wikipedia page for “state space representation” is a good starting point. The main desired skill is a high level of inquisitiveness.

 

Algorithmic Investigation of Large Biological Data sets

Contact Name Lee Clewley
Contact Email lee.x.clewley@gsk.com
Company Name GlaxoSmithKline
Address Stockley Park West, 1-3 Ironbridge Road, Uxbridge, Middx, UB11 1BT, UK
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Recently, the rapid growth of both internal and external data sources, in conjunction with large external databases, has increased the need for GSK to address the most complex problems in drug discovery. For example, the chemical database ChEMBL, coupled with various biological databases internal and external to GSK, means that there is presently an enormous set of potential research avenues that will yield biologically interesting insights. Such datasets provide a rich environment for deploying tools and techniques such as TensorFlow, DeepChem or topological data analysis, depending on the form of the data. In this project, the student will explore and create several algorithms that will be applied to curated datasets to test a range of biological hypotheses. This project is relatively open-ended and so the student should be ready to explore and evaluate current academic work and applicable solutions. The student should be prepared to collaboratively suggest viable hypotheses based on the data at hand. The student should also be prepared, with aid from supervisors and contacts within the company, to demonstrate their findings in the form of visualizations, code-based models, or another appropriate medium.
Skills Required -knowledge of data science techniques and mathematical algorithms -experience reading, writing, using, and documenting code such as Python or R -the ability to work and communicate effectively within an inter-disciplinary project team -an effective research ethic and a passion to learn more
Skills Desired -knowledge and interest in machine learning techniques, deep learning -interest in the field of medicine development -significant coding experience with R/Python

 

New methods for genetic analysis

Contact Name Simon Thornber
Contact Email Simon.j.thornber@gsk.com
Company Name GlaxoSmithKline - R&D Tech
Address Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
Period of the Project Same as the other GSK proposals (over summer)
Project Open to Part III (master's) students
Deadline to Register Interest 3rd March. (Will coordinate presentation with Nicola Richmond)
Brief Description of the Project The cost of DNA sequencing has now reached a level where it is economically viable to undertake large-scale sequencing programmes of thousands of volunteers. These data sets typically consist of patients' genetic data and a list of characteristics known as a 'phenotype'. These data sets are then analysed using a technique known as a genome-wide association study (GWAS) to identify phenotypic characteristics that associate with genetics. With the richness of data available today, we would like to investigate alternative analysis approaches for these data sets. These approaches can take advantage of the fact that multiple phenotypes may be linked (e.g. body mass index and resting heart rate), and that the data is on a scale rather than bucketed into “has phenotype” or “does not have phenotype”. This project would require investigation into other methods that have already been published, before proposing and designing new analysis approaches.
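
For context, the sketch below shows the basic per-variant association test that a GWAS generalises: a simple regression of a continuous phenotype on genotype dosage for each variant, on synthetic data; real analyses use dedicated tooling (e.g. PLINK) and adjust for covariates, relatedness and population structure.

```python
# Minimal sketch of a per-variant association test: regress a continuous
# phenotype on genotype dosage (0/1/2) for each variant. Synthetic data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_people, n_variants = 2000, 500
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants))
phenotype = 0.4 * genotypes[:, 10] + rng.normal(size=n_people)  # variant 10 is causal

pvalues = np.array([
    stats.linregress(genotypes[:, j], phenotype).pvalue
    for j in range(n_variants)
])
print("top hit:", pvalues.argmin(), "p =", pvalues.min())
```
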
Skills Required Data skills
Skills Desired knowledge of Genetics

 

Optimising the definition of MR-based lung imaging biomarkers

Contact Name Juan Delgado
Contact Email juan.x.delgado@gsk.com
Company Name Bioimaging, GlaxoSmithKline
Address Gunnelswood road, Stevenage, SG1 2NY
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest March 23
Brief Description of the Project Introduction: Magnetic Resonance Imaging (MRI) is of interest in pharmaceutical development for its translational potential from animal research into the clinical setting. MRI of the lung is a relatively novel approach for the assessment of the amount of cellularity and oedema (fluid) as indicators/biomarkers of pulmonary toxicity and/or efficacy. The quantification of these lesions is typically assessed by delineating regions-of-interest (ROIs) on the images and applying several filtering techniques to measure the final levels of these biomarkers. The selection of these ROIs (a process known as segmentation) can be achieved automatically using various computational approaches. However, the great variation in noise and image quality in MR data means that automated segmentation is often challenging. In fact, the greatest difficulty for automation is the diseased lung, which appears closer in aspect to adjacent tissue (muscle/liver/heart) than to healthy lung. Therefore, manual segmentation remains the principal mechanism for analysis.

Opportunity: We have at our disposal a large historic animal MR dataset (>1000 tomographic images) covering more than seven models of disease/toxicity of the lung, from various strains of rats and mice. Furthermore, although the images were acquired on a single preclinical MR scanner, they were made using various multiparametric MR sequences that result in different contrasts and signal-to-noise ratios. The values of six combinations of segmentation and filtering methods will be provided for testing. These methods include manual, intensity-thresholding, and multi-atlas segmentation, combined with high-pass filtering for blood vessel and noise elimination, and holistic quantification. Animal species, strains, disease model, and MR parameters are available for use as covariates in the analysis.

Aim & Impact of the Project: In this project, we aim to understand the following questions: to what extent can automated segmentation be used? Which methods are best suited to which covariates? When is manual segmentation unavoidable? What are the gaps in analysis that will require new segmentation methods? Answering these questions is crucial to improve the efficacy of MRI investigations, introduce common approaches that can be applied across multiple centres, and establish the precise criteria needed to define respiratory conditions such as COPD (Chronic Obstructive Pulmonary Disease), asthma, IPF (Idiopathic Pulmonary Fibrosis) or drug-induced lung injury (DILI) via MRI. A clearer understanding of the use of MRI in this context will help support improvements to the lives of millions of patients affected by respiratory disease.

The student: You will have a clear understanding of machine learning techniques and point-cloud analysis of complex datasets. You should have an excellent theoretical understanding of variational optimisation and Bayesian statistics, as well as hands-on experience with unsupervised machine learning techniques. Interest in pharmaceutical development, human and animal physiology, medical imaging, and/or computer vision is beneficial. If you join us, you will be directly contributing to the use of automated methods to extract decision-making biomarkers from images, supporting medicines development. Further, you will be able to liaise with a diverse range of scientists, from in-vivo biologists to data scientists, and be exposed to cutting-edge science and equipment.
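
To illustrate one of the automated approaches named above (intensity thresholding), the sketch below applies Otsu's method from scikit-image to a synthetic image standing in for a lung MR slice; multi-atlas segmentation, the filtering steps and the covariate analysis are beyond the scope of this example.

```python
# Minimal sketch of intensity-threshold segmentation using Otsu's method.
# The synthetic image is a placeholder for a real lung MR slice.
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)
image = rng.normal(0.2, 0.05, size=(128, 128))       # dark background ("lung")
image[32:96, 32:96] += 0.6                           # bright block ("tissue")

threshold = threshold_otsu(image)
mask = image > threshold
print("threshold:", round(float(threshold), 3), "segmented fraction:", mask.mean())
```
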
Skills Required Unsupervised machine learning, bayesian statistics, clustering
Skills Desired Computer vision, topological data analysis

 

Statistical analysis of biotherapeutic datasets to facilitate early ‘Critical Quality Attribute’ characterization.

Contact Name David Hilton
Contact Email david.w.hilton@gsk.com
Company Name Biopharm Process Research Group, GSK
Address Biopharm Process Research Building 5 Ground Floor GSK Gunnels Wood Road Stevenage SG1 2NY
Period of the Project 8 - 12 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 31st March 2018
Brief Description of the Project The Biopharm Process Research group is the first step on the route from newly discovered biotherapeutic drugs to a commercial product that can be administered to a patient. It is the group’s responsibility to screen candidate molecules for developability and process fit, and to identify a suitable commercial cell line for their production. A key output from the group is the identification of Critical Quality Attributes (CQAs). These are chemical, physical or biological attributes of a molecule, or of the system that produces it, which can be defined and monitored to ensure a product is within acceptable quality limits. This is an open-ended project in which the student is free to use novel strategies to analyse numerical and spectral datasets characterizing our molecules’ predicted attributes and their intrinsic and in-process stability, derived from techniques such as high-performance chromatography, bio-layer interferometry and mass spectrometry (an illustrative analysis sketch follows this entry). Using this analysis we plan to rapidly screen for CQAs and streamline our approach to CQA experimental characterization, whilst also uncovering previously unknown uses for our datasets.
Skills Required • Good knowledge of statistics and statistical computing techniques • Creativity and the capability to initiate and deliver a research project • Adaptability to interpret large and diverse datasets • Ability to work independently and take ownership of the project • Ability to communicate with scientists and engineers from a wide range of academic backgrounds
Skills Desired • Familiarity with statistical computing software, for instance JMP, Matlab, R, etc. • Familiarity with scripting languages, for example Python, Perl, etc.
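Illustrative aside on the CQA project above: one generic starting point for seeing which measured attributes drive variation between candidate molecules is principal component analysis. The sketch below uses random stand-in data and an assumed attribute count; it is not GSK's method, and simply shows the mechanics of standardising a table of attributes and inspecting loadings.

# Minimal sketch: PCA on a table of numerical/spectral attributes to see which
# measurements drive variation between candidate molecules. Hypothetical data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_molecules, n_attributes = 30, 8
X = rng.normal(size=(n_molecules, n_attributes))   # stand-in for measured attributes

X_std = StandardScaler().fit_transform(X)          # put attributes on a common scale
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)

print("explained variance ratio:", pca.explained_variance_ratio_)
# Attributes with large absolute loadings on PC1/PC2 would be candidates for
# closer experimental characterization as potential critical quality attributes.
print("PC1 loadings:", np.round(pca.components_[0], 2))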

 

Optimising fresh produce quality monitoring and analysis

Contact Name Richard Boyle
Contact Email richard.boyle@mmflowers.co.uk
Company Name MM Flowers Ltd
Address Pierson Road, The Enterprise Campus, Alconbury Weald, PE28 4YA
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest  
Brief Description of the Project MM Flowers is the UK’s leading integrated cut flower grower, importer and distributor, producing bouquets for the leading UK high-street retailers. MM has key growing bases across the world and therefore imports a large number of species and cultivars of cut flowers. Cut flowers are subject to a wide range of environmental conditions within the supply chain, from local growers to those shipped from Kenya and Colombia. Temperature and time are two key factors in optimising post-harvest quality, and the huge variation among cut flower species makes it particularly challenging to deliver a quality product for the end consumer. MM receives on average 600,000 stems of cut flowers daily, which increases dramatically during periods such as Valentine’s Day and Mother’s Day. To ensure a quality product is delivered, a dedicated quality control team undertakes daily inspections of the raw materials received, and in turn has generated a vast array of data over the last 10 years. The data recorded includes flower quality assessments, grower information and vase performance data. Historically, this data has been used to help identify the source of quality challenges originating from growing or the onward supply chain. To further improve the operational processes from farm to store, this data, including the methodology behind current sampling protocols, needs to be reviewed and analysed in greater detail to develop tools and techniques that can optimise MM practices. It is important to assess how current processes can be adapted and improved within daily quality inspection routines using mathematical modelling, including validation of any models developed (a minimal modelling sketch follows this entry). Further to this, the student can expect to gain valuable experience working within a fast-paced business in the fresh produce sector, including liaising with different departments, project management, communication skills, and working towards the needs of the business.
Skills Required Strong computer skills, experience with statistics and modelling, clear communicator, self-motivated, demonstrates initiative, project management
Skills Desired  
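Illustrative aside on the MM Flowers project above: one simple modelling starting point is a regression of vase life against supply-chain conditions. The variable names, coefficients and data below are entirely hypothetical; the sketch only shows the shape such a model might take before being validated against the real inspection records.

# Minimal sketch: fitting a simple linear model of vase life against transit time
# and average temperature. Hypothetical variables and data, not MM Flowers' records.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
transit_days = rng.uniform(1, 10, n)                 # days from harvest to depot (assumed)
mean_temp = rng.uniform(2, 12, n)                    # mean supply-chain temperature in C (assumed)
vase_life = 12 - 0.4 * transit_days - 0.3 * mean_temp + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([transit_days, mean_temp]))
model = sm.OLS(vase_life, X).fit()
print(model.summary())                               # coefficients, confidence intervals, R-squared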

 

Mechanistic model development to characterise drug effects on platelets over time in pharmaceutical research.

Contact Name Shaun Flint, Chiara Zecchin, Stefano Zamuner
Contact Email chiara.x.zecchin@gsk.com
Company Name GSK Experimental Medicine Unit / GSK Clinical Pharmacology Modelling and Simulation
Address Experimental Medicine Unit, R&D Immunoinflammation TA, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Herts SG1 2NY
Period of the Project 8 weeks (exact dates TBD)
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest
Brief Description of the Project Pharmacokinetic-pharmacodynamic (PKPD) modelling is a powerful approach used extensively in pre-clinical and clinical drug development to quantitatively characterise drug candidates, aid go/no-go decisions, inform future clinical study design and determine optimal dosing regimens that maximise treatment benefit and minimise side effects. Mechanistic models such as Physiologically-Based Pharmacokinetic (PBPK) models are increasingly utilised to allow a more detailed description of the relevant physiological processes [1]. The aim of this project is to use mechanistic models to fully characterise the time-course of platelet count changes after drug administration and, in particular, to predict the onset and degree of thrombocytopenia (low platelet count). These are important questions for drug development because significant haematologic side effects, including thrombocytopenia, may have serious clinical consequences, such as bleeding or reduced efficacy as a result of delayed or missed doses. A key component of designing a dosing regimen that minimises these concerns is the ability to accurately model drug effects on platelet counts (and other haematologic parameters) over time. Semi-mechanistic models that mimic the maturation and circulation of platelets and other haematologic cell lineages, incorporating drug effects, are available in the literature (e.g. [2-3]; a minimal sketch of this model class follows this entry). These models use empirical approaches to describe the relevant processes and might not be suitable for extrapolating drug effects to new dosing regimens (e.g. extrapolation of a model based on single-dose data to a repeat-dose regimen). This project will explore the fundamental properties of, and differences between, semi-mechanistic models and mechanistic models such as the one proposed in [4]. It will include a theoretical exploration of model performance using simulations, with access to a human clinical study dataset to allow the models to be assessed against real-life data. References: [1] Zhao P, Zhang L, Grillo JA, et al. Applications of physiologically based pharmacokinetic (PBPK) modeling and simulation during regulatory review. Clinical Pharmacology & Therapeutics. 2011;89(2):259-267. [2] Wang YM, Krzyzanski W, Doshi S, et al. Pharmacodynamics-mediated drug disposition (PDMDD) and precursor pool lifespan model for single dose of romiplostim in healthy subjects. The AAPS Journal. 2010;12(4):729-740. [3] Bender BC, Schaedeli-Stark F, Koch R, et al. A population pharmacokinetic/pharmacodynamic model of thrombocytopenia characterizing the effect of trastuzumab emtansine (T-DM1) on platelet counts in patients with HER2-positive metastatic breast cancer. Cancer Chemotherapy and Pharmacology. 2012;70(4):591-601. [4] Harker LA, Roskos LK, Marzec UM, et al. Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000;95(8):2514-2522.
Skills Required Ordinary Differential Equations; experience in programming with R or MatLab
Skills Desired Estimation methods in model development.
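Illustrative aside on the platelet-modelling project above: the sketch below simulates a Friberg-style transit-compartment model, a standard example of the semi-mechanistic class referred to in [2-3], with a linear drug effect on the proliferating pool and a mono-exponential drug concentration. All parameter values are assumptions chosen only to produce a plausible dip in platelet counts; they are not drawn from any GSK dataset.

# Minimal sketch of a Friberg-style transit-compartment model of platelet
# maturation with a linear drug effect on the proliferating pool.
# Illustrative parameters only.
import numpy as np
from scipy.integrate import solve_ivp

circ0 = 250.0        # baseline platelet count (10^9/L), assumed
mtt = 120.0          # mean transit time (h) across 4 transit stages, assumed
ktr = 4.0 / mtt      # transit rate constant
gamma = 0.2          # feedback exponent
slope = 0.005        # linear drug effect per unit concentration, assumed
ke = 0.05            # drug elimination rate (1/h), assumed
conc0 = 50.0         # initial drug concentration after a single dose, assumed

def conc(t):
    return conc0 * np.exp(-ke * t)                   # mono-exponential PK

def rhs(t, y):
    prol, t1, t2, t3, circ = y
    e_drug = slope * conc(t)                         # inhibition of proliferation
    feedback = (circ0 / circ) ** gamma               # rebound when counts are low
    dprol = ktr * prol * (1 - e_drug) * feedback - ktr * prol
    dt1 = ktr * (prol - t1)
    dt2 = ktr * (t1 - t2)
    dt3 = ktr * (t2 - t3)
    dcirc = ktr * t3 - ktr * circ
    return [dprol, dt1, dt2, dt3, dcirc]

y0 = [circ0] * 5                                     # start at steady state
sol = solve_ivp(rhs, (0, 24 * 28), y0, t_eval=np.linspace(0, 24 * 28, 200))
print(f"predicted platelet nadir: {sol.y[4].min():.1f} (baseline {circ0:.0f})")

The feedback term is what allows the model to reproduce the characteristic nadir and rebound after dosing; extrapolation questions arise when the same empirical structure is pushed to regimens it was not fitted on.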

 

Enhanced Decision Making in Drug Discovery

Contact Name Stephen Ashman
Contact Email stephen.ashman@gsk.com
Company Name GlaxoSmithKline
Address GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY
Period of the Project 8 weeks
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 7 February
Brief Description of the Project This project will explore the way that scientists make decisions at key points during the antibody drug discovery process. Program teams generate many different types of data during drug discovery: the structure/amino-acid sequence of the molecule, affinity for the target, functional potency, mode of binding, and numerous parameters describing the stability and manufacturability of the molecule. Each team needs to choose which molecules to progress on the basis of their experience of which attributes predict clinical success. Although these decisions are made by experienced scientists, they are likely to be subject to a range of cognitive biases, and scientists are challenged by the need to weight parameters appropriately and to take account of the characteristic variance of each data type. The goal of the project is to combine best practice in human decision-making heuristics with any notable new findings in an accessible tool that applies this standard consistently, provides teams with powerful visualisations describing the data supporting these decisions, and records the criteria and evidence for them (a minimal scoring sketch follows this entry).
Skills Required An interest in the application of data science to drug discovery and strong data visualisation skills.
Skills Desired Interest in decision theory and ability to code in R or Python.
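Illustrative aside on the decision-making project above: a transparent weighted-sum score is one of the simplest ways to apply fixed criteria consistently across a panel of candidate molecules. The attribute names, directions and weights below are hypothetical and chosen purely for illustration; a real tool would also need to handle the characteristic variance of each data type, which this sketch ignores.

# Minimal sketch: a weighted-sum score over molecule attributes, rescaled so that
# 1 = best for each attribute. Hypothetical attributes, weights and candidates.
import numpy as np
import pandas as pd

candidates = pd.DataFrame(
    {"affinity_nM": [1.2, 0.4, 5.0],       # lower is better
     "potency_nM": [10.0, 3.0, 30.0],      # lower is better
     "stability_C": [68.0, 72.0, 60.0]},   # higher is better
    index=["mAb-A", "mAb-B", "mAb-C"])

directions = {"affinity_nM": -1, "potency_nM": -1, "stability_C": +1}
weights = {"affinity_nM": 0.4, "potency_nM": 0.4, "stability_C": 0.2}

scored = pd.DataFrame(index=candidates.index)
for col, sign in directions.items():
    x = sign * candidates[col]                       # flip so larger is always better
    scored[col] = (x - x.min()) / (x.max() - x.min())
total = sum(weights[c] * scored[c] for c in weights)
print(total.sort_values(ascending=False))            # ranked candidates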

 

Propagation of Very Low Frequency Emissions from Lightning

Contact Name Ed Stone
Contact Email ed.stone@metoffice.gov.uk
Company Name Met Office
Address Met Office, FitzRoy Road, Exeter EX1 3PB
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students
Deadline to Register Interest 01/02/27
Brief Description of the Project Lightning strikes generate very low frequency pulses of electromagnetic radiation that propagate around the world, with the Earth’s surface and the ionosphere acting as a waveguide. The Met Office has a global network of receivers to detect these pulses and calculate the location of the strike. We are currently partway through a project to replace this network, and need to model the pulse propagation in order to optimise the system design and improve the location accuracy. We have models that solve Maxwell’s equations using analytic and finite-difference methods to describe different aspects of the physical system (a minimal finite-difference sketch follows this entry). The project is to develop these models further in order to study the sensitivity of the propagation to different physical conditions.
Skills Required Some experience of programming is essential.
Skills Desired Knowledge of FORTRAN and Linux would be very helpful. The ideal candidate would have an interest in, and some previous experience of, data modelling or remote sensing.
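Illustrative aside on the Met Office project above: the sketch below is a standard one-dimensional finite-difference time-domain (FDTD) update for an electromagnetic pulse in free space, in normalised units with the Courant number set to one. It illustrates the finite-difference approach mentioned in the description but is not the Met Office's Earth-ionosphere waveguide model, which must also represent the ground and ionosphere boundaries.

# Minimal sketch: 1D FDTD update of Maxwell's equations in normalised units
# (dx = dt = c = 1, fields scaled by the free-space impedance). Hard, reflecting
# boundaries; a toy demonstration only.
import numpy as np

nx, nt = 400, 300
ez = np.zeros(nx)             # electric field
hy = np.zeros(nx)             # magnetic field

for n in range(nt):
    hy[:-1] += ez[1:] - ez[:-1]                  # update H from the spatial derivative of E
    ez[1:] += hy[1:] - hy[:-1]                   # update E from the spatial derivative of H
    ez[50] += np.exp(-((n - 60) / 20.0) ** 2)    # inject a Gaussian pulse at a source node

print("peak |Ez| after propagation:", np.abs(ez).max())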

 

Real Time Tomography X-Ray Imaging System - Geometry Calibration by Optimisation

Contact Name Dan Oberg
Contact Email DOberg@RapiscanSystems.com
Company Name Rapiscan Systems Ltd
Address X-Ray House, Bonehurst Road, Salfords, Surrey, RH1 5GG, UK
Period of the Project Late June to Sept should work. Somewhat flexible.
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3 Mar
Brief Description of the Project The problem to be solved in this project is to devise and test an improved method for measuring source and detector positions in an RTT system. The RTT is a fixed-gantry x-ray computed tomography scanner. The current method for geometry calibration has certain limitations, principally that it can only derive variations in the XY plane (the Z direction is the direction of travel through the machine) and that the test object needs to be scanned multiple times. The idea is for the student to design a test piece that, when scanned and analysed, allows accurate source and detector position estimates, and then to demonstrate by simulation that the approach works (a minimal calibration sketch follows this entry).
Skills Required Optimisation, Computer Simulation skills in Matlab or similar, Image analysis
Skills Desired Computed tomography, Inverse problems
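Illustrative aside on the RTT calibration project above: the sketch below shows the general flavour of geometry calibration by optimisation, recovering an assumed 2D source position and detector offset by least-squares fitting of fan-beam projections of known fiducial markers. The geometry (a linear detector along y = 0), the marker layout and the noise level are all illustrative assumptions, not the RTT's actual configuration.

# Minimal sketch: recover a source position and a detector offset from projections
# of known fiducial markers by nonlinear least squares. Illustrative geometry only.
import numpy as np
from scipy.optimize import least_squares

# Known fiducial marker positions in a hypothetical test piece (XY plane)
markers = np.array([[-40.0, 120.0], [0.0, 150.0], [35.0, 110.0], [60.0, 140.0]])

def project(params, markers):
    """Detector coordinate where the ray source->marker crosses the line y = 0."""
    sx, sy, det_offset = params
    t = sy / (sy - markers[:, 1])                  # ray parameter at y = 0
    u = sx + t * (markers[:, 0] - sx)
    return u + det_offset

# Simulate noisy measurements from a 'true' geometry, then recover it.
true_params = np.array([5.0, 400.0, -2.0])         # source x, source y, detector shift
rng = np.random.default_rng(3)
measured = project(true_params, markers) + rng.normal(0, 0.05, len(markers))

fit = least_squares(lambda p: project(p, markers) - measured,
                    x0=[0.0, 380.0, 0.0])
print("estimated [source_x, source_y, detector_offset]:", np.round(fit.x, 2))

Designing the test piece then becomes a question of choosing marker positions that make the fit well conditioned, including in the Z direction that the current method cannot resolve.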

 

Near-term quantum algorithms

Contact Name Stephen Brierley
Contact Email steve.brierley@damtp.cam.ac.uk, team@riverlane.io
Company Name River Lane Research
Address ideaSpace West, 3 Charles Babbage Road, Cambridge, CB3 0GT, UK
Period of the Project between 1 and 3 months between late June and 30 September
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 6 April 2018
Brief Description of the Project The project can be either open-ended or well-defined, depending on student interest. Open-ended version: you will read and understand recent papers on near-term quantum algorithms (chosen according to your interest), implement a simulation in code and, hopefully, suggest improvements to the algorithm. Well-defined version: you will focus on variational quantum algorithms, which draw on machine-learning-style optimisation, to solve chemistry problems (i.e. finding ground-state energies of compounds such as H2O, O2, etc.; a minimal variational sketch follows this entry). An even more well-defined version: you will optimise existing company code for standard quantum algorithms (such as Shor's, Grover's, or an eigensolver) or for a particular algorithm recently invented by the company.
Skills Required Ability/interest in understanding academic papers. Ability to code (as indicated by e.g. completion of CATAM projects). Ability to communicate well and work in a team.
Skills Desired Experience using Python is ideal, along with a 2.1 or above and a basic understanding of linear algebra and the postulates of quantum mechanics. An interest in machine learning or chemistry may be a significant plus.
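Illustrative aside on the quantum-algorithms project above: the sketch below demonstrates the variational principle behind algorithms such as VQE by classically minimising the expectation value of a toy one-qubit Hamiltonian over a parameterised state. The Hamiltonian coefficients and the single-parameter ansatz are arbitrary choices for illustration; it is not company code and involves no quantum hardware.

# Minimal sketch of the variational principle behind VQE: classically minimise
# <psi(theta)|H|psi(theta)> for a toy one-qubit Hamiltonian. Illustrative only.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0, 1], [1, 0]], dtype=complex)    # Pauli X
Z = np.array([[1, 0], [0, -1]], dtype=complex)   # Pauli Z
H = 0.5 * Z + 0.3 * X                            # arbitrary toy Hamiltonian

def ansatz(theta):
    """|psi(theta)> = Ry(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def energy(theta):
    psi = ansatz(theta[0])
    return np.real(np.conj(psi) @ H @ psi)       # expectation value of H

result = minimize(energy, x0=[0.1], method="Nelder-Mead")
exact = np.linalg.eigvalsh(H).min()
print(f"variational energy: {result.fun:.6f}, exact ground state: {exact:.6f}")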