Summer Research Programmes

 

Industrial CMP project proposals from summer 2023

 

Geodesics in latent spaces

Project Title Geodesics in latent spaces
Keywords deep learning, generative models, differential geometry
Project listed 5 January 2023
Project status Closed
Contact Name Emma Slade
Contact Email emma.x.slade@gsk.com
Company/Lab/Department GSK, Artificial intelligence and Machine learning
Address The Stanley Building, 7 Pancras Square, London, N1C 4AG
Period of the Project 8 weeks
Project Open to Master's (Part III) students
Background Information Latent spaces in generative deep learning models embed information about the training data in a low-dimensional space. Standard approaches to measuring the similarity between these latent representations assume (or impose) a Euclidean manifold on the space and compute standard n-dimensional distance metrics under that assumption. There is, however, no general reason to expect such latent spaces to be best (or even accurately) represented by a Euclidean manifold, so the similarity metrics used to compute loss functions or to align data points may be inaccurate. Learnt representations of the data may therefore be invalid even when they follow standard methods from the academic literature.
Brief Description of the Project This project aims to equip the latent space with a Riemannian metric in order to compute geodesics between points, which may give much more accurate shortest paths between data points. To extend the latent manifold to a Riemannian one, we focus on variational autoencoders (VAEs): because they use a Gaussian prior in the generative process, they can properly capture the structure of the data manifold under reasonable assumptions. Building on previous work on the topic, the project will begin by implementing basic VAEs in PyTorch and then adapt the architecture to generalise the VAE's latent space to Riemannian manifolds. By comparing results with and without the modified latent structure, we aim to determine whether it improves the interpretability and faithfulness of the generated data.
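A minimal sketch of the core computation, assuming a deterministic decoder g and the pullback metric G(z) = J_g(z)^T J_g(z): under this metric the length of a latent curve equals the length of its decoded image, so a geodesic can be approximated by discretising the curve and minimising its energy in data space with the endpoints fixed. A hand-written toy decoder (a Gaussian bump lifting the latent plane into 3-D) stands in for a trained VAE decoder, and NumPy replaces PyTorch so the example is self-contained; all function names are hypothetical.

```python
import numpy as np


def decode(z):
    """Hypothetical toy decoder g: R^2 -> R^3 (a Gaussian bump), standing in for a VAE decoder."""
    x, y = z[:, 0], z[:, 1]
    return np.column_stack([x, y, 2.0 * np.exp(-(x**2 + y**2))])


def decoder_jacobian(z):
    """Analytic Jacobian dg/dz of the toy decoder, shape (n, 3, 2)."""
    x, y = z[:, 0], z[:, 1]
    h = 2.0 * np.exp(-(x**2 + y**2))
    jac = np.zeros((len(z), 3, 2))
    jac[:, 0, 0] = 1.0
    jac[:, 1, 1] = 1.0
    jac[:, 2, 0] = -2.0 * x * h
    jac[:, 2, 1] = -2.0 * y * h
    return jac


def curve_length(path):
    """Length of a discrete latent curve measured in data space (pullback metric)."""
    return float(np.sum(np.linalg.norm(np.diff(decode(path), axis=0), axis=1)))


def geodesic(z_start, z_end, n_points=20, lr=2e-3, steps=5000):
    """Approximate a geodesic by gradient descent on the discrete curve energy
    E = N * sum_i ||g(z_{i+1}) - g(z_i)||^2, keeping both endpoints fixed."""
    z_start, z_end = np.asarray(z_start, float), np.asarray(z_end, float)
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    path = (1.0 - t) * z_start + t * z_end  # initial guess: straight latent line
    n_seg = n_points - 1
    for _ in range(steps):
        x = decode(path)
        # dE/dx_i for interior points: a discrete Laplacian in data space
        grad_x = 2.0 * n_seg * (2.0 * x[1:-1] - x[:-2] - x[2:])
        # chain rule: dE/dz_i = J(z_i)^T dE/dx_i
        grad_z = np.einsum("nij,ni->nj", decoder_jacobian(path[1:-1]), grad_x)
        path[1:-1] -= lr * grad_z
    return path
```

Between (-1.5, 0.5) and (1.5, 0.5) the straight latent line climbs over the bump, whereas the optimised curve bends around it and is shorter in data space. With a trained PyTorch decoder, the Jacobian could instead come from `torch.autograd.functional.jacobian`, or the curve energy could be backpropagated directly. Published Riemannian-VAE approaches also account for decoder uncertainty in the metric, which this toy omits.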
Work Environment The work location is flexible, with the opportunity to work in the GSK.ai London office. Flexible working hours.
References https://arxiv.org/pdf/2008.00565.pdf https://arxiv.org/pdf/1812.08284.pdf https://www.nature.com/articles/s41467-022-29443-w.pdf https://arxiv.org/pdf/1710.11379.pdf
Prerequisite Skills Geometry/Topology; Simulation; Programming; Differential geometry
Other Skills Used in the Project Machine learning, PyTorch
Programming Languages Python

 

Simulation of Boson Sampling

Project Title Simulation of Boson Sampling
Keywords quantum computing, computational complexity theory, programming, python, matrix algebra
Project listed 6 January 2023
Project status Filled
Contact Name William Clements
Contact Email wclements@orcacomputing.com
Company/Lab/Department ORCA Computing
Address 30 Eastbourne Terrace, London W2 6LA
Period of the Project 8 weeks
Project Open to Undergraduates, Master's (Part III) students
Background Information ORCA Computing is a startup that builds quantum processors using photons. The simplest type of photonic quantum processor is a boson sampler, where several identical single photons (which are bosons) are sent into a complex random circuit where they interfere with each other, and a measurement is performed to determine where they exit the circuit. Since photons are quantum particles, the output of the circuit is described by a probability distribution over all possible outcomes, and each measurement yields a sample from this distribution. As the number of photons increases, simulating a boson sampler requires exponentially increasing classical (i.e. non-quantum) computational resources, and simulating only a few tens of photons is already intractable on the world's most powerful supercomputer. ORCA's research and development team works to improve our understanding of these boson samplers and methods by which they can be simulated at smaller scales.
Brief Description of the Project In this project, you will develop improved simulation algorithms for small-scale boson samplers. The interference between photons is described by a unitary matrix, and the probability of any given outcome of a boson sampler is determined by the permanent of a sub-matrix of this unitary matrix. The permanent is defined like the determinant but without the alternating signs, and it is much harder to calculate. Not only do the best known classical algorithms for permanents take exponential time, but the number of possible outcomes also grows exponentially with the number of photons. Clifford and Clifford (see references) developed what is currently the most efficient algorithm for sampling from this distribution with a classical computer. You will study this algorithm, implement it in Python, and benchmark it against ORCA's current simulation tools. This internship is a unique opportunity to explore the interface between mathematics, quantum physics, programming, and computational complexity theory within one of the UK's most exciting quantum computing startups.
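To make the permanent concrete, the sketch below computes it with Ryser's formula and uses it for the exact probability of a single output pattern: for single photons entering modes T and an output with s_j photons in mode j, P = |Perm(U_{S,T})|^2 / prod_j s_j!, where U_{S,T} repeats row j of U s_j times. This is a brute-force reference, not the Clifford and Clifford sampling algorithm; summing it over all outcomes gives a simple correctness check when benchmarking a sampler. The row/column convention for U and the helper names are assumptions.

```python
import itertools
import math

import numpy as np


def permanent(a):
    """Permanent of a square matrix via Ryser's formula (O(2^n * n^2) in this simple form)."""
    n = a.shape[0]
    total = 0j
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if (mask >> j) & 1]
        total += (-1) ** len(cols) * np.prod(a[:, cols].sum(axis=1))
    return (-1) ** n * total


def outcome_probability(u, input_modes, output_counts):
    """Probability of an output occupation pattern for single photons in input_modes.
    Assumes u[out, in] is the amplitude for a photon entering `in` to leave from `out`."""
    rows = [mode for mode, k in enumerate(output_counts) for _ in range(k)]
    sub = u[np.ix_(rows, input_modes)]
    norm = math.prod(math.factorial(k) for k in output_counts)
    return abs(permanent(sub)) ** 2 / norm


def all_outcomes(m, n):
    """Every way of distributing n photons over m output modes."""
    for modes in itertools.combinations_with_replacement(range(m), n):
        yield [modes.count(k) for k in range(m)]


def random_unitary(m, seed=0):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # fix column phases
```

Even for four modes and three photons there are 20 possible outcomes, and both the number of outcomes and the cost of each permanent grow rapidly with system size, which is why a dedicated sampling algorithm is needed.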
Work Environment Flexible work - most team members spend at least 2 days per week in our office next to Paddington in London. You will be a member of the ORCA Machine Learning team.
References https://arxiv.org/abs/2005.04214, https://www.nature.com/articles/s41586-022-04725-x, https://www.science.org/doi/10.1126/science.abe8770
Prerequisite Skills Mathematical physics
Other Skills Used in the Project Simulation, Computational complexity theory, quantum computing
Programming Languages Python

 

Predicting different conformational states of proteins.

Project Title Predicting different conformational states of proteins.
Keywords Protein folding, geometric deep learning, GNNs, transformer, Machine Learning
Project listed 9 January 2023
Project status Closed. Application deadline: 31 March 2023
Contact Name Talip Ucar
Contact Email talip.ucar@astrazeneca.com
Company/Lab/Department AstraZeneca
Address talip.ucar@astrazeneca.com
Period of the Project 8 weeks
Project Open to Master's (Part III) students
Background Information Many proteins sample multiple 3D conformations in their natural state, and ligand binding can both induce local structural changes and alter the relative energetics, and hence the populations, of conformational states. These changes happen on different timescales. Simulating these dynamics over long timescales is computationally very expensive or simply infeasible. Acquiring crystal structures of different conformations experimentally is also difficult, time-consuming and costly. However, knowledge of the conformational landscape is important for structure-based drug design, as different conformers may suggest different chemotypes with potentially different modes of action. In this project, we will explore different modelling approaches to predict the conformational states of proteins.
Brief Description of the Project The project is open-ended. The student should have a strong background in deep learning, be familiar with protein folding, and will develop models to predict the conformational states of proteins. We have a small dataset for initial experiments to try out different approaches; if these are successful, we will move on to larger experiments. Ideally, we would like to turn this work into a paper for a top AI conference.
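Whatever model is used, predicted conformers need to be compared with each other and with known structures. One common, hedged choice (not prescribed by the project) is the root-mean-square deviation after optimal rigid superposition, computed with the Kabsch algorithm. The sketch assumes two N x 3 coordinate arrays with matched atoms (for example C-alpha atoms); the function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two conformations (N x 3 arrays of matched atoms)
    after optimal rotation and translation (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # forbid improper rotations (reflections)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```

A matrix of pairwise RMSDs between predicted structures can then be clustered to see whether a model produces distinct conformational states or collapses onto one.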
Work Environment In a team
References https://www.biorxiv.org/content/10.1101/2022.10.17.512570v1.full
Prerequisite Skills Statistics, Probability/Markov Chains, Predictive Modelling, Data Visualization, Deep Learning, Transformer, Graph Neural Network
Other Skills Used in the Project Data Visualization
Programming Languages Python, PyTorch

 

Evaluating differences in visual perception & potential biases using subjective data sets: a case study with cut flowers

Project Title Evaluating differences in visual perception & potential biases using subjective data sets: a case study with cut flowers
Keywords Horticulture
Project listed 10 January 2023
Project status Filled
Contact Name Richard Boyle
Contact Email richard.boyle@apexhorticulture.com
Company/Lab/Department APEX Horticulture Ltd
Address APEX Horticulture Ltd, Pierson Road, The Enterprise Campus, Alconbury Weald, PE284YA
Period of the Project 8 weeks
Project Open to Undergraduates; Master's (Part III) students
Background Information APEX Horticulture Ltd. is a professional research and development business offering bespoke services for cut flowers and plants. APEX is based in a purpose-built testing centre in Alconbury, Cambridgeshire (UK), adjacent to MM Flowers, one of the UK's leading cut flower importing/processing companies. This places APEX in an optimal position in the supply chain: it can deliver high-quality, independent research close to market, combined with invaluable insight into the true performance of flowers and plants under actual supply chain conditions. APEX's infrastructure and specialised personnel aim to deliver robust, standardised and consistent research every week of the year, together with the ability to undertake large-scale projects that match all client requirements, influencing all elements of the cut flower supply chain.

MM Flowers is an integrated cut flower supplier with a unique ownership model and innovative practices. It is owned by the Munoz Group, a leading breeder, grower and distributor of citrus and grapes; Vegpro, East Africa's largest flower and vegetable producer; and Elite, the leading flower grower and breeder in South America. MM Flowers supplies many of the major high street retail brands, including Tesco and Marks & Spencer, both in store and directly to consumers, and receives hundreds of millions of cut flower stems annually. The UK industry can be particularly challenging: consumers expect high-quality flowers at competitive prices, and most of the species used are highly perishable, short-life products transported from many different regions of the world. MM Flowers therefore employs a Quality Assurance (QA) team that carries out daily inspections of both the unpacked flowers and the finished bouquets to ensure they are of the correct quality; the bouquet inspections have much in common with the APEX data collection process.
Brief Description of the Project

Whilst the infrastructure and personnel described above aim to deliver the best possible research and outputs, all of the data collected are based on a daily visual inspection of each sample handled by APEX or each bouquet assessed by the MM Flowers QA team. In a typical APEX example, a sample is a bunch of flowers: the assessor must consider the flowers, leaves and stems, and use pre-determined criteria to describe the appearance of the sample on that day. These criteria cover both individual flower stems that have 'failed' (e.g. due to the presence of disease) and the general aesthetic appearance of the flowers (e.g. colour change). APEX handles approximately 40-50k samples annually, with around 30-60 data points collected per sample, and the full record of what each assessor entered is available. Because of this scale, a team of individuals is responsible for assessing the flowers every day. All assessors receive training and supplementary guides are available, but visual inspection always involves a degree of subjectivity, particularly across hundreds of species and varieties and thousands of samples. The MM Flowers QA team uses a very similar approach, albeit with more focus on ensuring that each bouquet meets the specification(s) agreed with their customers.

This raises a number of questions:
1. Using historic data, is it possible to quantify the degree of variability between individuals when assessing samples of cut flowers/commercial finished bouquets?
2. If there is clear variability, is it concentrated on certain aspects of the flowers or on particular criteria, or are there biases towards flower types, treatments, etc.?
3. Can recommendations be made around the methodology used by APEX/the MM Flowers QA team to try and minimise the subjective effect?
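Question 1 could be approached with a standard inter-rater agreement statistic. The sketch below computes Cohen's kappa for two assessors who scored the same samples on one categorical criterion. It assumes such overlapping assessments exist in the historic data (for example, the same bunch scored by two assessors on the same day), which the description does not confirm; the pass/fail labels are illustrative. Kappa is 1 for perfect agreement and 0 for agreement no better than chance.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # agreement expected if both raters labelled independently at their own base rates
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, if assessor A marks four stems pass, pass, fail, fail and assessor B marks pass, fail, fail, fail, raw agreement is 0.75 but kappa is 0.5 once chance agreement is removed. Computing kappa per criterion or per flower type would help address question 2.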

Work Environment Hybrid on site/remote
References  
Prerequisite Skills  
Other Skills Used in the Project  
Programming Languages No Preference

 

Data Driven Decision Frameworks for Multimodal High Content Data

Project Title Data Driven Decision Frameworks for Multimodal High Content Data
Keywords Data science, AI/ML, statistics, transcriptomics, multiomics data, ensemble methods
Project listed 18 January 2023
Project status Filled
Contact Name Abdullah Athar
Contact Email abdullah.m.athar@gsk.com
Company/Lab/Department GSK
Address Gunnels Wood Road, Stevenage SG1 2NY
Period of the Project 8-10 weeks between late June and 30 September
Project Open to Undergraduates; Master's (Part III) students
Background Information

Integrating high content biological profiling techniques has the potential to enable drug discovery teams to characterise compounds by their multivariate biological profile (phenotypic response), rather than by chemical structure and univariate data alone. This would enhance our ability to find new medicines for patients by giving a broader view of the biochemical response a drug elicits in a patient.

We have generated and processed multimodal data (metabolomics, proteomics, transcriptomics and imaging) for compounds against multiple protein targets to address important questions in characterising drug candidates, such as identifying undesirable biological effects caused by these compounds. Current literature, computational techniques and best practices offer limited support for working with data of this type and scale, so there are opportunities for novel research, exploration and even publication.

Brief Description of the Project

We would like a student to join us over the summer to:
1. Investigate our existing data workflows and explore new methods of statistical analyses on large biological datasets, eventually formulating a decision tree for appropriate use of our suite of tools.
2. Benefit from our extensive, readily available datasets to explore various applications of machine learning to high content profiling data (see the Phenonaut GitHub repository for a representative example).
3. Contribute to our existing codebase for analysing and modelling biological data with AI, helping drive business decisions at critical stages of the drug discovery pipeline (such as hit qualification).
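Phenonaut's own API is not reproduced here. As a hedged illustration of comparing compounds by their multivariate profile rather than their structure, the sketch below normalises each modality against control wells, concatenates the modalities (early fusion) and scores compound-to-compound similarity with cosine similarity. The function names, the control-based z-scoring and the 1/sqrt(n_features) block weighting are assumptions chosen for simplicity, not GSK's workflow.

```python
import numpy as np


def zscore_to_controls(features, control_mask):
    """Standardise each feature against the control wells (e.g. vehicle-treated)."""
    mu = features[control_mask].mean(axis=0)
    sd = features[control_mask].std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant features unscaled
    return (features - mu) / sd


def fuse_modalities(blocks, control_mask):
    """Early fusion: normalise each modality separately, down-weight each block by
    sqrt(number of features) so large modalities do not dominate, then concatenate."""
    return np.hstack(
        [zscore_to_controls(b, control_mask) / np.sqrt(b.shape[1]) for b in blocks]
    )


def cosine_similarity_matrix(profiles):
    """Pairwise cosine similarity between row profiles."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms > 0, norms, 1.0)
    return unit @ unit.T
```

Each block is a wells-by-features array for one modality (for example, transcriptomics or image features) with rows aligned across modalities. Compounds whose fused profiles sit close to those of known problematic compounds would be candidates for closer inspection.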

Work Environment Hybrid (performance with choice); Collaborative team atmosphere
References Lee, Changhee, and Mihaela van der Schaar. "A variational information bottleneck approach to multi-omics data integration." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.
Sanchez-Fernandez, Ana, et al. "Contrastive learning of image-and structure-based representations in drug discovery." ICLR2022 Machine Learning for Drug Discovery. 2022.
GitHub - CarragherLab/phenonaut: A tool for exploration of multiomic phenotypic space (https://github.com/CarragherLab/phenonaut)
Prerequisite Skills Statistics; Mathematical Analysis; Data Visualization
Other Skills Used in the Project Predictive Modelling; Database Queries
Programming Languages Python; R

 

Si / SiC crystal representation for Monte Carlo simulation of Implantation

Project Title Si / SiC crystal representation for Monte Carlo simulation of Implantation
Keywords Mathematical modelling, Monte Carlo simulation, mathematical physics
Project listed 18 January 2023
Project status Filled
Contact Name Artem Babayan
Contact Email artem.babayan@silvaco.com
Company/Lab/Department Silvaco Europe
Address Compass Point, PE275JL, St Ives
Period of the Project 8 weeks, anytime
Project Open to Undergraduates; Master's (Part III) students
Background Information  
Brief Description of the Project

Silvaco is a software engineering company developing tools to assist in the manufacture of semiconductor devices. In the UK office we mostly work on the 'process simulation' side: mathematical modelling of the processes used in manufacturing.

One such process is implantation: the bombardment of a piece of (typically) Si with dopant ions to change the electrical properties of the target in specific areas. To predict the final ion distribution we use Monte Carlo simulation, following the paths of a large number of ions as they fly through the structure. Several effects are involved, one of which is ions 'bouncing' off the atoms of the crystal lattice. We are currently reviewing the legacy code and are interested in a mathematical representation of the regular crystal lattice: for an arbitrary point in space, we need to know the position of, and the distance to, the closest atom of the lattice.

Your task would be to review the literature, then propose and implement the required algorithm for the Si and/or SiC crystal lattice.
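A minimal sketch of the nearest-atom query, assuming the ideal diamond cubic structure of Si: an FCC lattice with a two-atom basis at (0, 0, 0) and (1/4, 1/4, 1/4) in units of the cubic lattice constant a (about 5.431 Å for Si at room temperature). The point is mapped into its conventional cell and the 8 basis atoms of the surrounding 3 x 3 x 3 block of cells are checked, which is guaranteed to contain the nearest atom. The cubic 3C polytype of SiC has the same (zincblende) geometry with Si and C on the two sublattices, so the returned sublattice index identifies the species; hexagonal polytypes such as 4H-SiC would need a hexagonal cell and basis instead. Names are illustrative, and a production Monte Carlo code would likely prune the candidate list for speed.

```python
import itertools

import numpy as np

SI_LATTICE_CONSTANT = 5.431  # Angstrom, silicon at room temperature

# Conventional cubic cell of the diamond structure: FCC sites plus the same sites shifted by (1/4, 1/4, 1/4).
FCC = np.array([[0.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
BASIS = np.vstack([FCC, FCC + 0.25])  # rows 0-3: sublattice 0, rows 4-7: sublattice 1
SUBLATTICE = np.array([0] * 4 + [1] * 4)
OFFSETS = np.array(list(itertools.product((-1, 0, 1), repeat=3)), dtype=float)


def nearest_atom(point, a=SI_LATTICE_CONSTANT):
    """Return (distance, Cartesian position, sublattice index) of the atom closest to `point`."""
    frac = np.asarray(point, dtype=float) / a
    cell = np.floor(frac)  # integer corner of the conventional cell containing the point
    local = frac - cell  # fractional coordinates inside that cell, in [0, 1)
    # 27 neighbouring cells x 8 basis atoms = 216 candidates, in cell units
    cand = (OFFSETS[:, None, :] + BASIS[None, :, :]).reshape(-1, 3)
    d2 = np.sum((cand - local) ** 2, axis=1)
    k = int(np.argmin(d2))
    position = (cell + cand[k]) * a
    return float(np.sqrt(d2[k]) * a), position, int(SUBLATTICE[k % 8])
```

Because only the fractional position inside one cell matters, the cost of a query is independent of where the point lies in the simulated structure.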

Work Environment The project assumes a high degree of independence. The development part is expected to be done in the office (in St Ives, near Cambridge).
References  
Prerequisite Skills  
Other Skills Used in the Project Statistics; Mathematical physics; Geometry/Topology; Simulation; Predictive Modelling
Programming Languages Python; MATLAB; C++

 

Mathematical models of thermal oxidation of silicon: analysis and verification

Project Title Mathematical models of thermal oxidation of silicon: analysis and verification
Keywords Mathematical Modelling, Numerical Analysis, Semiconductor technology
Project listed 18 January 2023
Project status Filled
Contact Name Vasily Suvorov
Contact Email vasily.suvorov@silvaco.com
Company/Lab/Department Silvaco Europe, TCAD
Address Compass Point, 1 Stocks Bridge Way, St Ives, Cambridgeshire, PE27 5JL
Period of the Project 8 weeks
Project Open to Undergraduates; Master's (Part III) students
Background Information Process simulation is a crucial tool in the semiconductor industry, both for developing new technologies and for maintaining existing processes. Thermal oxidation of silicon is a way to produce a thin oxide layer on the surface of a wafer in the fabrication of microelectronic structures and devices. The project aims to analyse and verify the mathematical models of this process and to explore the effects of various modelling assumptions. The successful outcome of the project will become part of the company's commercial software.
Brief Description of the Project In 1965 Bruce Deal and Andrew Grove proposed an analytical model that satisfactorily describes the growth of an oxide layer on the planar surface of a silicon wafer [1]. Despite its successes, the model does not explain the effects of various ambient and intrinsic factors on the kinetics of oxidation. In this project, we aim to explore and verify various extensions of the Deal and Grove model that take into account the effects of ambient pressure, the presence of HCl in the oxidant, the wafer orientation and the dopants present in the silicon. The project will use a combination of analytical and numerical techniques to explore and verify existing models of oxidation from the literature. Silvaco's own software products, Athena and VictoryProcess, will also be used as tools in this project. The prospective student is expected to program in one of the following languages: MATLAB, C/C++ or Python.
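As a baseline for the extensions, the standard form of the Deal and Grove model states that the oxide thickness x after oxidation time t satisfies x^2 + A x = B (t + τ), where B is the parabolic rate constant, B/A the linear rate constant and τ = (x0^2 + A x0) / B accounts for an initial oxide of thickness x0. Solving the quadratic gives the closed form in the sketch below. A and B depend on temperature and ambient; the values used in the usage note are placeholders, not calibrated coefficients.

```python
import math


def deal_grove_thickness(t, A, B, x0=0.0):
    """Oxide thickness after time t from x^2 + A x = B (t + tau).

    A, B: Deal-Grove coefficients (B parabolic, B/A linear rate constant),
    x0: initial oxide thickness; units must be consistent (e.g. um and hours).
    """
    tau = (x0**2 + A * x0) / B  # time shift that reproduces the initial oxide
    return 0.5 * A * (math.sqrt(1.0 + 4.0 * B * (t + tau) / A**2) - 1.0)
```

The closed form recovers the model's two limits: for short times x ≈ (B/A) t (reaction-limited, linear growth) and for long times x ≈ sqrt(B t) (diffusion-limited, parabolic growth). With placeholder values A = 0.1 and B = 0.01, the ratio x / sqrt(B t) approaches 1 as t grows. Extended models can be checked against this baseline in the regimes where their corrections vanish.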
Work Environment The student will be based in the Silvaco’s office in St Ives and will work independently under the guidance of the project leader and a team of colleagues.
References [1] B. E. Deal and A. S. Grove (1965), General relationship for the thermal oxidation of silicon, Journal of Applied Physics, Vol. 36, No. 12, pp. 3770-3778.
Prerequisite Skills Mathematical physics; Numerical Analysis; Mathematical Analysis; Simulation
Other Skills Used in the Project  
Programming Languages Python; MATLAB; C++

 

Investigating hidden frequencies in video

Project Title Investigating hidden frequencies in video
Keywords Homeland security, frequency analysis, data processing, image analysis
Project listed 20 January 2023
Project status Closed
Contact Name Georgie Foot
Contact Email georgie.foot@iconal.com
Company/Lab/Department Iconal Technology
Address St Johns Innovation Centre, Cowley Road, CB4 0WS
Period of the Project Minimum 8 weeks, with flexible start date. 37.5hr week, Monday to Friday.
Project Open to Undergraduates; Master's (Part III) students
Background Information

Iconal is a technology consultancy based in Cambridge, specialising in new and emerging technologies for homeland security applications. This will be our fifth year offering CMP placements, and we are looking for a summer student to join our small team and get involved in our research and development activities.

Our ideal candidate is self-motivated, innovative and enthusiastic about the practical application of maths to solve real-world problems. We are offering four projects this year and students should indicate their project(s) of choice in their application. Unfortunately, due to the nature of our work, we are only able to consider applications from UK nationals or individuals with long-standing residency in the UK.

We will accept applications until 1st March and organise interviews shortly after this date.

Brief Description of the Project Frame-to-frame video analysis has already been used successfully to extract signals such as heart rate (by non-contact sensing), micro-movements in large structures (such as bridges) and vibrations in machinery that indicate failure modes. This project is an investigative study into applying this technique to a different security-related application. We expect the student to be involved in collecting video data in our lab and carrying out data processing and image and video analysis.
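The target signal for this project is not specified. As a hedged illustration of frame-to-frame frequency analysis, the sketch below averages pixel intensity over a region of interest in each frame, removes the mean, applies a Hann window and returns the strongest peak of the real FFT. The function name, ROI convention and synthetic test video are illustrative; practical pipelines such as remote heart-rate sensing typically add motion compensation and band-pass filtering on top of this.

```python
import numpy as np


def dominant_frequency(frames, fps, roi=None, f_min=0.0):
    """Strongest temporal frequency (Hz) in the mean intensity of a region of interest.

    frames: array of shape (n_frames, height, width); roi: (row0, row1, col0, col1) or None.
    Frequencies at or below f_min (including DC) are ignored.
    """
    video = np.asarray(frames, dtype=float)
    if roi is not None:
        r0, r1, c0, c1 = roi
        video = video[:, r0:r1, c0:c1]
    signal = video.mean(axis=(1, 2))  # one intensity value per frame
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    valid = freqs > f_min
    return float(freqs[valid][np.argmax(spectrum[valid])])
```

The frequency resolution is fps divided by the number of frames, so 10 seconds of 30 fps video resolves 0.1 Hz; longer recordings or peak interpolation would be needed for finer detail.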
Work Environment The student will join us in our Cambridge office, working alongside our small team of 8 people. Our team is principally made up of scientists and engineers, all working on a range of interesting and diverse projects. The student will have a dedicated supervisor/mentor to help guide the project, answer any queries and review their work. Though their time will primarily be spent on the specific project, we would be keen to host a student who is also interested in getting involved with our other ongoing projects, such as helping out with data collection to train algorithms. We typically work a 37.5hr week, Monday to Friday. There is the opportunity to work flexibly, but we would strongly encourage students to work in our Cambridge office or lab (project-dependent) at least 4 days a week. Towards the end of the project, the student will present their work to the rest of the team.
References http://www.iconal.com
Prerequisite Skills Statistics; Image processing
Other Skills Used in the Project Mathematical physics; Data Visualization
Programming Languages Python (strongly preferred, as it's our main language), but other languages can be considered if relevant.

 

Electronic data capture system for trials

Project Title Electronic data capture system for trials
Keywords Homeland security, statistics, database queries, experimental design, app building
Project listed 20 January 2023
Project status Closed
Contact Name Georgie Foot
Contact Email georgie.foot@iconal.com
Company/Lab/Department Iconal Technology
Address St Johns Innovation Centre, Cowley Road, CB4 0WS
Period of the Project Minimum 8 weeks, with flexible start date. 37.5hr week, Monday to Friday.
Project Open to Undergraduates; Master's (Part III) students
Background Information

Iconal is a technology consultancy based in Cambridge, specialising in new and emerging technologies for homeland security applications. This will be our fifth year offering CMP placements, and we are looking for a summer student to join our small team and get involved in our research and development activities.

Our ideal candidate is self-motivated, innovative and enthusiastic about the practical application of maths to solve real-world problems. We are offering four projects this year and students should indicate their project(s) of choice in their application. Unfortunately, due to the nature of our work, we are only able to consider applications from UK nationals or individuals with long-standing residency in the UK.

We will accept applications until 1st March and organise interviews shortly after this date.

Brief Description of the Project Iconal frequently carries out laboratory tests and field trials of new and emerging technologies. We are interested in streamlining the data collection process to increase efficiency and reduce human error. This project could take multiple directions, but we envisage that the student will either investigate the ideal approach for our requirements, or implement a suggested approach to develop prototype software that we can use for data collection in the field. The work may involve elements of statistics, database queries and app building.
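One hedged way to reduce human error is to validate entries at the point of capture. The sketch below uses Python's built-in sqlite3 module with CHECK constraints, so an impossible value (such as a detection flag of 2 or a confidence above 1) is rejected immediately rather than discovered during analysis, and a simple aggregate query summarises a trial. The table layout and field names (trial_id, operator, item, detected, confidence) are hypothetical; Iconal's actual trial data would dictate the schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS observation (
    id          INTEGER PRIMARY KEY,
    trial_id    TEXT NOT NULL,
    recorded_at TEXT NOT NULL,  -- ISO 8601 timestamp
    operator    TEXT NOT NULL,
    item        TEXT NOT NULL,
    detected    INTEGER NOT NULL CHECK (detected IN (0, 1)),
    confidence  REAL CHECK (confidence BETWEEN 0 AND 1)  -- optional
);
"""


def open_store(path=":memory:"):
    """Open (or create) the observation store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, trial_id, recorded_at, operator, item, detected, confidence=None):
    """Insert one observation; invalid values raise sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO observation (trial_id, recorded_at, operator, item, detected, confidence)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (trial_id, recorded_at, operator, item, int(detected), confidence),
        )


def detection_rate_by_item(conn, trial_id):
    """Map each item to (detection rate, number of observations) for one trial."""
    rows = conn.execute(
        "SELECT item, AVG(detected), COUNT(*) FROM observation"
        " WHERE trial_id = ? GROUP BY item ORDER BY item",
        (trial_id,),
    )
    return {item: (rate, n) for item, rate, n in rows}
```

A field app could sit on top of such a store, with the same constraints mirrored in its input forms so that operators see errors before submitting.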
Work Environment The student will join us in our Cambridge office, working alongside our small team of 8 people. Our team is principally made up of scientists and engineers, all working on a range of interesting and diverse projects. The student will have a dedicated supervisor/mentor to help guide the project, answer any queries and review their work. Though their time will primarily be spent on the specific project, we would be keen to host a student who is also interested in getting involved with our other ongoing projects, such as helping out with data collection to train algorithms. We typically work a 37.5hr week, Monday to Friday. There is the opportunity to work flexibly, but we would strongly encourage students to work in our Cambridge office or lab (project-dependent) at least 4 days a week. Towards the end of the project, the student will present their work to the rest of the team.
References http://www.iconal.com
Prerequisite Skills Statistics; Probability/Markov Chains; Database Queries
Other Skills Used in the Project App Building
Programming Languages Python (strongly preferred, as it's our main language), but other languages can be considered if relevant.

 

Visualisation of venue queuing processes and crowd flows

Project Title Visualisation of venue queuing processes and crowd flows
Keywords Homeland security, simulation, data visualisation, app building
Project listed 20 January 2023
Project status Closed
Contact Name Georgie Foot
Contact Email georgie.foot@iconal.com
Company/Lab/Department Iconal Technology
Address St Johns Innovation Centre, Cowley Road, CB4 0WS
Period of the Project Minimum 8 weeks, with flexible start date. 37.5hr week, Monday to Friday.
Project Open to Undergraduates; Master's (Part III) students
Background Information

Iconal is a technology consultancy based in Cambridge, specialising in new and emerging technologies for homeland security applications. This will be our fifth year offering CMP placements, and we are looking for a summer student to join our small team and get involved in our research and development activities.

Our ideal candidate is self-motivated, innovative and enthusiastic about the practical application of maths to solve real-world problems. We are offering four projects this year and students should indicate their project(s) of choice in their application. Unfortunately, due to the nature of our work, we are only able to consider applications from UK nationals or individuals with long-standing residency in the UK.

We will accept applications until 1st March and organise interviews shortly after this date.

Brief Description of the Project Iconal have been developing tools to model queues and processes such as ticket checks at venues like stadiums. We are keen for a summer student to help develop a more sophisticated visualisation for the current tool, which may include features such as following people through their journey into the venue and a ‘drag-and-drop’ tool to swap out elements of the process. This project will involve elements of data visualisation, simulation and app building.
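The internals of Iconal's current tool are not described. As a hedged starting point, the sketch below is a minimal discrete-event simulation of a first-come-first-served queue feeding several identical ticket-check gates (a G/G/c queue). It records each person's arrival, service start, finish and gate, which is exactly the per-person journey data a visualisation could animate. The function names and the Poisson arrival helper are illustrative assumptions.

```python
import heapq
import random


def poisson_arrivals(rate, n, seed=0):
    """n arrival times from a Poisson process with `rate` arrivals per unit time."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential inter-arrival gaps
        times.append(t)
    return times


def simulate_gates(arrivals, service_times, n_gates):
    """First-come-first-served queue feeding n identical gates.

    Returns one (arrival, start, finish, gate) tuple per person, in arrival order.
    """
    free_at = [(0.0, g) for g in range(n_gates)]  # (time the gate becomes free, gate id)
    heapq.heapify(free_at)
    journeys = []
    for arrive, service in sorted(zip(arrivals, service_times)):
        gate_free, gate = heapq.heappop(free_at)  # earliest available gate
        start = max(arrive, gate_free)  # wait if every gate is busy
        finish = start + service
        heapq.heappush(free_at, (finish, gate))
        journeys.append((arrive, start, finish, gate))
    return journeys
```

For example, `simulate_gates(poisson_arrivals(2.0, 500), [random.expovariate(0.8) for _ in range(500)], n_gates=3)` gives an M/M/3 run in which each person's wait is start minus arrival. Swapping a process element, as the proposed drag-and-drop tool would, amounts to changing the service-time distribution or the number of gates.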
Work Environment The student will join us in our Cambridge office, working alongside our small team of 8 people. Our team is principally made up of scientists and engineers, all working on a range of interesting and diverse projects. The student will have a dedicated supervisor/mentor to help guide the project, answer any queries and review their work. Though their time will primarily be spent on the specific project, we would be keen to host a student who is also interested in getting involved with our other ongoing projects, such as helping out with data collection to train algorithms. We typically work a 37.5hr week, Monday to Friday. There is the opportunity to work flexibly, but we would strongly encourage students to work in our Cambridge office or lab (project-dependent) at least 4 days a week. Towards the end of the project, the student will present their work to the rest of the team.
References http://www.iconal.com
Prerequisite Skills Statistics; Probability/Markov Chains; Simulation; Data Visualization
Other Skills Used in the Project App Building; Any knowledge of queueing theory is advantageous.
Programming Languages Python (strongly preferred, as it's our main language), but other languages can be considered if relevant.

 

Image and bag contents generation using open-source deep learning models

Project Title Image and bag contents generation using open-source deep learning models
Keywords Homeland security, machine learning, statistics, predictive modelling, image processing
Project listed 20 January 2023
Project status Closed
Contact Name Georgie Foot
Contact Email georgie.foot@iconal.com
Company/Lab/Department Iconal Technology
Address St Johns Innovation Centre, Cowley Road, CB4 0WS
Period of the Project Minimum 8 weeks, with flexible start date. 37.5hr week, Monday to Friday.
Project Open to Undergraduates; Master's (Part III) students
Background Information

Iconal is a technology consultancy based in Cambridge, specialising in new and emerging technologies for homeland security applications. This will be our fifth year offering CMP placements, and we are looking for a summer student to join our small team and get involved in our research and development activities.

Our ideal candidate is self-motivated, innovative and enthusiastic about the practical application of maths to solve real-world problems. We are offering four projects this year and students should indicate their project(s) of choice in their application. Unfortunately, due to the nature of our work, we are only able to consider applications from UK nationals or individuals with long-standing residency in the UK.

We will accept applications until 1st March and organise interviews shortly after this date.

Brief Description of the Project Iconal are interested in investigating and utilising existing deep learning models for security applications. The project could take multiple directions, but the planned activity is for a summer student to use a tool like the open-source Stable Diffusion model to add synthetic computer-generated objects to existing X-ray images, such as adding a set of keys to an existing bag, and evaluate the realism of the output. We are also interested in the capability of deep learning models to understand the contextual links between the existing items in the bag and new items that are added – for example, a suitcase containing a ski jacket is unlikely to also contain a beach ball. This project is likely to involve elements of statistics, predictive modelling and image processing.
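The contextual question (a ski jacket makes a beach ball unlikely) can be framed as item co-occurrence. As a hedged, model-free baseline for judging whether an added object fits a bag's existing contents, the sketch below estimates smoothed pointwise mutual information (PMI) between item labels from bag manifests. The manifests, labels and smoothing constant alpha are hypothetical; this is not part of Stable Diffusion, but it could serve as a reference against which a deep learning model's contextual judgements are compared.

```python
import math
from collections import Counter
from itertools import combinations


def fit_pmi(manifests, alpha=0.5):
    """Return a scorer giving smoothed PMI between two item labels,
    estimated from bag manifests (each a list of item labels)."""
    n_bags = len(manifests)
    item_counts, pair_counts = Counter(), Counter()
    for bag in manifests:
        items = sorted(set(bag))  # count each item once per bag
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # unordered pairs, sorted order

    def score(a, b):
        pair = tuple(sorted((a, b)))
        joint = pair_counts[pair] + alpha
        # log( P(a, b) / (P(a) P(b)) ) with additive smoothing on every count
        return math.log(
            joint * (n_bags + alpha) / ((item_counts[a] + alpha) * (item_counts[b] + alpha))
        )

    return score
```

Positive scores indicate items that appear together more often than chance, and negative scores indicate unlikely combinations; with only a handful of bags the smoothing constant dominates, so realistic estimates would need a large manifest dataset.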
Work Environment The student will join us in our Cambridge office, working alongside our small team of 8 people. Our team is principally made up of scientists and engineers, all working on a range of interesting and diverse projects. The student will have a dedicated supervisor/mentor to help guide the project, answer any queries and review their work. Though their time will primarily be spent on the specific project, we would be keen to host a student who is also interested in getting involved with our other ongoing projects, such as helping out with data collection to train algorithms. We typically work a 37.5hr week, Monday to Friday. There is the opportunity to work flexibly, but we would strongly encourage students to work in our Cambridge office or lab (project-dependent) at least 4 days a week. Towards the end of the project, the student will present their work to the rest of the team.
References http://www.iconal.com
Prerequisite Skills Statistics; Probability/Markov Chains; Image processing
Other Skills Used in the Project Predictive Modelling; Data Visualization
Programming Languages Python; Python strongly preferred (as it's our main language), but other languages can be considered if relevant.

 

Power risk analysis

Project Title Power risk analysis
Keywords Renewable Energy, electricity price risk, climate change
Project listed 23 January 2023
Project status Filled
Contact Name Gerard Pieters
Contact Email g.pieters@tierraunderwriting.com
Company/Lab/Department Tierra Underwriting
Address Gable House, 239 Regents Park Road, London, United Kingdom, N3 3LF
Period of the Project 8 weeks, 30-40 hours per week
Project Open to Undergraduates; Master's (Part III) students
Background Information

To decarbonise the world’s power generation, the growth of Renewable Energy projects is expected to accelerate further across Europe and globally. Historically, wind and solar projects largely received stabilised revenues through government support. In recent years, however, they have increasingly relied on floating power prices, or on direct offtake by corporates and utilities via fixed-price power purchase agreements. Given multi-decade investment horizons, this exposes renewable energy investors to two key uncertainties: volatile power prices and long-term counterparty risk.

As a dedicated climate focused credit risk insurer, we are considering some new products that may help manage these risks for renewable investors.

Brief Description of the Project

We are curious to understand whether historical prices in power markets can be explained mathematically, and whether that can be used to determine more and less likely future price scenarios. We are also curious whether this holds for all power markets. Do markets (countries) behave statistically differently from one another, or do they follow similar distributions? The first part is therefore a research and data science project around power prices. The second part concerns the credit quality models we use to determine probabilities of credit default for counterparties. These probabilities are derived from credit rating agencies and may be improved upon. We would like to understand whether data and analysis are available on corporate payment behaviour around electricity bills, or whether our current approach is sound.
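The two-sample Kolmogorov-Smirnov statistic is one simple way to ask whether two markets follow similar distributions. It measures the largest gap between the two empirical CDFs. The minimal sketch below computes it using only the standard library, and the market data are made-up placeholders. `scipy.stats.ks_2samp` also returns a p-value. Power prices are strongly autocorrelated and seasonal, however, so p-values that assume independent samples should be treated with caution.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum is attained at an observed value.
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


# Hypothetical daily average prices (EUR/MWh) for two markets.
market_x = [82.1, 95.4, 77.0, 120.3, 101.8]
market_y = [60.2, 71.5, 88.9, 65.0, 70.4]
print(ks_statistic(market_x, market_y))  # 0.8
```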

The ultimate request is around combining counterparty payment behaviour with scenarios on likely future electricity prices. This would allow construction of a model/tool to run future scenarios on both. Such a tool would let us take a view on the financial risks involved and on whether they translate into an insurable risk.
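The following minimal Monte Carlo sketch combines the two parts. It rests on loudly simplified assumptions and is not Tierra's model:

- log spot prices follow a mean-reverting AR(1) process;
- the offtaker defaults with a constant annual probability that is independent of prices;
- after default, the generator sells at spot and loses `max(fixed - spot, 0)` per MWh for the rest of the term;
- cash flows are undiscounted.

All parameter values in the usage example are hypothetical.

```python
import math
import random


def simulate_ppa_losses(fixed_price, volume_mwh, years, annual_default_prob,
                        start_price, long_run_price, reversion, volatility,
                        n_scenarios=10_000, seed=0):
    """Simulated undiscounted losses from offtaker default on a fixed-price PPA.

    Hypothetical model: AR(1) mean reversion in log spot price; default with a
    constant annual probability, independent of prices; after default the
    generator sells at spot and loses max(fixed_price - spot, 0) per MWh.
    """
    rng = random.Random(seed)
    log_mean = math.log(long_run_price)
    losses = []
    for _ in range(n_scenarios):
        log_price = math.log(start_price)
        defaulted = False
        loss = 0.0
        for _ in range(years):
            # Pull towards the long-run log price, plus a Gaussian shock.
            log_price += reversion * (log_mean - log_price) + volatility * rng.gauss(0.0, 1.0)
            if not defaulted and rng.random() < annual_default_prob:
                defaulted = True
            if defaulted:
                loss += max(fixed_price - math.exp(log_price), 0.0) * volume_mwh
        losses.append(loss)
    return losses


# Hypothetical inputs: 15-year PPA at 60 EUR/MWh for 100 GWh per year.
losses = sorted(simulate_ppa_losses(
    fixed_price=60.0, volume_mwh=100_000, years=15, annual_default_prob=0.02,
    start_price=70.0, long_run_price=55.0, reversion=0.3, volatility=0.25,
))
expected_loss = sum(losses) / len(losses)
loss_95th_percentile = losses[int(0.95 * len(losses))]
```

The main simplification here is treating default as independent of prices. When spot falls well below the contract price, the contract becomes expensive for the offtaker, and that is also when the generator's loss on default is largest. Linking the default probability to the simulated price path is where evidence on corporate payment behaviour would enter.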

Work Environment Largely virtual, but we plan to meet up roughly once every 2 weeks (or more often if needed).
References  
Prerequisite Skills Statistics; Probability/Markov Chains; Mathematical Analysis
Other Skills Used in the Project  
Programming Languages No Preference

 

Identifying and Removing Spurious Features to Aid Neural Network Training on Biopsy Image Data

Project Title Identifying and Removing Spurious Features to Aid Neural Network Training on Biopsy Image Data
Keywords Machine Learning, Pathology, Biopsies, Generalizability, Data Leakage
Project listed 23 January 2023
Project status Filled
Contact Name Elizabeth Soilleux
Contact Email ejs17@cam.ac.uk
Company/Lab/Department Lyzeum Ltd.
Address Department of Pathology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ
Period of the Project 8 weeks
Project Open to Undergraduates; Master's (Part III) students
Background Information

In the world of pathology, the gold standard for diagnosing many pathological conditions is the procurement and analysis of a biopsy. A biopsy is a small sample of tissue from the patient, which is embedded in paraffin wax to make a tissue block. Slices 3-4 μm thick are then laid onto a transparent glass slide and stained to make certain features easier to observe. These biopsies are analysed by pathologists in order to make a diagnosis.

Modern machine learning is currently revolutionizing the analysis of biopsies. Machine learning models have been trained on whole slide images (WSIs) of biopsies, annotated by pathologists and scanned using a digital slide scanner at 0.25-0.50 μm/px. These models perform many complex image processing tasks, such as locating diagnostically relevant tissue [2], segmenting and classifying different cell nuclei as distinct objects [3], and making diagnostic predictions [5], even with minimal, weak annotations [4].

One of the many challenges with training machine learning algorithms on WSI data is generalizability. For example, pathologists sometimes add pen markings, labels, and other spurious features on the slides of the biopsies as reminders of important clinical data or diagnostic observations. This can confound the training of the neural network; if a neural network can identify that red pen is often used to signify a specific diagnostic feature, then the neural network will learn that red pen is indicative of that diagnosis instead of analysing the actual tissue.

In this project, a separate algorithm will be developed that can identify these spurious features and either remove or replace them. This would remove bias and data leakage from the dataset, thus improving performance and generalizability of the neural network. This has the potential to remove one of the big barriers to rapid and accurate WSI classification, improving accuracy and generalizability of a diagnostic tool in a real-world setting.

Brief Description of the Project

The project would entail the following steps:

1. Use QuPath (an open-access software platform for bioimage analysis) [1] to manually segment spurious features from WSIs carefully selected by Ben Schreiber (a 4th year PhD student working on digital pathology). These WSIs can be used as training and/or validation data.

2. Write a program that randomly draws convincing spurious features on WSIs. This can be used to artificially increase the number of WSIs containing spurious features.

3. Manually add easily removable spurious features (e.g. pen marks) to the physical biopsy slides themselves and have them scanned.

4. Write an algorithm to either:
(a) Flag WSIs which contain spurious features
(b) Segment spurious features in WSIs and replace them with background
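Step 2 could start from something as simple as a smooth random walk drawn with a round brush and alpha-blended over the tissue. The sketch below is hypothetical. The ink colour, brush radius, number of steps and opacity are illustrative defaults, not values measured from scanned slides. It also works on a small RGB tile rather than a full gigapixel WSI. Because it returns the stroke mask as well, the synthetic marks come with free segmentation labels.

```python
import numpy as np


def draw_pen_stroke(image, rng, colour=(200, 30, 30), n_steps=200, radius=3, opacity=0.7):
    """Draw a random pen-like stroke on an RGB uint8 tile.

    Returns (augmented_tile, stroke_mask). All defaults are illustrative.
    """
    h, w, _ = image.shape
    out = image.astype(np.float64)
    mask = np.zeros((h, w), dtype=bool)
    yy, xx = np.mgrid[0:h, 0:w]
    y, x = rng.uniform(0, h - 1), rng.uniform(0, w - 1)
    heading = rng.uniform(0, 2 * np.pi)
    for _ in range(n_steps):
        # A slowly turning random walk gives smooth, hand-drawn-looking curves.
        heading += rng.normal(0.0, 0.15)
        y = np.clip(y + np.sin(heading), 0, h - 1)
        x = np.clip(x + np.cos(heading), 0, w - 1)
        # Stamp a round "brush" of the given radius at the current position.
        mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    ink = np.array(colour, dtype=np.float64)
    # Alpha-blend the ink so the underlying tissue texture still shows through.
    out[mask] = (1 - opacity) * out[mask] + opacity * ink
    return np.clip(out.round(), 0, 255).astype(np.uint8), mask


rng = np.random.default_rng(0)
tile = np.full((256, 256, 3), 235, dtype=np.uint8)  # stand-in for a pale slide tile
augmented, stroke_mask = draw_pen_stroke(tile, rng)
```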

The algorithm can be based on traditional image-processing techniques (Gabor filters, Canny edge detection, entropy filters, etc.), on machine learning approaches (neural networks, random forest, K-means clustering, etc.), or a combination of the two. The student should feel free to select a method that they feel best suits the task at hand.
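A colour rule is an example from the traditional end of that spectrum. It can both flag a tile (option 4a) and replace the flagged pixels with background (option 4b). The thresholds below are illustrative placeholders that would need tuning against the annotations from step 1. The rule targets green, blue and black ink. It deliberately does not try to separate red ink from eosin, whose colours overlap. Strongly haematoxylin-stained regions may also trip the blue rule.

```python
import numpy as np


def ink_mask(tile, margin=60, dark=60):
    """Boolean mask of pixels that look like pen ink (illustrative thresholds).

    Flags strongly green- or blue-dominant pixels and near-black pixels.
    Red ink is not handled, because it overlaps with eosin staining.
    """
    rgb = tile.astype(np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    green = (g > r + margin) & (g > b + margin)
    blue = (b > r + margin) & (b > g + margin)
    black = rgb.max(axis=-1) < dark
    return green | blue | black


def flag_and_clean(tile, flag_fraction=0.01, background=(240, 240, 240)):
    """Return (is_flagged, cleaned_tile), with ink pixels replaced by a background colour."""
    mask = ink_mask(tile)
    cleaned = tile.copy()
    cleaned[mask] = background
    return bool(mask.mean() > flag_fraction), cleaned


tile = np.full((100, 100, 3), (230, 180, 210), dtype=np.uint8)  # pinkish, eosin-like
tile[40:45, :] = (30, 60, 200)  # a synthetic blue pen line
flagged, cleaned = flag_and_clean(tile)
```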

Work Environment While students can work remotely on this project, we would prefer them to work at the department at least 3 days a week. There is an office for them to work in, shared with a PhD student who can answer any questions the student might have.
References [1] Peter Bankhead, Maurice B. Loughrey, José A. Fernández, Yvonne Dombrowski, Darragh G. McArt, Philip D. Dunne, Stephen McQuaid, Ronan T. Gray, Liam J. Murray, Helen G. Coleman, Jacqueline A. James, Manuel Salto-Tellez, and Peter W. Hamilton. QuPath: Open source software for digital pathology image analysis. Scientific Reports, 7(1):16878, December 2017.
[2] J. Denholm, B. A. Schreiber, S. C. Evans, O. M. Crook, A. Sharma, J. L. Watson, H. Bancroft, G. Langman, J. D. Gilbey, C. B. Schönlieb, M. J. Arends, and E. J. Soilleux. Multiple-instance-learning-based detection of coeliac disease in histological whole-slide images. Journal of Pathology Informatics, 13:100151, January 2022.
[3] Simon Graham, Quoc Dang Vu, Shan E Ahmed Raza, Ayesha Azam, Yee Wah Tsang, Jin Tae Kwak, and Nasir Rajpoot. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58:101563, December 2019.
[4] Manuel Tran, Sophia J. Wagner, Melanie Boxberg, and Tingying Peng. S5CL: Unifying fully-supervised, self-supervised, and semi-supervised learning through hierarchical contrastive learning. In Linwei Wang, Qi Dou, P. Thomas Fletcher, Stefanie Speidel, and Shuo Li, editors, Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, Lecture Notes in Computer Science, pages 99–108. Springer Nature Switzerland, 2022.
[5] Mitko Veta, Yujing J. Heng, Nikolas Stathonikos, Babak Ehteshami Bejnordi, Francisco Beca, Thomas Wollmann, Karl Rohr, Manan A. Shah, Dayong Wang, Mikael Rousson, Martin Hedlund, David Tellez, Francesco Ciompi, Erwan Zerhouni, David Lanyi, Matheus Viana, Vassili Kovalev, Vitali Liauchuk, Hady Ahmady Phoulady, Talha Qaiser, Simon Graham, Nasir Rajpoot, Erik Sjöblom, Jesper Molin, Kyunghyun Paeng, Sangheum Hwang, Sunggyun Park, Zhipeng Jia, Eric I-Chao Chang, Yan Xu, Andrew H. Beck, Paul J. van Diest, and Josien P. W. Pluim. Predicting breast tumor proliferation from whole-slide images: The TUPAC16 challenge. Medical Image Analysis, 54:111–121, May 2019.
Prerequisite Skills Statistics
Other Skills Used in the Project Predictive Modelling; Database Queries; Data Visualization; Image Processing
Programming Languages Python

 

Use of Bayesian methods in environmental PBK modelling

Project Title Use of Bayesian methods in environmental PBK modelling
Keywords Bayesian inference, Complex systems, Compartmental models, Uncertainty analysis
Project listed 23 January 2023
Project status Filled
Contact Name Patrik Engi
Contact Email patrik.engi@unilever.com
Company/Lab/Department SEAC
Address Colworth Science Park, Sharnbrook, Bedford MK44 1LQ
Period of the Project 8-12 weeks - full-time
Project Open to Undergraduates; Master's (Part III) students
Background Information

In a fast-moving consumer goods environment, it is vital that safety assessments are conducted to ensure products are safe for humans and the environment, both during production and during use. Historically, these risk assessments have relied heavily on in vivo animal testing to identify detrimental impacts of chemicals on organisms. Such testing is neither ethical nor efficient, and there is sustained societal pressure to end the use of animal testing for safety purposes.

For more than 20 years, Unilever has been working towards developing Next Generation Risk Assessments (NGRAs). This work leverages advances in biology, genetics, computational sciences, mathematics and statistics to develop novel in silico and in vitro methods, commonly known as New Approach Methodologies (NAMs), which can be used to robustly support safety decisions without testing on animals. [1, 2]

The current evolution in the risk assessment paradigm, moving away from in vivo tests towards mechanistic information to support decision-making, creates new opportunities for the use of NAMs data as hazard descriptors. However, it also creates an information mismatch. Environmental Risk Assessment is anchored on a comparison between two quantities. The first is the Predicted No Effect Concentration (PNEC), the maximum water concentration at which no hazardous effect is anticipated. The second is the exposure, as defined by the Predicted Environmental Concentration (PEC). In vitro data, by contrast, reflect an effect concentration at the target site (i.e. the internal concentration) in the organism. To re-align the two, it is essential to be able to convert the in vitro hazard Point of Departure (PoD) to the same level as the PEC. This means converting from the concentration at the target site to the external concentration required to reach the PoD concentration internally. Robust quantitative in vitro to in vivo extrapolation (qIVIVE) and kinetic mass-transfer models are therefore fundamental.

One solution to this problem makes use of advanced toxicokinetic models, anchored on chemical characteristics and biological processes to parameterize the inflow and outflow of chemicals between the organism and the water surrounding it. These are able to relate the external concentration to the concentration at the target site and vice versa. [3]
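The simplest model of this kind is a single compartment with first-order uptake from water (rate constant k_u) and first-order elimination (k_e, lumping all loss processes together): dC_int/dt = k_u C_w - k_e C_int. At steady state, C_int = (k_u/k_e) C_w, where k_u/k_e is the bioconcentration factor (BCF). Converting an internal PoD to an external concentration is therefore a division by the BCF. The sketch below is illustrative only. Real PBK models have several compartments and further processes (e.g. growth dilution and dietary uptake), and the rates in the usage example are made up.

```python
import math


def internal_concentration(c_water, k_u, k_e, t):
    """Internal concentration at time t for a one-compartment model, constant exposure.

    Solves dC/dt = k_u * c_water - k_e * C with C(0) = 0:
    C(t) = (k_u / k_e) * c_water * (1 - exp(-k_e * t)).
    """
    return (k_u / k_e) * c_water * (1.0 - math.exp(-k_e * t))


def external_concentration_for_internal_pod(pod_internal, k_u, k_e):
    """Reverse dosimetry: water concentration whose steady state equals pod_internal.

    At steady state C_ss = BCF * c_water with BCF = k_u / k_e.
    """
    bcf = k_u / k_e
    return pod_internal / bcf


# Hypothetical rates: k_u = 100 L/kg/day, k_e = 0.5 /day (BCF = 200 L/kg); PoD = 10 mg/kg.
c_water_equiv = external_concentration_for_internal_pod(pod_internal=10.0, k_u=100.0, k_e=0.5)  # mg/L
pec = 0.001  # hypothetical PEC, mg/L
ratio = pec / c_water_equiv  # below 1: exposure under the level that reaches the internal PoD
```

The final ratio compares the PEC with this external-equivalent concentration. It omits the assessment factors that a regulatory PNEC would normally include.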

Brief Description of the Project

This project will fundamentally explore the use of advanced mathematical methods to solve complex, biologically relevant problems in a multi-disciplinary way, with an immediate impact on chemical safety decision processes. Several toxicokinetic models exist for a variety of organisms, and their respective predictive performance is an area of open debate. Quantifying uncertainty is vital, especially for the purpose of safety assessment. The models' performance therefore needs to be evaluated against existing data, together with a quantitative characterization of the associated uncertainty. There are several ways of characterizing uncertainties or errors in these models, and these will be the primary topic of investigation for this project. Bayesian methods are one widely applied approach to uncertainty analysis and have already been applied in toxicological risk assessment for human health [1]. However, there has been little historical use of these advanced approaches in environmental safety, a gap this project will address. [4]
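To make Bayesian uncertainty analysis concrete, the sketch below calibrates k_u and k_e of the one-compartment uptake model, C(t) = (k_u/k_e) C_w (1 - exp(-k_e t)), with a random-walk Metropolis sampler. It is written in Python, whereas the project itself uses R. Every modelling choice is an assumption for illustration:

- an uptake-only experiment at constant exposure;
- log-normal measurement error with a known standard deviation;
- independent normal priors on the log rates;
- synthetic data generated from made-up "true" rates.

It is not the framework of the cited studies.

```python
import math
import random


def log_posterior(log_ku, log_ke, times, observed, c_water, sigma, prior_sd=2.0):
    """Unnormalised log posterior of (log k_u, log k_e) for uptake-phase data.

    Assumed: independent Normal(0, prior_sd) priors on the log rates and
    Normal(0, sigma) errors on log concentrations.
    """
    ku, ke = math.exp(log_ku), math.exp(log_ke)
    lp = -(log_ku ** 2 + log_ke ** 2) / (2 * prior_sd ** 2)
    for t, y in zip(times, observed):
        predicted = (ku / ke) * c_water * (1.0 - math.exp(-ke * t))
        lp -= (math.log(y) - math.log(predicted)) ** 2 / (2 * sigma ** 2)
    return lp


def metropolis(times, observed, c_water, sigma, n_iter=20_000, step=0.05, seed=1):
    """Random-walk Metropolis on the log rates; returns draws of (k_u, k_e)."""
    rng = random.Random(seed)
    state = (0.0, 0.0)  # start at k_u = k_e = 1
    current = log_posterior(*state, times, observed, c_water, sigma)
    draws = []
    for _ in range(n_iter):
        proposal = (state[0] + rng.gauss(0.0, step), state[1] + rng.gauss(0.0, step))
        candidate = log_posterior(*proposal, times, observed, c_water, sigma)
        # Symmetric proposal, so accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
        draws.append((math.exp(state[0]), math.exp(state[1])))
    return draws


# Synthetic uptake data from hypothetical "true" rates k_u = 2.0, k_e = 0.5 (C_w = 1).
data_rng = random.Random(0)
true_ku, true_ke, c_w, noise_sd = 2.0, 0.5, 1.0, 0.05
times = [0.5 * i for i in range(1, 21)]
observed = [
    (true_ku / true_ke) * c_w * (1 - math.exp(-true_ke * t)) * math.exp(data_rng.gauss(0.0, noise_sd))
    for t in times
]
draws = metropolis(times, observed, c_w, noise_sd)[5_000:]  # drop burn-in
ke_draws = sorted(d[1] for d in draws)
ke_mean = sum(ke_draws) / len(ke_draws)
ke_95_interval = (ke_draws[int(0.025 * len(ke_draws))], ke_draws[int(0.975 * len(ke_draws))])
```

The 2.5% and 97.5% quantiles of the draws give a 95% credible interval for k_e. This is the kind of quantitative uncertainty statement a safety assessment needs. In a real analysis, several chains and convergence diagnostics would be checked before trusting it.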

Hence, this project provides an opportunity to develop knowledge and capability in applying advanced mathematical approaches to complex biological problems, via the main objectives below:
1. Exploring, evaluating, and summarizing the relevant literature surrounding the use of Bayesian methods in physiologically based kinetic (PBK) models of aquatic organisms.
2. Proposing a framework with some examples [5, 6] and providing an R implementation [7].
3. Generalisation of such Bayesian approach within models of increasing complexity. Where possible, implementation and evaluation of multi-compartmental models in this space along with uncertainty and sensitivity analysis will be considered.

These objectives are not set in stone and may be reconsidered in light of intermediate findings. Ultimately, results from this project may be incorporated into business case studies to drive regulatory change towards acceptance of these modelling approaches, supporting the transition towards non-animal approaches in safety assessment.

Work Environment The student will work independently within a team of mathematical modellers, computational scientists, environmental safety experts and others. The team covers broad technical expertise and has extensive experience in the topic and in student supervision.
References [1] J. Reynolds, S. Malcomber and A. White, “A Bayesian approach for inferring global points of departure from transcriptomics data,” Computational Toxicology, vol. 16, p. 100138, November 2020.
[2] T. E. Moxon, H. Li, M.-Y. Lee, P. Piechota, B. Nicol, J. Pickles, R. Pendlington, I. Sorrell and M. T. Baltazar, “Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products,” Toxicology in Vitro, vol. 63, p. 104746, March 2020.
[3] D. Mackay and A. Fraser, “Bioaccumulation of persistent organic chemicals: mechanisms and models,” Environmental Pollution, vol. 110, p. 375–391, December 2000.
[4] S. Charles, O. Gestin, J. Bruset, D. Lamonica, V. Baudrot, A. Chaumot, O. Geffard, T. Lacoue-Labarthe and C. Lopes, “Generic Solving of Physiologically-based Kinetic Models in Support of Next Generation Risk Assessment Due to Chemicals,” Journal of Exploratory Research in Pharmacology, vol. 000, p. 000–000, September 2022.
[5] A. Ratier, C. Lopes, P. Labadie, H. Budzinski, N. Delorme, H. Quéau, L. Peluhet, O. Geffard and M. Babut, “A Bayesian framework for estimating parameters of a generic toxicokinetic model for the bioaccumulation of organic chemicals by benthic invertebrates: Proof of concept with PCB153 and two freshwater species,” Ecotoxicology and Environmental Safety, vol. 180, p. 33–42, September 2019.
[6] A. Ratier, C. Lopes, O. Geffard and M. Babut, “The added value of Bayesian inference for estimating biotransformation rates of organic contaminants in aquatic invertebrates,” Aquatic Toxicology, vol. 234, p. 105811, May 2021.
[7] A. Ratier, V. Baudrot, M. Kaag, A. Siberchicot, C. Lopes and S. Charles, “rbioacc: An R-package to analyze toxicokinetic data,” Ecotoxicology and Environmental Safety, vol. 242, p. 113875, September 2022.
Prerequisite Skills Statistics; Probability/Markov Chains; Mathematical physics
Other Skills Used in the Project Simulation; Predictive Modelling; Data Visualization; App Building; Bayesian statistics
Programming Languages R

 

Revealing the biological significance of graph embeddings

Project Title Revealing the biological significance of graph embeddings
Keywords biological data analysis, machine learning, deep learning, graph analysis, network embedding
Project listed 23 January 2023
Project status Filled
Contact Name Florian Klimm
Contact Email fnkl@novonordisk.com
Company/Lab/Department Novo Nordisk Research Centre Oxford
Address The Innovation Building, Roosevelt Dr, Headington, Oxford OX3 7FZ
Period of the Project 8 weeks. Full time.
Project Open to Undergraduates; Master's (Part III) students
Background Information Many complex systems can be represented as graphs, which structure data as entities and the connections between them. In the molecular biology context, for example, a graph could represent proteins and their interactions, or the coordinated expression of genes. For visualisation and statistical analyses, it can be fruitful to construct graph embeddings: low-dimensional representations that (approximately) preserve a graph’s topology. A variety of algorithms are available for producing such embeddings, but it is often not straightforward to identify whether a generated embedding is biologically meaningful.
Brief Description of the Project In this project, the student will explore how external biological information can be used to validate the embedding of biological graphs. You will construct embeddings of protein interaction networks and knowledge graphs with state-of-the-art embedding algorithms such as node2vec. You will then compare the embeddings with annotations such as gene pathways and gene-ontology terms. Various approaches to assessing the statistical significance of these embeddings are possible. One is to explore the distribution of distances between pairs of nodes and compare these distances with nonparametric statistical tests (e.g., the Wilcoxon rank-sum test). Testing procedures such as this will allow embedding algorithms to be benchmarked according to how meaningful their embeddings are. For large graphs, subsampling procedures or parallelisation will be necessary to achieve adequate performance. Ideally, you will produce a software library that will be published and/or code that is made publicly available. The project can be extended in various directions, depending on your interest. These include exploring further data sets, developing alternative testing procedures, and comparing with link prediction.
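A minimal sketch of the distance-based comparison splits node-pair distances into within-pathway and between-pathway sets. It then applies a one-sided Wilcoxon rank-sum test of whether within-pathway distances tend to be smaller. The embedding and pathway below are hypothetical. In practice the vectors would come from node2vec or a similar algorithm, and `scipy.stats.mannwhitneyu` offers an equivalent test. Pairwise distances share nodes and are not independent, so the p-value is optimistic. Permuting pathway labels over nodes is a common way to obtain a better-calibrated null.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """One-sided Wilcoxon rank-sum test of H1: values in x tend to be smaller than in y.

    Uses average ranks for ties and the normal approximation without tie
    correction. Returns (rank sum of x, one-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n, m = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return w, NormalDist().cdf((w - mean) / sd)


def within_between_distances(embedding, pathway):
    """Euclidean distances for node pairs inside the pathway, and pathway-to-rest pairs."""
    inside = sorted(n for n in embedding if n in pathway)
    outside = sorted(n for n in embedding if n not in pathway)
    within = [math.dist(embedding[a], embedding[b]) for i, a in enumerate(inside) for b in inside[i + 1:]]
    between = [math.dist(embedding[a], embedding[b]) for a in inside for b in outside]
    return within, between


# Hypothetical 2-D embedding: genes P1-P4 share a pathway and sit close together.
embedding = {
    "P1": (0.0, 0.1), "P2": (0.2, 0.0), "P3": (0.1, 0.3), "P4": (0.3, 0.2),
    "Q1": (2.0, 1.5), "Q2": (1.8, 2.2), "Q3": (2.5, 1.9), "Q4": (2.2, 2.4),
}
within, between = within_between_distances(embedding, {"P1", "P2", "P3", "P4"})
w, p_value = rank_sum_test(within, between)
```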
Work Environment The student will be based in the Computational Biology department at the Novo Nordisk Research Centre Oxford. The department consists of approximately twenty computational researchers and works in partnership with the University of Oxford. The usual work hours are 9am to 5pm, and we have a hybrid set-up that allows working both in the office and from home. The student will be encouraged to work at least some days in the office in Oxford.
References Mark Newman. Networks. Oxford University Press, 2018. Aditya Grover and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. Wenting Liu, Jianjun Liu, and Jagath C. Rajapakse. "Gene ontology enrichment improves performances of functional similarity of genes." Scientific Reports 8.1 (2018): 1-12.
Prerequisite Skills Statistics; Probability/Markov Chains
Other Skills Used in the Project Data Visualization
Programming Languages Python; R

 

Internship in quantum computing

Project Title Internship in quantum computing
Keywords quantum, computing, algorithms, software
Project listed 23 January 2023
Project status Closed. Application deadline: 17 February 2023

Contact Name Ophelia Crawford
Contact Email ophelia.crawford@riverlane.com
Company/Lab/Department Riverlane
Address St Andrew's House, 59 St Andrew's Street, Cambridge, CB2 3BZ
Period of the Project 10-12 weeks, full-time
Project Open to Master's (Part III) students
Background Information Riverlane’s mission is to make quantum computing useful far sooner than previously imaginable, starting an era of human progress as significant as the industrial and digital revolutions. Large and reliable quantum computers have the potential to turn fields like clean energy, drug design and aerospace upside down. We’re building the Operating System for these pioneering new machines, leading the world to tackle quantum computing’s defining challenge: error correction. We’re growing fast and making remarkable progress.
Brief Description of the Project

What you will do:
- Develop, devise and research algorithms and software to enhance Riverlane’s capabilities, contributing to one or more projects that are core to Riverlane’s goals
- Discuss ideas with colleagues and communicate work in the form of presentations and reports
- Develop an understanding of quantum computers and their industrial applications

For more information and to apply, please visit our website: https://www.riverlane.com/jobs/internships

Work Environment Our full-time summer internships are designed to enable current students in a technical field to translate their skills and expertise into an industrial setting. You will join us at our office in Cambridge, UK, for 10 to 12 weeks, where you will have the opportunity to work alongside our team of software and hardware engineers, mathematicians, quantum information theorists, computational chemists and physicists – all experts in their fields. Every intern will have a dedicated supervisor and will work on a project designed to make the best use of their background and skills whilst developing their knowledge of quantum computing. We will support all interns to try and produce a concrete output by the end of the internship such as a paper, product, or software tool.
References  
Prerequisite Skills

Requirements
- A current student studying for a master’s degree (including the final year of an integrated undergraduate and master’s degree) 
- Proven ability in computational and/or theoretical work
- Experience with at least one programming language
- Excellent critical thinking and problem-solving ability
- Strong communication skills, both written and verbal
- Ability to take initiative and to work well as part of a team
- An interest in quantum computing (extensive knowledge or experience is not required)

Other Skills Used in the Project  
Programming Languages  

 

Optimization and Contextualization of Combat Sports Head Impact Data

Project Title Optimization and Contextualization of Combat Sports Head Impact Data
Keywords Biomechanics, Machine Learning, Pose Estimation
Project listed 24 January 2023
Project status Closed
Contact Name Dr Megan Lowery
Contact Email megan.lowery@swa.one
Company/Lab/Department Sports & Wellbeing Analytics
Address 3 New Mill Court, Llys Felin Newydd, Swansea Enterprise Park, Swansea, SA7 9FG
Period of the Project Full time, 8-10 week project.
Project Open to Undergraduates; Master's (Part III) students
Background Information Sport and Wellbeing Analytics (SWA) is a data analytics and sports technology company, utilizing multiple streams of data input to improve welfare, performance and entertainment within sports. Instrumented mouthguards – embedded with tri-axial accelerometers, gyroscopes and magnetometers – can quantify head impact magnitudes within various sports, providing high quality individual impact data. There are two key stages in developing valid and reliable impact data: development of a robust raw signal processing pipeline, and the sport-specific contextualisation of identified impacts. The development of a valid signal processing pipeline depends in part on removing noise from time series data while maintaining desirable frequency ranges. Once impact events are collected, they need to be contextualized in a three-step process: removing false positives, identifying impact location, and identifying sport-specific variables such as ‘punch type’ within combat sports. At present, these impacts are video-verified by sports science personnel, which carries a heavy resource cost.
Brief Description of the Project This proposal offers two separate projects. The first is the identification of optimal signal filter conditions to remove noise while maintaining true signal characteristics, based on spectral analysis of instrumented mouthguard data. The second, continuing the work of previous projects, is the creation of an automatic event classifier based on unique time-series accelerometer and gyroscope features within a large professional mixed martial arts (MMA) dataset.
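For the first project, one simple spectral heuristic picks the low-pass cutoff below which a chosen fraction of the signal power lies. The sketch assumes a single, uniformly sampled accelerometer or gyroscope channel. The 95% threshold is an illustrative choice, not a validated value for mouthguard data, and residual analysis is a common alternative in biomechanics. The chosen cutoff would then be applied with a zero-phase filter, so that impact peaks are not shifted in time. One example is a Butterworth filter run forwards and backwards, as `scipy.signal.filtfilt` does.

```python
import numpy as np


def power_cutoff_frequency(signal, fs_hz, power_fraction=0.95):
    """Lowest frequency below which `power_fraction` of the signal power lies."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # drop any constant offset, such as sensor bias
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs_hz)
    cumulative = np.cumsum(power) / power.sum()
    # First frequency bin at which the cumulative power reaches the threshold.
    idx = min(int(np.searchsorted(cumulative, power_fraction)), freqs.size - 1)
    return float(freqs[idx])


# Hypothetical 1 s recording at 1 kHz: a 50 Hz component plus weak 300 Hz noise.
fs = 1000.0
t = np.arange(1000) / fs
sig = np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 300 * t)
cutoff = power_cutoff_frequency(sig, fs)
```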
Work Environment Remote with the possibility of travel for data collection, team meetings or lab visits.
References  
Prerequisite Skills Statistics; Image processing; Predictive Modelling; Programming - Python
Other Skills Used in the Project  
Programming Languages Python; MATLAB

 

Machine Learning for the recognition of anatomical and radiological features in veterinary computed tomography (CT scans)

Project Title Machine Learning for the recognition of anatomical and radiological features in veterinary computed tomography (CT scans)
Keywords AI, Veterinary, Diagnostic Imaging, computer vision
Project listed 26 January 2023
Project status No longer available
Contact Name Julien Labruyère
Contact Email Julien@vet-ct.com
Company/Lab/Department VetCT (vetct.com)
Address Broers Building, 21 JJ Thomson Avenue, CB3 0FA Cambridge
Period of the Project 8 weeks
Project Open to Undergraduates; Master's (Part III) students
Background Information

VetCT was established in 2009, headquartered in Cambridge (UK), with a mission to improve the quality and efficiency of veterinary care. VetCT owns a large high-quality research patient database of animal CT scans, radiographs, MRI, and is currently developing artificial intelligence (AI) tools to improve workflows and clinical diagnosis for veterinary patients.

The rapid development of deep learning techniques in recent years has enabled the increasing use of computer-aided detection (CAD) tools to assist radiologists with their workflows and clinical diagnosis. In contrast to human medicine, CAD and AI are used only minimally in veterinary imaging and remain much less explored, with their potential yet to be exploited.

Brief Description of the Project The project has two aims. The first is to develop an AI-based tool that automatically detects and segments the anatomical body areas included in animal CT scans. The second is to develop the tool further to allow triage of clinical cases based on AI recognition of specific imaging features. The project will also involve curation of the clinical and imaging data, along with application and development of new techniques for pattern recognition (deep learning), image synthesis and domain adaptation. The successful candidate will build on work already done on the same subject last summer by previous students from the Maths Department of Cambridge University. They will also work alongside the PhD student the company is currently recruiting.
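Most CT segmentation pipelines need two small building blocks, sketched here under stated assumptions. The first is windowing raw Hounsfield units (HU) into a fixed range before passing slices to a network. The second is the Dice coefficient, used to score predicted body-area masks against annotated ones. The window in the usage example (level 40 HU, width 400 HU) is a common soft-tissue setting in human imaging. Whether it suits veterinary scans is an assumption to check.

```python
import numpy as np


def window_hu(volume_hu, level, width):
    """Clip a CT array in Hounsfield units to a display window and scale to [0, 1]."""
    low, high = level - width / 2, level + width / 2
    return (np.clip(volume_hu, low, high) - low) / (high - low)


def dice(pred, target):
    """Dice overlap 2|A and B| / (|A| + |B|) between boolean masks (1.0 if both empty)."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    total = pred.sum() + target.sum()
    return 1.0 if total == 0 else float(2 * np.logical_and(pred, target).sum() / total)


# Illustrative soft-tissue window; suitable values for animal CT may differ.
scan = np.array([[-1000.0, 40.0], [300.0, 1000.0]])
windowed = window_hu(scan, level=40, width=400)
```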
Work Environment The student will be integrated into the young and dynamic VetCT team and will be very welcome to come into the office on the Cambridge West Campus on a daily basis. Working from home is also possible.
References  
Prerequisite Skills Machine learning
Other Skills Used in the Project Statistics; Image processing; Mathematical Analysis; Predictive Modelling; YOLO, computer vision, diagnostic imaging
Programming Languages Python

 

Sai Bot AI Artist

Project Title Sai Bot AI Artist
Keywords Art, artificial intelligence, face recognition, machine learning, app development
Project listed 31 January 2023
Project status Filled
Contact Name Felix Barber
Contact Email felix@dazlus.com
Company/Lab/Department Anglo Scientific Ltd / Dazlus AG
Address C/O Henry Hyde-Thomson, Ledbury HR8 1RZ
Period of the Project 8 weeks June/September, more if desired
Project Open to Undergraduates; Master's (Part III) students
Background Information Henry Hyde-Thomson has spent twenty years as Chairman of Anglo-Scientific Ltd building businesses in Information Technology and Medtech. In 2019, he teamed up with Felix Barber, a former Boston Consulting Group Senior Partner, to found Dazlus Ltd, a start-up focused on leveraging AI and AR to develop innovative art and entertainment products on mobile devices. One of the first two projects for Dazlus, in partnership with artist Tobias Gutmann, is the Face-o-mat AI artist, Sai Bot. Sai Bot draws your portrait in the unique style of Tobias Gutmann. It looks at you and talks to you on one iPad, while you can see your portrait created, pen stroke by pen stroke, on a second iPad. Sai Bot held its first exhibition with Tobias Gutmann at the Barbara Seiler Gallery in Zurich in April 2022. Sai Bot currently has solo exhibitions at the Art Foundation of the Mobiliar in Bern and at the Underdogs Gallery in Lisbon. Today there are many AI artists, but Sai Bot is unique in three ways. It offers a performance as well as a portrait, it works in a style which is far from photorealistic and, most important of all, it produces better art. On this last point, we may, of course, as proud parents, be a little biased!
Brief Description of the Project Sai Bot relies on previous portraits drawn by Tobias Gutmann with accompanying photographs of the portrait sitters. Using this database, together with face recognition technology and machine learning, Sai Bot is learning to draw. Face recognition is widely used to identify people based on their largely invariant facial attributes: the shape and positioning of the eyes, nose, mouth, etc. But to draw a portrait, you need to capture other facial attributes that may change all the time, such as beard, glasses and especially hairstyle. Sai Bot uses standard face recognition software to capture invariant facial attributes. It also uses proprietary software, developed by the team, to capture the changeable attributes that are so important for a portrait. Sai Bot already does a good job of facial recognition, but we are constantly seeking to do better, for example in describing the shape, curliness and flow of the hair or the shape of glasses frames. Once Sai Bot has recognized a face, it asks how our partner artist, Tobias Gutmann, might draw it. To do that, it first identifies similar faces in the database that Tobias has drawn, to see how he drew them, and then creates a new portrait using this information. Here the challenge is to find an algorithm that learns from past portraits but adapts for the differences in the new face while staying true to Tobias's style. This requires a mix of fairly sophisticated maths and practical trial and error. The exact definition of the project will depend on where we are in the development program by the summer. It will involve picking off particular challenges in advanced face recognition and/or in giving Sai Bot more flexibility in adapting drawings to facial attributes. The student will then try to crack some thorny problems and make Sai Bot an even better artist.
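Finding similar faces that Tobias has already drawn can be framed as nearest-neighbour search over attribute vectors. The sketch below uses cosine similarity. The attribute names, vectors and portrait ids are hypothetical. This listing does not describe Sai Bot's proprietary attribute extraction, so the sketch only illustrates the retrieval step.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_portraits(query, database, k=3):
    """Ids of the k portraits whose attribute vectors are most similar to `query`."""
    ranked = sorted(database, key=lambda pid: cosine_similarity(query, database[pid]), reverse=True)
    return ranked[:k]


# Hypothetical attributes scaled to [0, 1]: [hair_curliness, hair_length, glasses, beard].
database = {
    "portrait_017": [0.9, 0.2, 1.0, 0.0],
    "portrait_042": [0.1, 0.8, 0.0, 1.0],
    "portrait_108": [0.8, 0.3, 1.0, 0.1],
}
query = [0.85, 0.25, 1.0, 0.0]
matches = most_similar_portraits(query, database, k=2)
```

Attributes may differ in scale and in how much they matter to the drawing. Weighting the features, or learning the embedding, would be natural next steps.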
Work Environment The student will be working remotely. We are a small permanent team of three people plus the artist, each with our own office in the UK or Switzerland. We have also had temporary participants on the project from Ukraine and the Philippines. We use Google Meet, Zoom and MS Teams to communicate. Occasionally we travel to meet each other, to visit a Sai Bot gallery exhibition or to go out for a coffee or a meal. We hold a development team video call every Friday at 4pm, which the student will attend. The student will work closely with one of us as supervisor and can expect one or two video progress calls with the supervisor each week. If a Sai Bot exhibition is running somewhere not too far away over the summer, we will try to give the student the opportunity to visit it. The student can work any hours they like, from anywhere they like, provided they deliver.
References

Anglo Scientific:
https://www.angloscientific.com/

Dazlus
https://www.dazlus.com/

Sai Bot
https://www.mobiliar.ch/die-mobiliar/nachhaltigkeit-engagement/das-gesellschaftsengagement-der-mobiliar/kunst-und-kultur/ausstellungen-und-fuehrungen/kunstausstellung-tobias-gutmann-und-sai-bot
https://www.under-dogs.net/blogs/exhibitions/tobias-gutmann-solo-exhibition

Prerequisite Skills Statistics; Probability/Markov Chains; Mathematical physics; Numerical Analysis; PDEs; Mathematical Analysis; Predictive Modelling; Database Queries
Other Skills Used in the Project Image processing; Simulation; Predictive Modelling; Data Visualization; App Building
Programming Languages Python

 

Continual Self Supervised Learning on Multimodal Data

Project Title Continual Self Supervised Learning on Multimodal Data
Keywords Computer Vision, Self-Supervised Learning, Multi-Modal Data, Anomaly Detection
Project listed 17 February 2023
Project status Closed. Application deadline: 31 March 2023
Contact Name Daan de Cloe
Contact Email daan@autofilltech.com
Company/Lab/Department AutoFill Technologies B.V.
Address Marineweg 1 - 2241 TX - The Netherlands
Period of the Project 8 weeks (2 months) in July/August, including two one-week visits to our Head Office in Wassenaar.
Project Open to Undergraduates; Master's (Part III) students
Background Information At AutoFill, we have developed an automated object inspection system that captures large, high-quality multimodal scans of objects in only a few seconds. We use computer vision and machine learning to optimize the quality and efficiency of data collection, and to process the data into valuable information for our customers. Our multi-sensor solution fuses data from different types of sensors and from different viewing angles. With our systems deployed at customer locations and our own test setup at the AutoFill Research Lab, we continuously generate large, representative datasets that are used to develop and train new AI models and algorithms.
Brief Description of the Project Our automated vehicle inspection systems, located at customer sites in Europe, collect thousands of datasets containing images of vehicles captured from multiple angles with RGB and polarization sensors. Recent studies indicate that the polarization modality provides a very rich description of abnormalities in challenging conditions such as poor illumination and strong reflections (Blin et al.). We have built an in-house data annotation team that ensures consistently high-standard annotations. However, annotating data for AI applications requires a large amount of manual labour. Your main contribution as a researcher will be to investigate how continual and self-supervised learning methods for anomaly detection on multimodal data can reduce the cost of annotation while still performing as well as fully supervised models. The self-supervised learning literature indicates that model distillation has opened up a new way of learning good representations from labelled and unlabelled data (Koohpayegani et al.).
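To make the distillation idea concrete, the sketch below computes a similarity-distillation loss in the spirit of CompRess (Koohpayegani et al.): for each unlabelled image, the teacher and the student each produce a softmax distribution over their similarities to a shared set of anchor images, and the student is penalised by the KL divergence from the teacher's distribution. This is a simplified NumPy illustration under stated assumptions: it returns only the loss value (no gradients, training loop or memory-bank updates), the temperature `tau=0.1` is illustrative rather than taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def similarity_distillation_loss(teacher_batch, student_batch,
                                 teacher_anchors, student_anchors, tau=0.1):
    """Mean KL(teacher || student) between similarity distributions over anchors.

    teacher_batch (b, d_t) and student_batch (b, d_s) embed the same unlabelled images;
    teacher_anchors (m, d_t) and student_anchors (m, d_s) embed the same m anchor images.
    d_t and d_s may differ because each model is only compared with its own embeddings.
    """
    log_p_t = log_softmax(l2_normalize(teacher_batch) @ l2_normalize(teacher_anchors).T / tau)
    log_p_s = log_softmax(l2_normalize(student_batch) @ l2_normalize(student_anchors).T / tau)
    kl_per_image = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1)
    return float(kl_per_image.mean())


rng = np.random.default_rng(0)
t_batch, t_anchors = rng.normal(size=(4, 8)), rng.normal(size=(16, 8))
s_batch, s_anchors = rng.normal(size=(4, 5)), rng.normal(size=(16, 5))
print(similarity_distillation_loss(t_batch, t_batch, t_anchors, t_anchors))  # 0.0 (identical models)
print(similarity_distillation_loss(t_batch, s_batch, t_anchors, s_anchors))  # positive
```

Because the loss needs no labels, it can be computed on the large unannotated portion of the RGB/polarization data; how to combine it with continual learning and anomaly detection is part of the research question, not something this sketch settles.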
Work Environment The student will primarily work remotely (at home or at the University), with daily meetings with the rest of the AutoFill CV/ML team to discuss progress and challenges. We will also arrange two one-week visits to our Head Office in Wassenaar: one at the beginning (week 1) to get started, and one in the middle (week 5) to discuss progress and align on activities for the remaining 3 weeks.
References Blin, R., et al. (2020). A new multimodal RGB and polarimetric image dataset for road scenes analysis. CVPR 2020.
Koohpayegani, S. A., Tejankar, A., & Pirsiavash, H. (2020). CompRess: Self-supervised learning by compressing representations. NeurIPS 2020.
Prerequisite Skills Mathematical physics; Image processing
Other Skills Used in the Project Algebra/Number Theory; Predictive Modelling; Data Visualization
Programming Languages Python; C++

 

Comparative Analysis of System Compute Efficiency using Application Benchmarking

Project Title Comparative Analysis of System Compute Efficiency using Application Benchmarking
Keywords High Performance Computing, Application-based Benchmarking, Comparative Time Series Analysis, System Compute Efficiency, Energy Efficiency
Project listed 7 March 2023
Project status Filled
Contact Name Dominic Friend
Contact Email dmf38@cam.ac.uk
Company/Lab/Department Cambridge Open Zettascale Lab, University Information Services
Address Roger Needham Building, 7 J J Thomson Avenue, Cambridge, CB3 0RB
Period of the Project 8 weeks: 3rd July - 25th August
Project Open to Undergraduates; Master's (Part III) students
Background Information

Dr Lisa Su (Chair and CEO of AMD) recently delivered an opening presentation at ISSCC 2023 titled “Innovation For the Next Decade of Compute Efficiency”. In her talk she discussed current compute performance trends in High Performance Computing (HPC) and highlighted that our biggest challenge to sensible and sustainable improvement remains system compute efficiency (performance per watt), especially when set against the goal of building zettascale (10^21 FLOPS) systems.

Her high-level analysis relied heavily on a single measure of performance, floating point operations per second (FLOPS), as measured by the LINPACK benchmark. However, the HPC community regularly debates which performance measures are useful, and LINPACK is generally considered overly simplistic and, at worst, unhelpful, as it can be a poor indicator of performance in real scientific applications.

Given this disconnect between system and application performance, it is reasonable to suspect that the system compute efficiency story may look different from an application perspective. This leaves the door open for an evidence-based analysis of compute efficiency across multiple generations of system hardware.

By the summer of 2023, the Cambridge Service for Data-Driven Discovery (CSD3), operated by the University’s Research Computing Services (RCS), will be home to four generations of Intel-based HPC systems. It has been observed that with each new system, performance (in FLOPS) has “improved”, but at the expense of ever-expanding power and cooling envelopes. This situation presents a unique opportunity to evaluate inter-generational system compute efficiency beyond FLOPS, using a standardised application-based benchmarking suite to frame the discussion in terms of real-world scientific application performance.

Brief Description of the Project

The student will undertake a short literature review to gain a foothold in the broad discussion around energy efficiency in HPC and receive access, orientation and time to become familiar with the available systems.

This will lead into co-designing an experiment, with the necessary telemetry collection, that can describe and visualise the compute efficiency of each system well enough for an inter-generational comparison to be made. The student will have significant input and ownership at this stage and the opportunity to take the project in an interesting direction.
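As an illustration of the kind of analysis the telemetry could feed, the sketch below turns a sampled power trace and an application-defined amount of work into energy-to-solution, throughput and work per joule, then normalises work per joule against a baseline system. It is a minimal sketch under stated assumptions: the power samples are assumed to come from whatever node-level telemetry the experiment ends up collecting, "work units" (e.g. simulation timesteps completed) is a hypothetical application metric, and the system names and numbers are illustrative, not CSD3 measurements.

```python
import numpy as np


def energy_joules(timestamps_s, power_w) -> float:
    """Energy (J) from a sampled power trace (W) using the trapezoidal rule."""
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))


def efficiency_summary(work_units: float, timestamps_s, power_w) -> dict:
    """Summarise one benchmark run; `work_units` is an application-defined amount of work."""
    runtime = float(timestamps_s[-1] - timestamps_s[0])
    energy = energy_joules(timestamps_s, power_w)
    return {
        "runtime_s": runtime,
        "energy_j": energy,
        "mean_power_w": energy / runtime,
        "work_per_s": work_units / runtime,  # application performance
        "work_per_j": work_units / energy,   # application-level compute efficiency
    }


def relative_efficiency(runs: dict, baseline: str) -> dict:
    """Work per joule of each system, normalised to the `baseline` system."""
    base = runs[baseline]["work_per_j"]
    return {name: run["work_per_j"] / base for name, run in runs.items()}


# Illustrative numbers only: "gen_b" finishes twice as fast but draws twice the power.
runs = {
    "gen_a": efficiency_summary(800, [0, 1, 2, 3, 4], [100, 100, 100, 100, 100]),
    "gen_b": efficiency_summary(800, [0, 1, 2], [200, 200, 200]),
}
print(runs["gen_b"]["work_per_s"] / runs["gen_a"]["work_per_s"])  # 2.0
print(relative_efficiency(runs, "gen_a"))  # {'gen_a': 1.0, 'gen_b': 1.0}
```

The toy example shows why the project looks beyond raw performance: a system can double application throughput while its efficiency in work per joule stays flat.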

The expected outputs of this project are a short technical report, aimed at members of the Cambridge Open Zettascale Lab including stakeholders from Intel and Dell EMC, along with digital assets such as, but not limited to, the experimental output dataset and the scripts and code needed for further independent analysis and reproducibility.

There is scope for extra-project activities, such as touring data hall 1 at the West Cambridge Data Centre (WCDC), which houses all parts of CSD3, as well as engaging with our internal seminar series and knowledge-sharing opportunities.

Work Environment The project will be conducted individually, with regular supervision and access to specialists on request. The workplace is a flexible hybrid environment: office space is available 5 days a week at the Roger Needham Building on the West Cambridge site, but there is no expectation to attend every day (unless this is the student's preference). Supervisions will take place both in person and remotely. The student will be expected to work a 37.5-hour week, though not strictly 7.5 hours each weekday; precise working arrangements will be agreed at the start of the internship.
References https://www.youtube.com/watch?v=DxAL7MGiWGs&t=2725s
https://ieeexplore.ieee.org/abstract/document/9826013
Prerequisite Skills Statistics; Basic Unix skills including scripting
Other Skills Used in the Project Database Queries; Data Visualization
Programming Languages Python; Bash

 

Video restoration using Computer Vision and Machine Learning Techniques

Project Title Video restoration using Computer Vision and Machine Learning Techniques
Keywords Machine Learning, Computer Vision, Video Restoration
Project listed 14 March 2023
Project status Closed
Contact Name Michael Thomas Roberts
Contact Email mr808@cam.ac.uk
Company/Lab/Department Ryff Europe
Address Nine Hills Rd, Cambridge
Period of the Project 8-12 weeks
Project Open to Undergraduates; Master's (Part III) students
Background Information We are keen to restore some classic old video footage using computer vision (CV) and machine learning (ML) methods. You would be embedded in an active team working on this project.
Brief Description of the Project The project has several milestones and areas of research: image alignment, inpainting, denoising, feature mapping and super-resolution. This is a high-profile project that will be very demanding, but also very rewarding for the right person.
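Two of the listed milestones, image alignment and denoising, can be illustrated with classic non-learned baselines: estimate the translation between frames by phase correlation, undo it, and take a per-pixel temporal median. This is a minimal sketch, not the team's pipeline: it assumes greyscale frames of equal size related by pure integer (circular) translations, whereas real archival footage would need sub-pixel, non-rigid or motion-compensated alignment before any ML model is applied. The function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def estimate_shift(reference: np.ndarray, moving: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) by which `moving` is circularly shifted
    relative to `reference`, using phase correlation."""
    cross_power = np.fft.fft2(moving) * np.conj(np.fft.fft2(reference))
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    h, w = reference.shape
    # Peaks past the midpoint of an axis correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def align_and_median(frames: list[np.ndarray]) -> np.ndarray:
    """Align each greyscale frame to the first, then take the per-pixel temporal median."""
    reference = frames[0]
    aligned = []
    for frame in frames:
        dy, dx = estimate_shift(reference, frame)
        aligned.append(np.roll(frame, shift=(-dy, -dx), axis=(0, 1)))
    return np.median(np.stack(aligned), axis=0)


# Synthetic test: one clean image, five noisy and shifted copies.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
shifts = [(0, 0), (3, -5), (-2, 4), (1, 1), (-4, -3)]
frames = [np.roll(clean + 0.05 * rng.standard_normal(clean.shape), s, axis=(0, 1))
          for s in shifts]
restored = align_and_median(frames)
print(estimate_shift(clean, np.roll(clean, (3, -5), axis=(0, 1))))  # (3, -5)
```

The median is used rather than the mean because it is less sensitive to outliers such as scratches or dust that appear in only one frame; learned inpainting and super-resolution would build on top of an aligned stack like this.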
Work Environment In an office with the rest of the AI/ML team
References  
Prerequisite Skills Image processing
Other Skills Used in the Project  
Programming Languages Python