# Summer Research in Mathematics

This is a list of CMP academic project proposals from summer 2020.

## Machine Learning for Sensor Transducer Conversion Routines

| Field | Details |
| --- | --- |
| Project Title | Machine Learning for Sensor Transducer Conversion Routines |
| Contact Name | Phillip Stanley-Marbell |
| Contact Email | phillip.stanley-marbell@eng.cam.ac.uk |
| Company/Lab/Department | Department of Engineering |
| Address | 9 JJ Thomson Ave, Cambridge CB3 0FA |
| Period of the Project | 8 weeks |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 21 February 2020 |
| Background Information | All sensors require the conversion of raw transducer output (e.g., a voltage or impedance) into a scaled signal representing the quantity to be measured. This project will investigate an exciting new area: implementing these conversion routines using data-driven modeling and machine learning. |
| Brief Description of the Project | Sensors ultimately provide a means of measuring the presence or magnitude of some measurand, such as a temperature, pressure, or chemical species concentration. In application use, a sensor transducer is often paired with additional signal conditioning circuits to convert a raw transducer output, such as a voltage or impedance measurement, into a desired quantity such as gas concentration or pressure. This conversion typically requires calibration data for the sensor along with sophisticated analog- or digital-domain signal processing algorithms to translate the raw sensor transducer readings into a desired signal value. This project will investigate using machine learning to achieve more efficient sensor transducer conversion routines for gas sensors. As an experimental testbed, the project will use the BME680 and two similar sensors, the AMS CCS811 and the SGX MICS6814. The project will study the potential computation and accuracy tradeoffs involved in replacing existing algorithmic conversion routines for these sensors, which are currently compute-expensive, with a function model learned from data. |
| References | [Bosch2019] Bosch Sensortec, BME680 Low power gas, pressure, temperature & humidity sensor. [Herod2018] Kristen Karl Herod, Analyzing and Optimizing an Array of Low-Cost Gas Sensors for use in an Air Quality Measurement Device with Machine Learning. [WWTSM2019] Y. Wang, S. Willis, V. Tsoutsouras, and P. Stanley-Marbell. 2019. Deriving Equations from Sensor Data Using Dimensional Function Synthesis. ACM Trans. Embedd. Comput. Syst. 1, 1 (September 2019), 22 pages. |
| Prerequisite Skills | Statistics, Predictive Modelling, Data Visualization |
| Other Skills Used in the Project | Numerical Analysis |
| Programming Languages | C++, Mathematica |
| Work Environment | You will work in a team and will have access to both the head of the research group as well as to at least one postdoc and one PhD student. |
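
To make the idea of a conversion routine learned from data concrete, here is a minimal sketch in Python. It is not the conversion routine of the BME680 or of any other sensor named above: the calibration pairs are made up, the names `fit_linear` and `convert` are hypothetical, and a straight-line least-squares fit stands in for whatever model the project would actually learn.

```python
# Illustrative only: the calibration pairs below are invented, not sensor data.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical calibration set: raw transducer voltage (V) -> temperature (deg C).
raw_volts = [0.50, 0.75, 1.00, 1.25, 1.50]
temps_c = [0.0, 25.0, 50.0, 75.0, 100.0]

offset, gain = fit_linear(raw_volts, temps_c)


def convert(voltage):
    """Learned conversion routine: one multiply and one add per sample."""
    return offset + gain * voltage
```

Once fitted, the model costs a fixed, small number of arithmetic operations per sample, which is the kind of computation-versus-accuracy tradeoff the project description refers to.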

## Synthetic Sensors and Digital Sensor Substitution

| Field | Details |
| --- | --- |
| Project Title | Synthetic Sensors and Digital Sensor Substitution |
| Contact Name | Phillip Stanley-Marbell |
| Contact Email | phillip.stanley-marbell@eng.cam.ac.uk |
| Company/Lab/Department | Department of Engineering |
| Address | 9 JJ Thomson Ave, Cambridge CB3 0FA |
| Period of the Project | 8 weeks |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 21 February 2020 |
| Background Information | When no sensors exist for directly transducing a phenomenon of interest into a signal for subsequent processing, it is plausible to use information about the units of measure of the phenomenon of interest to design sensors based on transduction of multiple sub-phenomena. Recent research results [WWTSM2019] on computationally analyzing specifications of the units of measure of sensor systems make it possible, for the first time, to systematically explore this idea of synthetic sensors and digital sensor substitution. |
| Brief Description of the Project | This project will build on recent research results [WWTSM2019] on exploiting the principle of dimensional homogeneity to learn functions that relate sensor signals in a multi-modal sensor system. The project will use a state-of-the-art multi-modal sensor system. In this project, you will: (1) Investigate extending dimensional function synthesis for sensor data with the concept of conditional proportionality relationships, based on identifying model parameters which are constant. (2) Evaluate first-principles methods for replacing energy-costly sensor types (e.g., gyroscopes) with energy-efficient sensor types (e.g., accelerometers) for two sensor-augmented mechanical systems which will be made available to the student. (3) Apply the technique of dimensional function synthesis for sensor data to two mechanical systems provided and investigate new methods for replacing energy-costly sensor types (e.g., gyroscopes) with energy-efficient sensor types (e.g., accelerometers) for the specific mechanical systems in question. (4) Investigate and evaluate the benefit of including information about prior statistics into the methods above and applying concepts from Bayesian statistics to obtain probabilistic synthetic sensor models. (5) Evaluate the accuracy, result uncertainty, performance, and power dissipation tradeoffs in performing sensor substitution using both first-principles and dimensional-function-synthesized approaches for the two sensor-augmented mechanical systems. |
| References | [WWTSM2019] Y. Wang, S. Willis, V. Tsoutsouras, and P. Stanley-Marbell. 2019. Deriving Equations from Sensor Data Using Dimensional Function Synthesis. ACM Trans. Embedd. Comput. Syst. 1, 1 (September 2019), 22 pages. |
| Prerequisite Skills | Statistics, Probability/Markov Chains, Predictive Modelling |
| Other Skills Used in the Project | Numerical Analysis |
| Programming Languages | C++, Mathematica |
| Work Environment | You will work in a team and will have access to both the head of the research group as well as to at least one postdoc and one PhD student. |
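
The principle of dimensional homogeneity mentioned above can be illustrated with a small check that a product of signals is dimensionless. This is only a sketch of the classical dimensional-analysis idea, not the dimensional function synthesis method of [WWTSM2019]; the pendulum example and the names `DIMENSIONS`, `product_dimension`, and `is_dimensionless` are assumptions made for illustration.

```python
# Each quantity's dimension is written as exponents of (mass, length, time).
# The pendulum quantities are a textbook example, not one of the project's systems.
DIMENSIONS = {
    "period": (0, 0, 1),    # s
    "length": (0, 1, 0),    # m
    "gravity": (0, 1, -2),  # m s^-2
    "mass": (1, 0, 0),      # kg
}


def product_dimension(exponents):
    """Dimension of prod(quantity ** power), as (mass, length, time) exponents."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for axis, d in enumerate(DIMENSIONS[name]):
            total[axis] += power * d
    return tuple(total)


def is_dimensionless(exponents):
    """True if the product of powers has no physical dimension."""
    return product_dimension(exponents) == (0, 0, 0)


# period^2 * gravity / length is dimensionless, so any physically valid
# relation between these three signals can be written in terms of it.
pi_group = {"period": 2, "gravity": 1, "length": -1}
```

A candidate relation between sensor signals that fails this check cannot be a physical law, which is what makes units of measure a useful constraint when learning functions from multi-modal sensor data.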

## Sensor Access Scheduling Algorithms using Game Theory and Mechanism Design

| Field | Details |
| --- | --- |
| Project Title | Sensor Access Scheduling Algorithms using Game Theory and Mechanism Design |
| Contact Name | Phillip Stanley-Marbell |
| Contact Email | phillip.stanley-marbell@eng.cam.ac.uk |
| Company/Lab/Department | Department of Engineering |
| Address | 9 JJ Thomson Ave, Cambridge CB3 0FA |
| Period of the Project | 8 weeks |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 21 February 2020 |
| Background Information | When sensors become ubiquitous in society, there is the possibility of an ecosystem of third-party applications on deployed sensors, just as exists today for smartphones. The results of this project are a first step to making that a reality. To achieve this goal, there is an opportunity to build on several research results from game theory [M81, FD91], but a student without sufficient background in these areas may take other approaches. |
| Brief Description of the Project | Algorithms which compete for scarce energy, computing, and sensor access resources need to be able to satisfy their sensor data sampling and quality needs without depleting sensing energy resources. The quality of data from all modern integrated sensors depends on their operating configurations [SMR16]. These configurations include their operating voltage, sampling rate, sample precision, and dynamic range of their signal input interfaces. In practice, these configuration parameters also have a significant effect on the energy used by sensors for generating each sample. Because the power dissipated by sensors can equal or exceed the power used in optimized state-of-the-art microcontrollers, reducing sensor energy usage can have a significant impact on whole-system energy use in sensing platforms. The aim of this project is to investigate efficient sensor access schedules under fidelity requirements. The project will investigate new algorithms to schedule accesses to sensors given a set of sensor access jobs. Let a sensor access schedule be an ordering for carrying out sensing operations by activating sensors with given sensor parameter configurations. Let the constraints on the tail distributions of tolerable sample precisions, accuracies, erasures, and latencies of the $n$th sensor access be $P_n$, $A_n$, $E_n$, and $L_n$, respectively. To formulate the problem of sensor access scheduling when schedules must describe not just time of access, but also required sensor fidelity, let $j = (j_1, j_2, \ldots, j_k)$ be the sequence of $k$ sensor access events visible to a scheduler at a given point in time. We can then define a schedule for a set of $k$ accesses across $N_s$ sensors as a pair of functions $(C(j, t), J(j, t))$. $C(j, t)$ is a function from the vector of sensor access jobs and time steps to the configurations of the $N_s$ sensors. $J(j, t)$ is a function from the vector of sensor access jobs and time steps to a sequence of vectors indicating the activation state of each of the $N_s$ sensors: the $k$th element in the $t$th vector is 1 if sensor $k$ is accessed by the jobs serviced at time $t$ and is 0 otherwise. The challenge is to find the pair of functions $(C(j, t), J(j, t))$ for a given $P_n$, $A_n$, $E_n$, and $L_n$. In this project, you will: (1) Investigate the fundamental challenges and algorithms for computing schedules incrementally with the arrival of each sensor access request (dynamic online scheduling). (2) Investigate algorithms to compute a schedule with complete knowledge of all sensor accesses that will comprise the schedule (static offline scheduling). (3) Evaluate the effectiveness of the static-offline and dynamic-online schedulers using real-world sensor access activity traces which will be provided to you. |
| References | [SMR16] P. Stanley-Marbell and M. Rinard. Lax: Driver interfaces for approximate sensor device access. In HotOS XV, 2015. [M81] R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981. [FD91] D. Fudenberg and J. Tirole. Game Theory. MIT Press, 1991. |
| Prerequisite Skills | Algebra/Number Theory, Simulation, Game Theory and Mechanism Design |
| Other Skills Used in the Project | |
| Programming Languages | C++, Mathematica |
| Work Environment | You will work in a team and will have access to both the head of the research group as well as to at least one postdoc and one PhD student. |
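
The pair of functions that defines a schedule can be pictured as two tables indexed by time step and sensor. The sketch below builds them for a naive first-come-first-served baseline that serves one job per time step. It is an illustration of the data structure only: the `Config` and `Job` types and the `fcfs_schedule` function are hypothetical, and the baseline ignores the fidelity constraints and the energy costs that the project is actually about.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical sensor configuration (a subset of the parameters named above)."""
    sample_rate_hz: int
    precision_bits: int


@dataclass(frozen=True)
class Job:
    """One sensor access request."""
    sensor: int      # index of the requested sensor, 0 <= sensor < num_sensors
    config: Config   # configuration the job asks for


def fcfs_schedule(
    jobs: List[Job], num_sensors: int
) -> Tuple[List[List[Optional[Config]]], List[List[int]]]:
    """Serve one job per time step, in arrival order.

    Returns (C, J): C[t][s] is the configuration of sensor s at step t
    (None if the sensor is idle), and J[t][s] is 1 if sensor s is accessed
    at step t and 0 otherwise.
    """
    C: List[List[Optional[Config]]] = []
    J: List[List[int]] = []
    for job in jobs:
        configs: List[Optional[Config]] = [None] * num_sensors
        active = [0] * num_sensors
        configs[job.sensor] = job.config
        active[job.sensor] = 1
        C.append(configs)
        J.append(active)
    return C, J


jobs = [Job(0, Config(100, 12)), Job(2, Config(10, 16)), Job(0, Config(50, 8))]
C, J = fcfs_schedule(jobs, num_sensors=3)
```

A real scheduler would choose both the order and the configurations so that the constraints on precision, accuracy, erasures, and latency hold at the lowest energy cost; this baseline simply fixes the representation against which such schedulers could be compared.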

## Feature Extraction in Multi-Modal Sensor Data by Dimensional Function Synthesis

| Field | Details |
| --- | --- |
| Project Title | Feature Extraction in Multi-Modal Sensor Data by Dimensional Function Synthesis |
| Contact Name | Phillip Stanley-Marbell |
| Contact Email | phillip.stanley-marbell@eng.cam.ac.uk |
| Company/Lab/Department | Department of Engineering |
| Address | 9 JJ Thomson Ave, Cambridge CB3 0FA |
| Period of the Project | 8 weeks |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 21 February 2020 |
| Background Information | Physical systems instrumented with sensors can generate large volumes of data. Unlike data sources such as speech or text, however, data from sensors of physical phenomena must obey the laws of physics. This project investigates an exciting new research area applying this idea to traditional machine learning techniques to improve their training and inference performance for sensor data. |
| Brief Description of the Project | Dimensional function synthesis [WWTSM2019] is a new method to improve the effectiveness of machine learning from sensor data, by performing dimensionality reduction on multi-modal sensor data to make subsequent machine learning steps more effective. It can improve the speed of training neural networks on sensor data by many orders of magnitude for some applications and can also improve the performance of inference from multi-modal sensor data many-fold. This project will investigate the feasibility of an end-to-end integration of dimensional function synthesis on multi-modal sensor data with an open-source neural network accelerator [Marlann2019] running on a miniature low-power field-programmable gate array (FPGA) [iCE40 2019]. The investigation will evaluate the circuit size, power dissipation, and performance of hardware implementations of circuits generated by dimensional function synthesis for two simple mechanical systems. The project will evaluate the circuit size, power dissipation, and performance of the Marlann neural network accelerator and will then investigate coupling the output of dimensional function synthesis to Marlann. In this project, you will: (1) Investigate the circuit size, power dissipation, and performance of hardware implementations of circuits generated by dimensional function synthesis for two sensor-augmented mechanical systems, on the iCE40 FPGA (using the provided dimensional function synthesis implementation). Investigate the circuit size, power dissipation, and performance of the Marlann neural network accelerator on the iCE40 FPGA. (2) Investigate alternative methods for efficient machine learning from sensor data. This could involve either a literature (meta-) survey or lab-based evaluation of techniques from the research literature for head-to-head comparisons. (3) Propose implementation approaches for integrating the dimensionality reduction of dimensional function synthesis with hardware implementations of neural networks. |
| References | [WWTSM2019] Y. Wang, S. Willis, V. Tsoutsouras, and P. Stanley-Marbell. 2019. Deriving Equations from Sensor Data Using Dimensional Function Synthesis. ACM Trans. Embedd. Comput. Syst. 1, 1 (September 2019), 22 pages. [Marlann2019] SymbioticEDA: MARLANN -- A simple FPGA Machine Learning Accelerator, https://github.com/SymbioticEDA/MARLANN [iCE40 2019] Lattice Semiconductor, iCE40 FPGA. |
| Prerequisite Skills | Statistics, Predictive Modelling |
| Other Skills Used in the Project | Numerical Analysis |
| Programming Languages | C++, Mathematica |
| Work Environment | You will work in a team and will have access to both the head of the research group as well as to at least one postdoc and one PhD student. |

## Yang-Mills Equations in Lean

| Field | Details |
| --- | --- |
| Project Title | Yang-Mills Equations in Lean |
| Contact Name | Dr. Anthony Bordg |
| Contact Email | apdb3@cam.ac.uk |
| Company/Lab/Department | Department of Computer Science and Technology, University of Cambridge |
| Address | 15 JJ Thomson Avenue, Cambridge CB3 0FD |
| Period of the Project | Flexible |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | |
| Background Information | The purpose of the internship is to formalise a body of mathematics using the theorem prover Lean, a new and very expressive proof assistant with a growing worldwide community. The formalisation of mathematics can identify errors and gaps, and it provides a foundation for more advanced material to be formalised in the future. Dr. Anthony Bordg and last year's student co-authored two articles during the project: one on the formalisation of quantum algorithms (using the proof assistant Isabelle/HOL) and another on quantum game theory, written after their formalisation uncovered errors in the literature, which they then fixed. |
| Brief Description of the Project | In his 2017 Cambridge talk entitled "Big Conjecture", Thomas Hales proposed the formalization of the Millennium Problems of the Clay Mathematics Institute as a stimulating challenge for interactive theorem proving. Formalizing the statements of these problems is part of Hales's Formal Abstracts initiative. One of these problems is the Yang-Mills and Mass Gap problem. I propose to embark on the formalization of some of the mathematical prerequisites of Yang-Mills theory. Since Lean has a library for manifolds, it is certainly feasible to formalize in Lean the classical Yang-Mills equations, a generalization of Maxwell's equations, as a first milestone: $d_D F = 0$ and $\star\, d_D \star F = J$. The main ingredients of these equations are within reach: the exterior derivative $d$ on differential forms, the Hodge star operator $\star$, the curvature $F$, and the connection $D$. |
| References | Gauge Fields, Knots and Gravity, by John Baez and Javier P Muniain; the webpage of the proof assistant Lean, https://leanprover.github.io/; the public Zulip chat for Lean. |
| Prerequisite Skills | Mathematical physics, Geometry/Topology, Programming |
| Other Skills Used in the Project | Programming with the theorem prover Lean (a dependent type theory) |
| Programming Languages | |
| Work Environment | The student will work closely with Dr. Anthony Bordg and in collaboration with Prof. Michael R. Douglas from Stony Brook University. We will offer close and friendly mentoring to nurture the student's research skills, with the objective of co-authoring a research article. |
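
For reference, the two classical Yang-Mills equations named in the project description can be set out in display form, together with the special case that recovers Maxwell's equations. This is a sketch in one common convention; sign and normalisation conventions differ between texts, and the gauge potential $A$ is notation introduced here rather than taken from the proposal.

```latex
% Classical Yang-Mills equations for a connection D with curvature F
% and source (current) J:
\begin{align}
  d_D F &= 0, \\
  \star\, d_D \star F &= J.
\end{align}
% Special case (assumption: structure group U(1) on a trivial bundle):
% writing D = d + A gives F = dA, the exterior covariant derivative d_D
% reduces to d on F, and the equations become Maxwell's equations:
\begin{align}
  dF &= 0, \\
  \star\, d \star F &= J.
\end{align}
```

The first equation in each pair is an identity (the Bianchi identity) once $F$ is defined as the curvature of $D$, so the second equation carries the dynamical content.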

## Modelling carbon sequestration in trees through reducing the complexity of a cell-based approach

| Field | Details |
| --- | --- |
| Project Title | Modelling carbon sequestration in trees through reducing the complexity of a cell-based approach |
| Contact Name | Dr Andrew D. Friend |
| Contact Email | adf10@cam.ac.uk |
| Company/Lab/Department | Department of Geography |
| Address | University of Cambridge, Downing Place, Cambridge CB2 3EN |
| Period of the Project | 8 weeks |
| Project Open to | Master's (Part III) students |
| Initial Deadline to register interest | February 21 |
| Background Information | Terrestrial ecosystems sequester about 30% of anthropogenic CO2 emissions each year. However, we do not fully understand the mechanisms behind this sink, and so cannot predict how it will behave in the future. Models do not treat growth processes explicitly. We have developed a detailed process-based approach to modelling wood growth, the main global sink, but need to make it simpler, while retaining its rigour, for global applications. |
| Brief Description of the Project | We have recently completed a detailed model of wood formation, based on cellular differentiation. It seems to be able to simulate observed anatomy and the influence of climate very well. We would like to apply this in a global model, which we have also developed. However, doing so requires reducing its complexity from the treatment of individual cells to aggregated behaviour (the cells proliferate and flow through a series of differentiation zones, picking up environmental and internal signals as they do so). It seems this would be ideal for a mathematician to tackle! If successful (i.e. more efficient than the original model, but retaining its main behaviour), the resulting model has the potential to revolutionise our understanding of the global carbon cycle and hence better predict future climate change. A previous collaboration with a CMP student was very successful and resulted in a peer-reviewed publication (https://doi.org/10.3389/fpls.2017.00182). |
| References | Friend AD, Eckes-Shephard AH, Fonti P, Rademacher TT, Rathgeber C, Richardson AD, Turton RH. 2019. On the need to consider wood formation processes in global vegetation models and a suggested approach. Annals of Forest Science, 76:49. Hayat A, Hacket-Pain AJH, Pretzsch H, Rademacher TT, Friend AD. 2017. Modeling tree growth taking into account carbon source and sink limitations. Frontiers in Plant Science 8, 182. Friend AD, 22 others. 2014. Carbon residence time dominates uncertainty in terrestrial vegetation responses to future climate and atmospheric CO2. PNAS 111, 3280-3285. |
| Prerequisite Skills | PDEs |
| Other Skills Used in the Project | Simulation, Predictive Modelling |
| Programming Languages | No Preference |
| Work Environment | We have a small team with a post-doc and PhD student working on this project, and are strongly connected to an international community of researchers in this field. Some remote work would be possible, but typically we work co-located for a few hours sometime between 9am and 6pm. |

## Modeling DNA function, evolution and sequencing

| Field | Details |
| --- | --- |
| Project Title | Modeling DNA function, evolution and sequencing |
| Contact Name | Nicola De Maio |
| Contact Email | demaio@ebi.ac.uk |
| Company/Lab/Department | Goldman group, European Bioinformatics Institute (EMBL-EBI) |
| Address | EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK. |
| Period of the Project | No specific time frame is required |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 29 February 2020 |
| Background Information | DNA is key to understanding life and evolution. In our group we use probabilistic models, often Markov Chains, to study DNA and its evolution, and improve DNA sequencing (the reading of the letters of the DNA alphabet from physical samples). These models are useful for many applications, such as inferring phylogenies (evolutionary histories of species or pathogens), finding footprints of natural selection, and understanding the role of genes, among many others. We also work to improve how DNA information is collected, specifically developing models and algorithms to improve the efficiency of Nanopore (ONT) sequencing. |
| Brief Description of the Project | We can offer many possible paths for a project, depending on the student's specific interests and skills. Some of the projects focus mostly on mathematical aspects of models: can we improve current models of DNA change by better predicting patterns in DNA sequence evolution, without compromising their computational efficiency? Are these models statistically identifiable? These projects require at least a basic understanding of probability and in particular of Markov models; some could require advanced probability and statistics. Other projects are more computational, and require the student to think about how to improve certain algorithms and be able to code (preferably in Python or C++), but still require the understanding and development of mathematical models. For example, in one project we want to improve the efficiency of certain sequencing technologies. A DNA sequencing machine reads the content of DNA samples, telling a computer the sequence of letters in the DNA of the sample. However, this information comes in small blocks, called reads, from random parts of the original DNA sequence. Our approach is to tell the machine which blocks are interesting and which are not, so that interesting blocks can be wholly read, while non-interesting blocks can be rejected, saving time and resources. The approach involves improving the computational efficiency of the current methods developed in our lab, and developing new methods based on alternative models of efficiency. A further project, again requiring both modeling and computational skills, involves the genetic code: the set of rules that determine how DNA information is interpreted by the living organisms that contain it. We want to investigate whether the genetic code evolved to be robust to certain mutational events, and whether the genetic code could be improved in this respect. This project will require the mathematical formalization of the properties of the genetic code, and the computational skills to write code to explore, assess and optimize large numbers of alternative genetic codes. |
| References | |
| Prerequisite Skills | Probability/Markov Chains |
| Other Skills Used in the Project | Statistics, Probability/Markov Chains, PDEs, Simulation, Data Visualization |
| Programming Languages | Python, C++. Other languages are also allowed, but some projects might require understanding of previous Python code written in our lab. |
| Work Environment | Students will be part of the group during the project, attending group meetings. They are expected to be in the office during work hours, but working remotely is also allowed when needed. The project will be supervised by the group leader, Nick Goldman, on a weekly basis, and by the senior scientist in the group, Nicola De Maio, on a daily basis if needed. |
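
As an illustration of the kind of Markov model of DNA change mentioned above, the sketch below implements the Jukes-Cantor (JC69) model, the simplest textbook continuous-time Markov chain on the four bases. It is chosen here only as a standard example; the proposal does not say which models the group uses, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_transition_prob(i: str, j: str, t: float, mu: float = 1.0) -> float:
    """P(base j at time t | base i at time 0) under the Jukes-Cantor model.

    mu is the expected number of substitutions per site per unit time;
    every base changes to each of the other three at rate mu / 3.
    """
    decay = math.exp(-4.0 * mu * t / 3.0)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def sequence_likelihood(ancestor: str, descendant: str, t: float) -> float:
    """Probability of `descendant` given `ancestor` after time t.

    Assumes aligned sequences of equal length and independent sites.
    """
    p = 1.0
    for a, d in zip(ancestor, descendant):
        p *= jc69_transition_prob(a, d, t)
    return p
```

Because the transition probabilities have a closed form, likelihoods under this model are cheap to evaluate; richer models predict sequence patterns better but usually lose that property, which is the efficiency tradeoff the project description raises.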

## Dynamic models for zoonotic viruses in bats

| Field | Details |
| --- | --- |
| Project Title | Dynamic models for zoonotic viruses in bats |
| Contact Name | Dr Olivier Restif |
| Contact Email | or226@cam.ac.uk |
| Company/Lab/Department | Department of Veterinary Medicine |
| Address | Madingley Road, West Cambridge, CB3 0ES |
| Period of the Project | Flexible |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | |
| Background Information | Bats are an important source of emerging infectious diseases globally, including some of the lethal viruses (e.g. rabies, Ebola, Nipah). My team is part of an international consortium studying the ecology of tropical fruit bats in Africa, Asia and Australia and their interactions with zoonotic viruses. This work is interdisciplinary and provides exciting opportunities for data-driven mathematical modelling. |
| Brief Description of the Project | There are several possible objectives depending on the skills and interests of the student. One particular area of research is the use of stochastic models to make statistical inference from complex empirical data relevant to infection dynamics in bats. |
| References | https://www.bat1health.org |
| Prerequisite Skills | Statistics, Probability/Markov Chains, Numerical Analysis |
| Other Skills Used in the Project | |
| Programming Languages | R |
| Work Environment | Part of a research team including postdocs and PhD students. |

## Sampling in nonlinear spaces with applications to information engineering

| Field | Details |
| --- | --- |
| Project Title | Sampling in nonlinear spaces with applications to information engineering |
| Contact Name | Dr Cyrus Mostajeran |
| Contact Email | csm54@cam.ac.uk |
| Company/Lab/Department | Engineering Department |
| Address | Engineering Dept, Trumpington St, Cambridge CB2 1PZ |
| Period of the Project | Any 8 week period before September |
| Project Open to | Master's (Part III) students |
| Initial Deadline to register interest | |
| Background Information | Recent years have witnessed growing interest in the applications of differential geometry to engineering and applied sciences. In particular, statistics and optimisation on manifolds find applications across various branches of information engineering, including computer vision, machine learning, and medical imaging. Manifolds that naturally arise in these contexts include homogeneous spaces such as Grassmannians, Stiefel manifolds, and cones of symmetric positive definite (SPD) matrices. SPD matrices in particular appear in an enormous range of applications as covariance matrices, including brain-computer interface (BCI) systems, radar data processing, and diffusion tensor imaging (DTI). The generation of random points on manifolds is of interest within the context of this broader research landscape. A variety of inferential tasks require sampling from probability distributions on manifolds. Examples include sampling from the posterior distribution on constrained parameter spaces such as covariance matrices, and data generation for algorithms in topological statistics. |
| Brief Description of the Project | The project aims to develop general methods for sampling from a given probability distribution on a manifold, to provide theoretical guarantees for the performance of these methods, and to explore their practical implementation and concrete applications. Sampling from the distribution means generating samples (random points) that satisfy the law of large numbers (LLN), which is at the heart of any Monte Carlo (MC) method. Monte Carlo methods are fundamental to stochastic optimisation, Bayesian inference, and many other computational paradigms which are of fundamental importance in information engineering. In order of increasing difficulty, the project may consider the following problems: (a) sampling on a compact homogeneous space from a uniform distribution, (b) sampling on a homogeneous space from an isotropy invariant distribution, (c) sampling on any suitable Riemannian manifold from a partially unknown distribution. Problem (a) has been considered extensively, in connection with random matrix theory. On the other hand, problem (b) is a fairly open challenge, at the heart of ongoing research in non-Euclidean machine learning. Compact Riemannian homogeneous manifolds include compact Lie groups, and Stiefel and Grassmann manifolds, which are of fundamental importance in robotics, telecommunications, and other fields. Among general (non-compact) homogeneous Riemannian manifolds, one finds hyperbolic spaces and various spaces of covariance matrices, which play a central role in brain-computer interface analysis (in addition to their great intrinsic mathematical interest). Problem (c) is also a cutting-edge problem, treated in only a few recent papers, which deal with Markov Chain Monte Carlo methods in Riemannian manifolds. The issue of an unknown target distribution often arises in Bayesian inference, where the probability density is known only up to a normalising factor. The development of sampling strategies on manifolds equipped with a variety of different metric structures, such as Finsler manifolds, is another possible research direction for the project. The resulting algorithms can be tested on real problems and data with the aim of systematically identifying the most appropriate metric structures for use in analysis and statistics for specific applications. Special consideration will be given to the space of SPD matrices of a fixed dimension. The project offers a wide choice of applied problems (ranging from building software toolboxes to proving theoretical performance guarantees) and pure problems (concerning the interplay of probability, geometry and harmonic analysis), and can therefore be adapted to the student's preference and motivation. |
| References | 1. Absil, P.-A., Mahony, R. and Sepulchre, R.: Optimization algorithms on matrix manifolds. Princeton University Press (2008). 2. Meckes, E. S.: Random matrix theory of the classical compact groups. Cambridge University Press (2019). 3. Said, S., Hajri, H., Bombrun, L. and Vemuri, B. C.: Gaussian distributions on Riemannian symmetric spaces. IEEE Trans. Inf. Theory 64(2) (2018). 4. Said, S.: On the Riemannian barycentre of a Markov chain. (arXiv:1908.08912, pre-print) (2019). 5. Diaconis, P., Holmes, S., Shahshahani, M.: Sampling from a manifold. (arXiv:1206.6913) (2012). |
| Prerequisite Skills | Probability/Markov Chains, Geometry/Topology, Simulation, Data Visualization |
| Other Skills Used in the Project | Statistics, Probability/Markov Chains, Geometry/Topology, Simulation, Data Visualization |
| Programming Languages | Python, MATLAB, C++ |
| Work Environment | The student will have his or her own desk within the Control Group of the Engineering Department and will have access to shared facilities and be invited to attend group meetings and seminars. The group is quite large and people are encouraged to share ideas and discuss their work. The student will most likely have opportunities to regularly discuss their progress with a senior professor in the group as well as research colleagues in France. I will be available to meet on a daily basis as needed. |
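
The simplest instance of problem (a) is uniform sampling on the unit sphere, which is a compact homogeneous space under the rotation group. The sketch below uses the standard trick of normalising a Gaussian vector and then checks the law of large numbers on one test function. It is a textbook construction rather than a method from the project, and the function names are hypothetical.

```python
import math
import random


def sample_uniform_sphere(dim: int, rng: random.Random) -> list:
    """Draw one point uniformly from the unit sphere S^(dim-1) in R^dim.

    A standard Gaussian vector is rotation invariant, so its direction
    follows the rotation-invariant (uniform) distribution on the sphere.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (measure-zero) zero vector
            return [x / norm for x in v]


def monte_carlo_mean(f, dim: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[f(X)] for X uniform on S^(dim-1)."""
    rng = random.Random(seed)
    total = sum(f(sample_uniform_sphere(dim, rng)) for _ in range(n_samples))
    return total / n_samples


# By symmetry E[x_1^2] = 1/dim, so on the 2-sphere the estimate should be near 1/3.
estimate = monte_carlo_mean(lambda p: p[0] ** 2, dim=3, n_samples=20000)
```

Problems (b) and (c) are harder precisely because no such closed-form transformation of a simple distribution is generally available, which is where Markov Chain Monte Carlo methods come in.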

## Mathematical Analysis of Music

| Field | Details |
| --- | --- |
| Project Title | Mathematical Analysis of Music |
| Contact Name | Francis Knights |
| Contact Email | fk240@cam.ac.uk |
| Company/Lab/Department | Fitzwilliam College |
| Address | Fitzwilliam College |
| Period of the Project | 4 weeks, July |
| Project Open to | Undergraduates, Master's (Part III) students |
| Initial Deadline to register interest | 21 February |
| Background Information | This forms part of an ongoing Cambridge research project (https://formal-methods-in-musicology.webnode.com/) which uses mathematical and statistical analysis techniques to understand musical style and structure. Repertoire ranges from the Middle Ages to the Beatles, with an emphasis on classical music, and we have had a number of very successful interns from the Maths department in recent years. Several papers have been generated, and we are hoping to continue this. |
| Brief Description of the Project | Candidates should be able to read music, and have a good understanding of musical history and style. The projects this summer will include work using Markov Chains, Principal Component Analysis, Graph Theory and Relative Entropy. At least one of the projects will be looking at 18th century music, and we are also interested in attributions (how can we compare the style of anonymous compositions with those by named composers?) and stylistic development (why does Bach sound different from Beethoven?). |
| References | |
| Prerequisite Skills | Statistics, Probability/Markov Chains, Mathematical Analysis, Database Queries |
| Other Skills Used in the Project | |
| Programming Languages | MATLAB |
| Work Environment | Each student will be assigned a particular method and repertoire to work with, and there will be frequent meetings with the project directors, Francis Knights (Fitzwilliam College) and Prof Pablo Padilla (National University of Mexico), who will be visiting for July. If there are several students involved, we will also have twice-weekly full group meetings. The project is based at Fitzwilliam College. |
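
Two of the techniques listed above, Markov chains and relative entropy, combine naturally into a simple style comparison: estimate a note-to-note transition model for each piece, then measure how far apart the models are. The sketch below is written in Python rather than the project's MATLAB, uses two made-up note sequences, and the unweighted averaging in `style_distance` is a simplifying assumption, not the project's method.

```python
import math
from collections import Counter, defaultdict


def transition_probs(notes, alphabet, smoothing=1.0):
    """First-order Markov transition probabilities with additive smoothing."""
    counts = defaultdict(Counter)
    for a, b in zip(notes, notes[1:]):
        counts[a][b] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + smoothing * len(alphabet)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in alphabet}
    return probs


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0.0)


def style_distance(probs_a, probs_b):
    """Unweighted mean, over current notes, of D(next-note in A || next-note in B)."""
    return sum(relative_entropy(probs_a[s], probs_b[s]) for s in probs_a) / len(probs_a)


# Hypothetical melodies reduced to pitch names; not taken from any real piece.
alphabet = ["C", "D", "E", "G"]
piece_a = ["C", "D", "E", "D", "C", "D", "E", "G", "E", "D", "C"]
piece_b = ["C", "G", "C", "G", "E", "G", "C", "E", "G", "C", "G"]

pa = transition_probs(piece_a, alphabet)
pb = transition_probs(piece_b, alphabet)
distance_ab = style_distance(pa, pb)
```

The smoothing term keeps every transition probability positive so the relative entropy stays finite; for an attribution question, an anonymous piece could be compared in this way against models built from works by each candidate composer.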

## Motion analysis in synthetic embryos

 Project Title Motion analysis in synthetic embryos Contact Name Leila Muresan Contact Email lam94@cam.ac.uk Company/Lab/Department Dept. of Physiology, Development and Neuroscience / CAIC (University of Cambridge) Address Cambridge Advanced Imaging Centre, Downing site, CB2 3DY Cambridge Period of the Project 1st of June - 30 September Project Open to Undergraduates, Master's (Part III) students Initial Deadline to register interest 21st of February Background Information The mechanisms regulating mammalian embryo development after its implantation in the maternal uterus are poorly understood. The Zernicka-Goetz laboratory (PDN) has recently developed an approach to combine different kinds of stem cells in vitro to generate structures that mimic mammalian embryos ([1]). The aim of this project is to study cell migration and rearrangement in different parts of the synthetic embryo. Brief Description of the Project We hypothesize that the three-layer body plan of mammalian embryos lends itself to motion analysis on sphere-like surfaces as described in [2] (Matlab code available). Our goal is to adapt the approach to our experimental settings (membrane staining instead of nuclear staining, potentially large displacements, varying signal-to-noise ratio), analyse and interpret the results, and evaluate the performance of this approach in comparison to existing methods. References 1. Sozen B, Amadei G, Cox A, Wang R, Na E, Czukiewska S, Chappell L, Voet T, Michel G, Jing N, Glover DM, Zernicka-Goetz M (2018) Self-assembly of embryonic and two extraembryonic stem cell types into gastrulating embryo-like structures. Nature Cell Biology. 20: 979-989.
2. LF Lang - A Numerical Framework for Efficient Motion Estimation on Evolving Sphere-Like Surfaces Based on Brightness and Mass Conservation Laws, SIAM Journal on Imaging Sciences 12 (1), 459-491 Prerequisite Skills Image processing Other Skills Used in the Project Image processing Programming Languages MATLAB Work Environment This is a multi-disciplinary project at the interface of image processing and biology. Dr Gianluca Amadei (PDN) will be the day-to-day biological data contact, and Dr Leila Muresan will supervise the image analysis. Dr Lukas Lang (author of [2]) will give advice and feedback, and Prof. Magda Zernicka-Goetz (PDN) will steer the project and integrate potential results in the workflow of her group.

## Greedy algorithms for sparse deconvolution with space varying kernel

 Project Title Greedy algorithms for sparse deconvolution with space varying kernel Contact Name Bogdan Toader Contact Email bt382@cam.ac.uk Company/Lab/Department Cambridge Advanced Imaging Centre Address Department of Physiology, Development and Neuroscience, Anatomy building, Downing site, Cambridge, CB2 3DY Period of the Project 8 weeks between late June and 30 September, details to be discussed with the student. Project Open to Master's (Part III) students Initial Deadline to register interest February 21 Background Information The problem of single molecule localisation in super-resolution microscopy can be modelled mathematically as a sparse deconvolution problem. Specifically, the problem concerns recovering locations and weights of point sources from noisy measurements of their convolution with a known blurring kernel. This is often done by solving the total variation (TV) norm minimisation problem, and a number of algorithms have been developed for the case when the kernel is spatially invariant. However, there are instances in super-resolution microscopy when the kernel is spatially varying, which requires an extension of these algorithms, and this is the focus of this project. Brief Description of the Project In this project, we will consider the class of greedy algorithms for sparse deconvolution (see the references below for two examples) and we will investigate potential extensions to the case when the kernel is spatially varying. The student will be given a few ideas as a starting point, but is free to take other approaches to solve the problem. A successful outcome of the project would involve an extension to one of the existing algorithms that can localise molecules when applied to a data set from our lab. References [1] Nicholas Boyd, Geoffrey Schiebinger, and Benjamin Recht. The alternating descent conditional gradient method for sparse inverse problems. SIAM Journal on Optimization, 27(2):616–639, 2017. 
[2] Quentin Denoyelle, Vincent Duval, Gabriel Peyré, and Emmanuel Soubies. The sliding Frank-Wolfe algorithm and its application to super-resolution microscopy. Inverse Problems, 36(1), 2019. Prerequisite Skills Numerical Analysis Other Skills Used in the Project Image processing Programming Languages Python, MATLAB, No Preference Work Environment The student will join the Cambridge Advanced Imaging Centre and they will be able to talk to the other members of the lab about various aspects of the problem. Working arrangements will be discussed with the student but in general they will have the freedom to work remotely or in the lab (or a combination), as long as reasonable contact is maintained.
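To make the problem setting above concrete, here is a minimal one-dimensional sketch of greedy sparse recovery when the blurring kernel depends on the source position. It uses grid-based orthogonal matching pursuit as a deliberately simpler stand-in; it is not the alternating descent conditional gradient method of [1] or the sliding Frank-Wolfe algorithm of [2], both of which work off the grid. The Gaussian kernel, the linear width law (`sigma0`, `slope`), the grid sizes, the noise level and the two test sources are all assumptions chosen for illustration.

```python
import numpy as np

def varying_kernel_dictionary(grid, samples, sigma0=0.02, slope=0.03):
    """One column per candidate source position.

    Each column is a Gaussian kernel whose width grows linearly with the
    source position (a hypothetical model of a space-varying kernel).
    """
    sigmas = sigma0 + slope * grid
    return np.exp(-0.5 * ((samples[:, None] - grid[None, :]) / sigmas[None, :]) ** 2)

def omp(A, y, n_sources):
    """Orthogonal matching pursuit: pick the best-correlated column, then refit."""
    norms = np.linalg.norm(A, axis=0)
    support, residual, weights = [], y.copy(), np.zeros(0)
    for _ in range(n_sources):
        scores = np.abs(A.T @ residual) / norms
        if support:
            scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit of all weights on the current support.
        weights, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ weights
    return np.array(support), weights

# Synthetic test: two sources at grid indices 20 and 70, small additive noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
samples = np.linspace(0.0, 1.0, 200)
A = varying_kernel_dictionary(grid, samples)
y = 1.0 * A[:, 20] + 0.6 * A[:, 70] + 0.005 * rng.standard_normal(samples.size)
support, weights = omp(A, y, n_sources=2)
```

Because every column carries its own kernel, the spatial variation enters only through the dictionary `A`. The selection step divides by the column norms so that wider kernels are not favoured merely because they carry more energy.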

## Algorithm for super-resolution microscopy

 Project Title Algorithm for super-resolution microscopy Contact Name Jerome Boulanger Contact Email jeromeb@mrc-lmb.cam.ac.uk Company/Lab/Department MRC Laboratory of Molecular Biology Address Cambridge Biomedical Campus Francis Crick Avenue Cambridge CB2 0QH, UK Period of the Project June to September Project Open to Undergraduates, Master's (Part III) students Initial Deadline to register interest 21st of February Background Information Observing living cells using fluorescence microscopy provides remarkable insight into the mechanisms that allow them to perform their diverse functions. However, instruments remain limited in terms of resolution, and the structure of molecular assemblies remains mostly inaccessible to modern optical microscopes. Various approaches have been developed to circumvent these problems. In this project, we would like to explore a computational/mathematical approach to gain access to augmented resolution. Brief Description of the Project The aim is to address the problem of single emitter grouping in single molecule localization nanoscopy. Previous work based on reversible-jump Markov chain Monte Carlo (RJMCMC) shows that sub-nanometer resolution is accessible. In this project, we would like to explore different algorithmic approaches (statistical, variational, learning, etc.) to this problem. A first step would be to define the methodology; a simulation would then make it possible to assess the validity of the approach. Finally, the approach would be tested on data acquired in the laboratory. A positive outcome would comprise a better understanding of the problem and its modeling, an algorithmic solution and ideally some preliminary results on real data. References https://www.biorxiv.org/content/10.1101/752287v1 Prerequisite Skills Other Skills Used in the Project Programming Languages No Preference Work Environment The work environment is flexible (in the lab/at home). We are a small multidisciplinary group and it will be possible to interact with other members.

## Discrete-continuum modelling of transport in tumours

 Project Title Discrete-continuum modelling of transport in tumours Contact Name Dr Paul Sweeney Contact Email paul.sweeney@cruk.cam.ac.uk Company/Lab/Department Cancer Research UK Cambridge Institute Address Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE Period of the Project 8 weeks Project Open to Undergraduates, Master's (Part III) students Initial Deadline to register interest 21st of February Background Information Tumours consist of an interconnected network of blood vessels which deliver nutrients and drugs to the surrounding cancerous cells. Compared to healthy tissue, these structures display no hierarchical characteristics and, as tumour blood vessels grow rapidly, they are abnormal and leaky. The non-invasive measurement of tumour architecture and transport simultaneously across large, living samples (cm) is currently infeasible at the scale of the smallest blood vessels (< 3 μm in diameter). As a consequence, this limits our ability to predict the efficacy of a variety of anti-therapies in a preclinical or clinical setting. A combination of biomedical imaging and mathematical modelling could fill this void by accurately predicting fluid and mass transport with limited tumour architectural information. Brief Description of the Project The goal of this project is to build upon current discrete-continuum models (Sweeney 2018, Shipley et al. 2019) which utilise imaging data to predict fluid and drug transport in vascularised tumours. These multiscale models incorporate vascular architectures, obtained via biomedical imaging, to predict transport through discrete blood vessel networks, which are coupled, via point sources of flux, to a porous medium model for non-leaky vessels (Shipley et al. 2010). The latter model uses homogenization methods to average fluid and drug transport across microvessels which cannot be resolved using non-invasive experiments. 
The aim of the project is to move towards incorporating the double-porous medium model of Shipley et al. (2010) into a discrete-continuum model to facilitate the prediction of fluid and drug transport in vascular tumours. The project can be tailored to the applicant’s particular strengths and interests; however, it is hoped that the applicant will work on both the theoretical and computational aspects. References R. J. Shipley, A. F. Smith, P. W. Sweeney, A. R. Pries, and T. W. Secomb. 2019. A hybrid discrete-continuum approach for modelling microcirculatory blood flow. Mathematical Medicine & Biology, dqz006:1-18. DOI: https://doi.org/10.1093/imammb/dqz006. P. W. Sweeney. 2018. Realistic numerical image-based modelling of biological tissue substrates. Doctoral thesis (PhD), University College London. R. J. Shipley & S. Chapman. 2010. Multiscale modelling of fluid and drug transport in vascular tumours. Bulletin of Mathematical Biology, 72(6):1464-91. Prerequisite Skills Mathematical physics, PDEs, Mathematical Analysis, Simulation Other Skills Used in the Project Fluids, Predictive Modelling, Asymptotic analysis, Green's Functions Programming Languages Python, MATLAB, C++ Work Environment Lab space will be provided. Hours are flexible. Remote working is possible.
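The coupling "via point sources of flux" mentioned above can be illustrated with a textbook Green's function calculation. The sketch assumes steady Darcy flow in an infinite, homogeneous, isotropic porous medium, so that the pressure satisfies $-\kappa \nabla^2 p = \sum_j q_j\,\delta(\mathbf{x}-\mathbf{x}_j)$ and the solution is a superposition of free-space Green's functions, $p(\mathbf{x}) = p_\infty + \sum_j q_j / (4\pi\kappa\,|\mathbf{x}-\mathbf{x}_j|)$. This is a deliberately simplified stand-in, not the model of the cited papers: the function name, the conductivity `kappa`, the far-field pressure `p_far` and the source positions and strengths are all hypothetical.

```python
import numpy as np

def pressure_from_point_sources(x, sources, fluxes, kappa=1.0, p_far=0.0):
    """Darcy pressure at points x due to point sources of flux in free space.

    x       : (m, 3) evaluation points
    sources : (n, 3) source positions, e.g. vessel end points (hypothetical)
    fluxes  : (n,) source strengths q_j (positive = fluid entering the tissue)
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    r = np.linalg.norm(x[:, None, :] - sources[None, :, :], axis=2)
    return p_far + np.sum(fluxes[None, :] / (4.0 * np.pi * kappa * r), axis=1)

# Two illustrative sources: one injecting fluid, one removing it.
sources = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
fluxes = np.array([2.0, -1.0])
p = pressure_from_point_sources([[0.0, 2.0, 0.0]], sources, fluxes, kappa=0.5)
```

A discrete-continuum model additionally has to determine the fluxes themselves from the vessel network and the tissue properties; here they are simply prescribed.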

## Inferring ancient haplotypes and mutational signatures in Canine Transmissible Venereal Tumour

 Project Title Inferring ancient haplotypes and mutational signatures in Canine Transmissible Venereal Tumour Contact Name Kevin Gori Contact Email kcg25@cam.ac.uk Company/Lab/Department Department of Veterinary Medicine, University of Cambridge Address Dr Kevin Gori, Department of Veterinary Medicine, University of Cambridge, Madingley Road, CB3 0ES Period of the Project Flexible within July-September, e.g. 8 weeks 6th July - 28th August Project Open to Undergraduates, Master's (Part III) students Initial Deadline to register interest February 21 Background Information The project is an investigation of the genetics of canine transmissible venereal tumour (CTVT). CTVT is a very rare class of organism: a contagious cancer that has escaped its original host and now infects dog populations worldwide. Transmission is through direct contact with the tumour, rather than being a consequence of viral infection. CTVT is also almost certainly the oldest cancer on Earth, having originated several thousand years ago. CTVT therefore uniquely gives us the opportunity to discover two things: how a cell from a multicellular organism can become an independent parasite, and what the long-term evolutionary trajectory of cancer is. Brief Description of the Project The project will build on recent work done in our lab to unravel the earliest events that led to canine transmissible venereal tumour (CTVT) becoming an infectious cancer, and globally endemic. We have high-coverage DNA sequencing information from several CTVT samples, as well as their hosts, which we have previously used to estimate the evolutionary tree underlying these tumours. This previous work has identified a new mutational signature (‘signature A’) that was active during the early evolution of CTVT, but later switched off. In this project we will aim to use tumour copy number estimation to identify ‘frozen time points’: fragments of DNA sequence that were gained during the early evolution of the cancer. 
The relative ages of these fragments will be estimated by the degree to which they have accumulated somatic mutations. Combined with this timing information, by examining the fragments for the presence of signature A we will be able to determine whether signature A occurred continuously, or in bursts. Additionally, the earliest fragments will be highly representative of the genotype of the animal in which CTVT arose, illuminating perhaps the characteristics of the earliest domesticated dogs. While we have a fixed goal in mind for the project, namely inferring the nature of the early mutations that occurred in CTVT, there is also plenty of scope for a student to pursue their own interests within the subject area, depending on how the project progresses. References 1: Strakova, Andrea, and Elizabeth P. Murchison. 2015. “The Cancer Which Survived: Insights from the Genome of an 11000 Year-Old Cancer.” Current Opinion in Genetics & Development 30: 49–55. 2: Leathlobhair, Máire Ní, Angela R. Perri, Evan K. Irving-Pease, Kelsey E. Witt, Anna Linderholm, James Haile, Ophelie Lebrasseur, et al. 2018. “The Evolutionary History of Dogs in the Americas.” Science 361 (6397): 81–85. 3: Baez-Ortega, Adrian, Kevin Gori, Andrea Strakova, Janice L. Allen, Karen M. Allum, Leontine Bansse-Issa, Thinlay N. Bhutia, et al. 2019. “Somatic Evolution and Global Expansion of an Ancient Transmissible Cancer Lineage.” Science 365 (6452): eaau9923. Prerequisite Skills Statistics, Probability/Markov Chains Other Skills Used in the Project Probability/Markov Chains, Numerical Analysis, Simulation, Data Visualization Programming Languages Python, R Work Environment The student will work in the Transmissible Cancer Group, based in the Department of Veterinary Medicine. As well as myself, there is the group principal investigator, Elizabeth Murchison, a PhD student, a lab manager and a post-doc. 
The project would benefit most from a student who can come to the office daily, between 10:00 and 16:00 (as a guideline). Remote working is acceptable when this is not possible.

## Symmetries in feature maps for physics based machine learning.

 Project Title Symmetries in feature maps for physics based machine learning. Contact Name Gabor Csanyi Contact Email gc121@cam.ac.uk Company/Lab/Department Engineering Laboratory Address Cambridge University Period of the Project 8-10 weeks, as agreed Project Open to Undergraduates, Master's (Part III) students Initial Deadline to register interest March 15 Background Information Brief Description of the Project There are two crucial components to most machine-learning models: the regression method (e.g. ANN, Gaussian process, etc) and the feature map. In physics-inspired machine learning in particular it has been proven time and again that the choice of features has a critical influence on the success of the model. For example, it is often required that a physical model preserves certain symmetries - not approximately, but exactly. This is precisely the topic of this summer project. We will first formulate minimal requirements on general feature maps: (i) smoothness; (ii) invertibility; (iii) stability. We will then study how tools from invariant theory (e.g., symmetric polynomials) can be employed to construct symmetric feature maps that satisfy all of the above requirements. Specific applications we have in mind come from molecular modelling where the symmetry groups arise from invariance of molecules under certain isometry and permutation subgroups. References Prerequisite Skills Simulation Other Skills Used in the Project Mathematical physics, Numerical Analysis Programming Languages Python, Julia Work Environment Small team, including mathematician Prof Christoph Ortner
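As a small illustration of the idea above, the sketch below builds a permutation-invariant feature map from symmetric polynomials and checks two of the three listed requirements on a toy input. It maps $n$ real numbers to their first $n$ power sums $p_k(x) = \sum_i x_i^k$, which are exactly invariant under permutations. It then inverts the map, up to ordering, using Newton's identities and polynomial root-finding. The function names and the test vector are hypothetical, and the example covers permutation symmetry only, not the isometry groups mentioned in the description.

```python
import numpy as np

def power_sum_features(x):
    """p_k(x) = sum_i x_i**k for k = 1..n; exactly invariant under permutations."""
    x = np.asarray(x, dtype=float)
    return np.array([np.sum(x ** k) for k in range(1, x.size + 1)])

def invert_power_sums(p):
    """Recover the sorted values x_i from their power sums.

    Newton's identities give the elementary symmetric polynomials e_k:
        k * e_k = sum_{i=1..k} (-1)**(i-1) * e_{k-i} * p_i
    and the x_i are the roots of prod_i (t - x_i) = sum_k (-1)**k e_k t**(n-k).
    """
    n = len(p)
    e = np.zeros(n + 1)
    e[0] = 1.0
    for k in range(1, n + 1):
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1)) / k
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]  # highest degree first
    return np.sort(np.roots(coeffs).real)

x = np.array([0.3, -1.2, 2.0, 0.7])   # illustrative input
x_perm = x[[2, 0, 3, 1]]              # the same values in a different order
features = power_sum_features(x)
features_perm = power_sum_features(x_perm)
recovered = invert_power_sums(features)
```

The map is smooth and invertible up to permutation, but inversion through root-finding can become ill-conditioned when values nearly coincide or when $n$ is large. That is one way the stability requirement (iii) can fail even when (i) and (ii) hold.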