This is a list of CMP academic project proposals from summer 2017.
Value of Information in Early Phase Clinical Trials
Mathematical Solutions for Aeronautical Gas-turbine Noise Emissions
Signal Extraction Techniques for 21-cm Cosmology Radiometric Experiments
Compressed Sensing Development for Nuclear Magnetic Resonance (NMR) Spectroscopy
Modelling the light distribution in photoacoustic imaging
Spatiotemporal dynamics of plant growth hormone gibberellin (GA) and cellular growth in plant cells
Stem cell packing
Single cell normalisation
Application of Compressive Sensing to Hyperspectral Endoscopy
Tensor Networks on TensorFlow
Modelling signal processing for stem cell control
Modeling the evolutionary dynamics leading to Acute Myeloid Leukemia
Detection of animal faecal parasites using image analysis with an ioLight portable microscope
Machine Learning for Internet Traffic Classification
Modelling Tandem Queues in Data Centres
Counting motifs in bipartite graphs
Music and Mathematics: Formal Methods in Style Detection
Image analysis of ice cream microstructure
Application of mathematics in real world clinical management
Value of Information in Early Phase Clinical Trials
Contact Name: | Simon Bond |
Contact email: | |
Lab/Department: | Cambridge Clinical Trials Unit |
Contact Address: | Addenbrooke's Hospital, Coton House Level 6 - Box 401 Hills Road |
Period of the Project: | summer 2017 |
Brief Description of Project: | Exploring and developing novel tools to design clinical trials and justify the choice of sample size.
A current standard method for choosing a sample size in confirmatory clinical trials is based on hypothesis testing and providing enough statistical power. This does not generalise well to exploratory studies in early clinical research. An alternative method, called “Value of Information”, is rooted in decision theory and health economic analysis, and attempts to quantify the financial value of gaining information. Methods have been developed to parallel the traditional hypothesis-test-statistical-power approach of late stage confirmatory studies, but little work exists on early stage studies, particularly when there is uncertainty about the standard deviation of an endpoint on top of uncertainty around its mean value. |
Skills Required: | Statistical inference, computer programming |
Skills Desired: | Familiarity with the R programming language, clinical trials, decision theory, economics |
Project Open to: | Part III Students |
Deadline to register interest: |
Mathematical Solutions for Aeronautical Gas-turbine Noise Emissions
Contact Name: | Luca Magri |
Contact email: | lm547@cam.ac.uk |
Lab/Department: | Engineering Department |
Contact Address: | Engineering Department JDB, Fluid Dynamics group Trumpington Street Cambridge CB2 1PZ |
Period of the Project: | June - Sept 2017 |
Brief Description of Project: | Combustion noise is one of the dominant sources of noise pollution generated by the whole turbojet, which is bound to increase with the implementation of low-emission aeroengines. Entropy and vorticity inhomogeneities that exit the combustion chamber and are accelerated through the turbine are the two well-known indirect mechanisms that produce combustion noise. Recently, we discovered a third indirect noise mechanism caused by inhomogeneities in the gas composition. The system of hyperbolic equations was discretized numerically and solved. Numerical solutions are, however, time consuming and depend on the numerical scheme used.
The objective is to find a full analytical solution using mathematical techniques that originated in quantum mechanics. The level of noise will be calculated for different Mach numbers, nozzle geometries and flow configurations. The analytical solution will overcome the limitations of the approximate asymptotic methods used in the literature and industry. |
Skills Required: | Some familiarity with the following topics would be an advantage, but is not essential:
- Fluid dynamics and/or acoustics |
Skills Desired: | |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: |
Signal Extraction Techniques for 21-cm Cosmology Radiometric Experiments
Contact Name: | Eloy de Lera Acedo |
Contact email: | eloy@mrao.cam.ac.uk |
Lab/Department: | Physics |
Contact Address: | Astrophysics Group Battcock centre for Astrophysics Cavendish Laboratory University of Cambridge JJ Thomson Avenue Cambridge CB3 0HE |
Period of the Project: | 8-12 weeks |
Brief Description of Project: | The Cosmic Dawn and the Epoch of Re-ionization are early cosmic epochs (until ~ 1 billion years after the Big Bang) when the Universe went from being a vast volume filled with chemical elements such as Hydrogen to become the realm of celestial objects (stars, galaxies, black holes, etc.) we can see today from Earth. The precise physical processes that took place at the time leading to the Universe we know today are unknown and their study is considered to be the last frontier in cosmological science. It is recognized that by studying the red-shifted radio frequency emission from Hydrogen itself (the raw material that formed the first luminous objects) one could probe these epochs through cosmological time and shed light on some of the biggest remaining mysteries in the history of the cosmos. Super instruments like the SKA telescope will aim in a few years to do full tomography of these epochs. However, with a single antenna radiometer it is theoretically possible to detect the sky-averaged hyperfine transition line of atomic hydrogen (red-shifted from 21-cm to a few m due to the expansion of the Universe) in a matter of a few days. This, however, would require an extraordinary knowledge of the radio instrument and its spatial and spectral effects on both the cosmological signal and the much brighter foreground emissions. In this project the student will work on the development of the algorithms and signal extraction techniques required for the detection of the cosmic signal in this type of experiment. The student will work on a software framework and will back their findings with realistic simulations including the instrument model, the foregrounds and the cosmological signal to prepare for the observations and data analysis. |
Skills Required: | - Programming skills: Python/Matlab - Knowledge of signal extraction techniques (e.g. matched filter) - Knowledge of Bayesian theory |
Skills Desired: | - Background in experimental cosmology - Strong background in signal extraction techniques (e.g. matched filter) - Strong background in Bayesian theory |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
Compressed Sensing Development for Nuclear Magnetic Resonance (NMR) Spectroscopy
Contact Name: | Mark Bostock |
Contact email: | mjb218@cam.ac.uk |
Lab/Department: | Biochemistry (Laboratory of Dr. Daniel Nietlispach) |
Contact Address: | Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Old Addenbrooke's Site, Cambridge CB2 1GA. |
Period of the Project: | 8 weeks (negotiable) |
Brief Description of Project: | NMR spectroscopy is a widely used technique in Biology and Chemistry enabling atomic resolution structural analysis of molecules such as proteins, providing insights into their function and dynamic behaviour.
Over recent years we have been actively developing new data processing methodologies that enable the reconstruction of NMR data that is incompletely sampled in the time domain. Generally these methods are termed 'non-uniform sampling' (NUS), and they typically require non-Fourier-transform reconstruction techniques to convert irregularly sampled time domain data into the frequency domain. We have been working on the development and implementation of methodologies known as 'compressed sensing' (CS), based on l_p-norm minimisation, where typically p=1. CS arose in the literature of information theory [1], [2] and has been applied widely, for example in MRI [3] and NMR [4], [5], as well as in other areas such as image compression, astronomy, tomography etc. [6]. The area is revolutionising the NMR field, allowing us to obtain information which was previously inaccessible and increasing the range of challenging biomolecules and scenarios which NMR can study. The main benefits of the combination of NUS recording and CS data reconstruction are increases in signal-to-noise and spectral resolution, both typically limiting factors in NMR spectroscopy. Our current interests are the following: 2) Reducing the sampling requirements with prior information 3) Investigating the optimum sampling requirements 4) Developing GPU (graphics processing unit) approaches References: |
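As a rough illustration of the l_1-norm reconstruction idea described above, the following Python sketch recovers a sparse synthetic spectrum from a randomly chosen quarter of its time-domain points using iterative soft thresholding (ISTA). This is only a toy under stated assumptions: the function names, the test signal, the threshold `lam` and the iteration count are all hypothetical choices for illustration, and ISTA is one generic l_1 solver, not necessarily the algorithm used by the group.

```python
import numpy as np

def soft_threshold(z, lam):
    """Complex soft thresholding: shrink magnitudes by lam, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)

def ista_reconstruct(samples, sampled_idx, n, lam=0.01, n_iter=500):
    """Estimate a sparse spectrum of length n from a subset of time-domain points.

    Approximately minimises 0.5 * ||samples - A x||^2 + lam * ||x||_1, where A is
    a unitary inverse DFT followed by selection of the measured points.
    """
    spectrum = np.zeros(n, dtype=complex)
    for _ in range(n_iter):
        # Forward model: spectrum -> time domain, keeping only measured points.
        residual = samples - np.fft.ifft(spectrum, norm="ortho")[sampled_idx]
        # Adjoint: zero-fill the unmeasured points and transform back.
        zero_filled = np.zeros(n, dtype=complex)
        zero_filled[sampled_idx] = residual
        # Gradient step (step size 1 is valid because ||A|| <= 1), then shrinkage.
        spectrum = soft_threshold(
            spectrum + np.fft.fft(zero_filled, norm="ortho"), lam
        )
    return spectrum

# Hypothetical test signal: three peaks in an otherwise empty spectrum.
rng = np.random.default_rng(0)
n, m = 256, 64                      # full grid size, number of measured points
peak_positions = [20, 75, 190]
true_spectrum = np.zeros(n, dtype=complex)
true_spectrum[peak_positions] = [1.0, 0.7, 0.5]

time_signal = np.fft.ifft(true_spectrum, norm="ortho")
sampled_idx = np.sort(rng.choice(n, size=m, replace=False))  # NUS schedule
recovered = ista_reconstruct(time_signal[sampled_idx], sampled_idx, n)

# Indices of the three largest recovered peaks.
largest = sorted(np.argsort(np.abs(recovered))[-3:].tolist())
print(largest)
```

The soft-thresholding step is what enforces sparsity; the recovered peak heights are slightly shrunk relative to the true ones, which is a known bias of l_1 regularisation.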
Skills Required: | Good programming expertise, particularly with Python. Interest in information theory. |
Skills Desired: | |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
Modelling the light distribution in photoacoustic imaging
Contact Name: | Joanna Brunker |
Contact email: | jb2014@cam.ac.uk |
Lab/Department: | Department of Physics and CRUK Cambridge Institute |
Contact Address: | Cancer Research UK Cambridge Institute University of Cambridge Li Ka Shing Centre Robinson Way Cambridge CB2 0RE |
Period of the Project: | 31 July – 22 September |
Brief Description of Project: | Photoacoustic imaging is an emerging technique involving generation of ultrasound using pulses of laser light. Absorption of light by tissue chromophores such as haemoglobin in blood induces a small temperature rise leading to an increase in pressure, and consequently generation of ultrasound waves. Detection of the ultrasound at the tissue surface enables a map of tissue absorption to be reconstructed, since the amplitude of the ultrasound signal is proportional to the absorption. However, the photoacoustic imaging community faces a significant challenge in using the reconstructed images to accurately quantify the concentration of the absorbers, for example to calculate the concentration of oxyhaemoglobin in blood to find the blood oxygenation. The reason for this is that the absorption is proportional not only to the absorption coefficients of the tissue components, and their concentration, but also to the intensity of light incident on these tissue components (the light fluence). The light fluence distribution can be estimated experimentally, or by using models such as Monte Carlo or the diffusion equation, and then used to correct the photoacoustic images to more accurately quantify the absorber concentrations. We have already successfully implemented a correction using the diffusion equation in 2D, but still need to validate this against other models such as Monte Carlo, and to investigate corrections in 3D. The project will address these challenges using MATLAB modelling of both simulated and experimental data representing tissue absorption in a living mouse. |
Skills Required: | Proficiency with MATLAB |
Skills Desired: | Familiarity with image reconstruction and light models Familiarity with Monte Carlo and related computational algorithms |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
Spatiotemporal dynamics of plant growth hormone gibberellin (GA) and cellular growth in plant cells
Contact Name: | Alexander Jones |
Contact email: | alexander.jones@slcu.cam.ac.uk |
Lab/Department: | Sainsbury Laboratory Cambridge University |
Contact Address: | Sainsbury Laboratory, Cambridge University Bateman St. Cambridge, CB2 1LR |
Period of the Project: | 8 weeks |
Brief Description of Project: | A major challenge in plant biology is to understand how multicellular organisms integrate dynamic developmental and environmental inputs to drive cellular responses. These cellular processes are well orchestrated across the spatial and temporal scales to enhance tissue- and organ-level functionality. Understanding the mechanisms that control the cellular processes, such as hormone levels and cellular growth, requires a multidisciplinary and systems-level approach.
The project is focused on using a combined approach of cell biology, molecular genetics, and mathematical modelling to visualize and quantify hormone levels that contribute to plant cellular growth. The plant hormone gibberellin (GA) is a powerful growth regulator that controls key developmental transitions such as germination, flowering and fruiting. The Jones group is using a novel FRET biosensor for GA (GPS1) to reveal GA patterns and dynamics in living plants. As a first step, GA will be visualized (using the GPS1 biosensor) in growing cells using confocal microscopy, and a spatiotemporal map of GA levels and cellular growth rates will be developed. To test hypotheses about the dynamic relationship between GA levels and cell growth, GA levels will be manipulated using mutant plants defective in GA biosynthesis and treatments with exogenous GA. The quantitative data generated from these experiments will be fed back into an ongoing 3D finite element model of growing cells, and the key predictions of the model will be tested through the experimental approaches. We anticipate an iterative combination of modelling and experimental work leading to a holistic understanding of how spatiotemporal patterns of GA growth hormone quantitatively and mechanistically relate to cellular growth. The student will gain experience in: 1. plant culture methods; 2. cutting-edge confocal microscopy techniques (FRET) to collect 3D time-lapse images of GA levels and cellular growth in growing cells; 3. advanced 4D image processing; 4. developing metrics describing the quantitative relationship between GA levels and cell growth; 5. participating in the ongoing development of a 3D finite element computational model aimed at integrating GA hormone levels, physical properties of the cell wall, and cellular growth. The student will work closely with a postdoc (Ankit Walia) to collect and analyze data. |
Skills Required: | Basic knowledge of biology and programming in e.g. matlab/python/c++ would be an advantage, but there are no strict prerequisites for this project. |
Skills Desired: | An interest in plant development and microscopy would be a great start! |
Project Open to: | Undergraduates, Part III (master's) students |
Deadline to register interest: | 3 March |
Stem cell packing
Contact Name: | Lee Hazelwood |
Contact email: | lee.hazelwood@cruk.cam.ac.uk |
Lab/Department: | CRUK |
Contact Address: | Dr Lee Hazelwood Cancer Research UK Cambridge Institute University of Cambridge Li Ka Shing Centre Robinson Way Cambridge CB2 0RE Lee.Hazelwood@cruk.cam.ac.uk |
Period of the Project: | 8 weeks |
Brief Description of Project: | Intestinal crypts contain stem cells that maintain the intestine. These stem cells replicate and differentiate into Paneth cells in the intestine, forming an organised, intercalated, chessboard-like pattern; see Figure 1a in ref [1]. The aim of this project is to develop a Cellular Potts model in order to understand how intercellular interactions lead to the observed stem and Paneth cell organisation. We will do this by developing a Cellular Potts code based on the original work by Graner and Glazier [2] for a simple 2D geometry.
This project is self-contained. [1] Sato et al. Nature 469, 415–418 (20 January 2011) doi:10.1038/nature09637 |
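For orientation, a Cellular Potts model is usually described as a lattice of cell labels updated by Metropolis copy attempts under an energy with a contact term and a volume constraint. The Python sketch below follows that commonly described form on a periodic 2D grid. All names, the coupling matrix `J`, the volume stiffness `lam`, the target volume and the temperature are hypothetical placeholders, not values from the project.

```python
import numpy as np

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-neighbourhood, periodic

def total_energy(lattice, cell_type, J, lam, target_volume):
    """Contact energy over neighbouring site pairs plus a volume penalty.

    lattice holds cell ids (0 = medium); cell_type maps id -> type index;
    J is a symmetric matrix of contact energies between types.
    """
    energy = 0.0
    for axis in (0, 1):  # each neighbouring pair is counted exactly once
        neighbour = np.roll(lattice, -1, axis=axis)
        differs = lattice != neighbour
        energy += J[cell_type[lattice], cell_type[neighbour]][differs].sum()
    volumes = np.bincount(lattice.ravel(), minlength=len(cell_type))
    energy += lam * ((volumes[1:] - target_volume) ** 2).sum()  # skip medium
    return energy

def delta_energy(lattice, cell_type, J, lam, target_volume, site, new_id):
    """Energy change if `site` were relabelled to `new_id` (local update)."""
    n_rows, n_cols = lattice.shape
    i, j = site
    old_id = lattice[i, j]
    if new_id == old_id:
        return 0.0
    change = 0.0
    for di, dj in NEIGHBOURS:
        nb = lattice[(i + di) % n_rows, (j + dj) % n_cols]
        if nb != old_id:
            change -= J[cell_type[old_id], cell_type[nb]]
        if nb != new_id:
            change += J[cell_type[new_id], cell_type[nb]]
    volumes = np.bincount(lattice.ravel(), minlength=len(cell_type))
    for cell_id, step in ((old_id, -1), (new_id, 1)):
        if cell_id != 0:  # the medium has no volume constraint
            v = volumes[cell_id]
            change += lam * ((v + step - target_volume) ** 2
                             - (v - target_volume) ** 2)
    return float(change)

def metropolis_step(lattice, cell_type, J, lam, target_volume, temperature, rng):
    """Attempt to copy a random neighbour's label into a random site."""
    n_rows, n_cols = lattice.shape
    i, j = rng.integers(n_rows), rng.integers(n_cols)
    di, dj = NEIGHBOURS[rng.integers(4)]
    new_id = lattice[(i + di) % n_rows, (j + dj) % n_cols]
    d_e = delta_energy(lattice, cell_type, J, lam, target_volume, (i, j), new_id)
    if d_e <= 0 or rng.random() < np.exp(-d_e / temperature):
        lattice[i, j] = new_id  # accept the copy attempt
        return True
    return False
```

Computing the energy change locally rather than re-evaluating the whole lattice is what makes long simulations practical; `total_energy` is kept mainly as a reference for checking `delta_energy`.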
Skills Required: | Coding in Matlab, C or R. Previous experience in carrying out some simulations on Grids. |
Skills Desired: | Physical understanding of energy functions. |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
Single cell normalisation
Contact Name: | Lee Hazelwood |
Contact email: | lee.hazelwood@cruk.cam.ac.uk |
Lab/Department: | CRUK |
Contact Address: | Dr Lee Hazelwood Cancer Research UK Cambridge Institute University of Cambridge Li Ka Shing Centre Robinson Way Cambridge CB2 0RE Lee.Hazelwood@cruk.cam.ac.uk |
Period of the Project: | 8 weeks |
Brief Description of Project: | We are beginning to acquire genetic measurements (expression levels of all genes) from the single biological cells that comprise tissues. Early data indicates these genetic measures are heterogeneous across cells, potentially indicating different developmental programmes for these cells. However, one potential confounding factor to our confidence in these predictions is our ability to measure lowly expressed genes, which are often absent from the data. See for example [1].
This project will look at different methods to normalise this data and see how they are affected by missing data. Possible parts of the project depending on one's background and interest might include The project is self-contained. [1] A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor, Lun AT, McCarthy DJ, Marioni JC, F1000Res. 2016 Aug 31;5:2122. |
Skills Required: | Familiarity with coding in Matlab, C and R. Simulation of data according to particular distributions. Experience using convolutions. |
Skills Desired: | Interest in bioinformatics. |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
Application of Compressive Sensing to Hyperspectral Endoscopy
Contact Name: | Dr. Sarah Bohndiek |
Contact email: | seb53@cam.ac.uk |
Lab/Department: | Physics |
Contact Address: |
Dr. Jonghee Yoon (daily supervisor) |
Period of the Project: | 8 - 10 weeks (negotiable before starting the project) |
Brief Description of Project: | Endoscopic surveillance is crucial for diagnosing cancer in the gastrointestinal tract. Current endoscopic methods measure structural information and some biochemical features of tissue, but this information does not provide sufficient contrast for early detection of the disease. In order to overcome this limitation, we are developing hyperspectral endoscopy. Hyperspectral imaging techniques measure the full spatial and spectral characteristics of tissue, which would enable early detection by combining contrast relating to structural, biochemical and molecular characteristics of the tissue. Both scanning (spatial & spectral) and snap-shot methods can be used to obtain hyperspectral images, but they entail either long operation times or limited spatial / spectral resolution, respectively. In this project, we will exploit compressive sensing (CS) techniques to develop real-time hyperspectral endoscopy while retaining high spatial and spectral resolution. CS is an imaging compression method that can achieve high speed by reducing the number of samples required to reconstruct an image. Recently, CS has been applied to various biomedical techniques including fluorescence microscopy, photoacoustic microscopy, X-ray imaging, and hyperspectral imaging. However, there are many challenges in applying CS to hyperspectral endoscopy. The aim of this project would be to develop a new theoretical CS framework for hyperspectral endoscopy. The candidate would need to carry out a literature search to identify opportunities for CS and then perform simulations to develop and validate the new algorithm. Time allowing, there will be an opportunity to apply the new algorithm to real hyperspectral endoscopic experiments ongoing in the laboratory. The developed CS method would help to translate hyperspectral endoscopy to the clinic. |
Skills Required: | Programming skills (Matlab), but they can be learned on the project |
Skills Desired: | Familiarity with compressed sensing theory and developing algorithms. Ability to communicate with non-mathematicians |
Project Open to: | Undergraduates |
Deadline to register interest: |
Tensor Networks on TensorFlow
Contact Name: | Austen Lamacraft |
Contact email: | al200@cam.ac.uk |
Lab/Department: | Physics |
Contact Address: | TCM, Department of Physics, Cavendish Laboratory |
Period of the Project: | Summer 2017 |
Brief Description of Project: | Understanding large quantum systems is central to many areas of science. The exponentially large Hilbert space of the problem is the main impediment to any general calculational method. A recent breakthrough in describing low dimensional systems (normally chains of interacting spins) uses tensor networks, in which the state of the system is represented using a network of matrices of much smaller dimension than that of the full Hilbert space.
The goal of this project is to implement a tensor network scheme using TensorFlow, Google's open source library for computations using data flow graphs. This is an efficient way to represent the tensor network algorithm, and allows performance to be maximized using GPUs. See https://arxiv.org/abs/1609.01552 for an introduction to the physics of the problem. |
Skills Required: | |
Skills Desired: | Familiarity with Python. |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March 2017 |
Modelling signal processing for stem cell control
Contact Name: | Henrik Jönsson |
Contact email: | henrik.jonsson@slcu.cam.ac.uk |
Lab/Department: | Sainsbury Laboratory |
Contact Address: | Sainsbury Laboratory, Bateman Street, Cambridge CB2 1LR |
Period of the Project: | 8 weeks |
Brief Description of Project: | Continuous organ formation in plants is driven by a cluster of stem cells in the tip of their shoots. A regulatory network decides which cell should maintain its stem cell identity, and which cell should instead specialize and start forming a new organ. At the centre of this regulatory network lies the negative feedback between the protein CLAVATA3 (CLV3), produced by the stem cells, and the stem cell promoting transcription factor WUSCHEL (WUS), produced in a separate region of cells.
The dynamics of negative feedback is well understood, but things get more complicated as the signalling from CLV3 to WUS goes through several receptors. The receptors all perform signal processing on their own and at least one of them exhibits adaptation, meaning that it effectively filters out the average CLV3 input and only responds to recent changes. How do the other receptors work, and how is the signalling from the three integrated into a single output? There is published data on what happens when you mutate these receptors, both one by one and in pairs. There is also data for how WUS is affected by changes in CLV3 concentration. With this, and with computational modelling, we aim to infer the relationship between these receptors, and what signal processing they must perform. This project has some clear goals and a well-defined place to start. But the project is easily extensible, and while we have ideas of where one could go next, you would be in charge of your own research. The project is also adaptable from a more analytical to a mainly computational approach. |
Skills Required: | Solving and analysing differential equations. |
Skills Desired: | Experience with programming/scripting will be useful. |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March 2017 |
Modeling the evolutionary dynamics leading to Acute Myeloid Leukemia
Contact Name: | Jamie Blundell |
Contact email: | jrb75@stanford.edu |
Lab/Department: | Cambridge Cancer Center & Department of Oncology |
Contact Address: | Cambridge Cancer Center, Early Detection Program Hutchison MRC Research Center Cambridge Biomedical Campus Hills Road Cambridge CB2 0XZ |
Period of the Project: | June - Sept (flexible) |
Brief Description of Project: | Cancer is an evolutionary disease, but our quantitative understanding of its onset and progression remains basic. It is widely accepted that cancer develops by alterations in specific sets of genes. While a huge number of studies over the past decade have identified such alterations, much remains to be discovered about cancer dynamics, especially at a quantitative level. A deeper understanding of cancer’s evolutionary dynamics will lead to better treatments and will provide vital information for early detection and intervention. However, addressing these questions relies on the ability to quantitatively understand the clonal dynamics in both healthy and transformed tissues.
The need for a quantitative understanding of cancer evolution is keenly highlighted by blood malignancies such as Acute Myeloid Leukemia (AML). At diagnosis, most AML patients harbour mutations in ~3 - 10 out of a possible ~100 or so AML-associated genes. The currently accepted theory is that these mutations are acquired sequentially: clones with more mutations are fitter than those with fewer and selectively outcompete them. However, given the low mutation rates and lack of genomic instabilities in these cancers, exactly how it is possible to accumulate this many mutations in slowly cycling stem cell populations has been difficult to explain. Further complicating the issue, recent deep sequencing studies suggest that even in “healthy” people there is a large degree of clonal evolution: subclones harboring different sets of somatic mutations compete with one another over the course of a person's life. The goal of this research project will be to model how mutations arise and expand in tissues in order to understand what genomic signatures might be indicative of early cancer. Specifically we will: (i) Develop a stochastic “null” model of mutation dynamics in healthy (non-cancerous) tissues, maintained by a hierarchy of cell types. (ii) Modify the above model to include “driver” mutations that disrupt the homeostatic balance of the tissue, to model early cancer onset. Methods used will be a combination of branching processes, nonlinear dynamics and stochastic simulations. |
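To give a flavour of the branching-process approach mentioned above, here is a deliberately simplified Python sketch. It assumes a single stem cell compartment with discrete generations (not the hierarchy of cell types the project envisages): each cell either divides into two stem cells or is lost to differentiation, with a hypothetical `fitness_advantage` parameter biasing the balance for driver clones. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_clone_sizes(n_clones, n_generations, fitness_advantage=0.0, seed=0):
    """Galton-Watson simulation of independent clones, each started from one cell.

    In every generation each cell self-renews into two cells with probability
    (1 + fitness_advantage) / 2 and is otherwise lost, so the expected size
    changes by a factor (1 + fitness_advantage) per generation.
    fitness_advantage = 0 gives the neutral ("null") critical process.
    """
    rng = np.random.default_rng(seed)
    p_renew = 0.5 * (1.0 + fitness_advantage)
    sizes = np.ones(n_clones, dtype=np.int64)
    for _ in range(n_generations):
        # Number of dividing cells per clone is binomial; each leaves 2 daughters.
        sizes = 2 * rng.binomial(sizes, p_renew)
    return sizes

neutral = simulate_clone_sizes(20_000, 10)
driver = simulate_clone_sizes(20_000, 10, fitness_advantage=0.1)

print("neutral: mean size", neutral.mean(), "surviving", (neutral > 0).mean())
print("driver:  mean size", driver.mean(), "surviving", (driver > 0).mean())
```

In the neutral case the mean clone size stays near one while most clones go extinct, so the few survivors are large; this is the kind of baseline against which a driver signature would have to be distinguished.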
Skills Required: | Some exposure to stochastic processes, probability, asymptotic methods desired. Some programming also desired (Python, Matlab or similar). |
Skills Desired: | |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | 3rd March |
Detection of animal faecal parasites using image analysis with an ioLight portable microscope
Contact Name: | Carola-Bibiane Schönlieb |
Contact email: | cbs31@cam.ac.uk |
Lab/Department: | DAMTP |
Contact Address: | DAMTP, Wilberforce Road, CB3 0WA |
Period of the Project: | 8 weeks |
Brief Description of Project: | The overuse of antibiotics is causing pathogens to become resistant to antibiotics, potentially leaving us without effective treatments for a wide range of conditions. One of the causes of antibiotic resistance is the regular and indiscriminate use of antibiotics in livestock - frequently farmers see a problem in one animal and immediately treat all of the herd or flock with antibiotics as a precaution, without establishing if antibiotics are necessary. The authorities are now starting to limit the use of antibiotics in livestock by imposing strict limits on the levels of antibiotics that the meat can contain. This forces farmers and vets to be much more selective with antibiotics and only use them when really needed. To do this, farmers need to be able to accurately diagnose disease on the farm, rather than sending samples off to a lab and waiting days for a diagnosis. Intestinal parasites are one of the more common issues for which antibiotics are used in cattle and sheep. There are a large number of parasites and only a small number are harmful, so to determine if antibiotics are required the faeces need to be analysed and the number and type of parasites present measured. This is done using a microscope to look for parasite eggs in the faeces, then counting the different types of egg found. Normally this is done with a bench microscope in a lab, with an experienced technician or parasitologist manually identifying and counting the eggs. The new ioLight portable microscope has sufficient resolution to enable this to be done in the field, thus reducing the time required for diagnosis so the farmer immediately knows if treatment with antibiotics is appropriate or not. It is impractical to train farmers to manually count and identify eggs, so an automated image analysis solution is required to analyse the images from the microscope and count eggs.
The image analysis software needs to work in the field, so it needs to operate on the relatively low powered Raspberry Pi within the microscope, but still deliver results quickly. This project is to develop image analysis software running on a Raspberry Pi to analyse microscope images of faecal samples, identify the eggs present, then measure dimensions of the eggs and other simple parameters that could then be used to categorise the eggs. Efficient image analysis algorithms are required to ensure that the results are delivered to the user without significant delay. This project will be jointly supervised by |
Skills Required: | Maths 1a and 1b. Versatile in MATLAB programming. A curious mind and a passion for problem solving. |
Skills Desired: | Numerical analysis (numerical linear algebra and numerics for differential equations), harmonic analysis, partial differential equations. Experience in programming on the Raspberry Pi. |
Project Open to: | Undergraduates, Part III (master's) students |
Deadline to register interest: | 3 March 2017 |
Machine Learning for Internet Traffic Classification
Contact Name: | Dr Andrew W Moore |
Contact email: | andrew.moore@cl.cam.ac.uk |
Lab/Department: | Computer Laboratory |
Contact Address: | Computer Laboratory, William Gates Building, University of Cambridge, 15 JJ Thomson Avenue |
Period of the Project: | 8 weeks (negotiable) |
Brief Description of Project: | In an original piece of research in 2005, Denis Zuev (Part III) and Andrew W. Moore of the Computer Laboratory showed how Bayesian machine-learning methods, when combined with high-quality ground-truth data, were able to provide useful identification of Internet applications. In the intervening years there has been a revolution in Internet applications, in machine-learning methodologies, and in the applications to which such information may be put.
This proposal would consider evaluations of both the original data sets and new data sets using a number of modern methodologies for training and testing Internet traffic classifiers. This would permit us to evaluate such methods in new environments, e.g., the data-center, and to consider the actual value of new methodologies when compared with long-standing mechanisms for constructing Bayesian priors. Practically, this work will start by identifying suitable ML algorithm candidate(s) and making a baseline comparison with the datasets of the 2005/6/7 work. This will enable a direct comparison of the value of these new algorithms and, in particular, an exploration of the opportunity for continuous refinement of the prior. References: [2] Bayesian Neural Networks for Internet Traffic Classification
[3] Probabilistic Graphical Models for Semi-Supervised Traffic Classification |
Skills Required: | Statistics background |
Skills Desired: | Machine learning background |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | |
For Industrial Hosts: |
Modelling Tandem Queues in Data Centres
Contact Name: | Andrew W Moore |
Contact email: | andrew.moore@cl.cam.ac.uk |
Lab/Department: | Computer Laboratory |
Company: | University of Cambridge |
Contact Address: | Computer Laboratory, William Gates Building, University of Cambridge, 15 JJ Thomson Avenue |
Period of the Project: | 8 weeks (negotiable) |
Brief Description of Project: | The modelling of networks of queues has been an extraordinarily successful technique to understand traffic behaviour in (computer packet) networks. Queues within computer networks are often modeled as independent and isolatable. The practical reality is computer networks are sets of tandem queues, each with different arrival and processing distributions.
Modelling to date has relied on the (simplistic) assumption that queues are independent; yet models must incorporate the tandem nature of the queues if they are to capture the properties of modern data-centre networks. Additionally, treating a collection of (tandem) queues as a reduced number of queues enables practical simulation and reproduction of queueing impact within large-scale queue structures, such as data-centre networks, using current simulation tools. As data centres grow in scale and complexity, common simulation tools can no longer support the classical queueing modelling used in legacy models. This proposal would consider collapsing multiple-queue models into a single complex queue, representing a single end-host view of the queueing of packets within the network and the queueing delay introduced for each packet. Beyond the theoretical modelling, this work will aim to validate the conclusions using measurements from real packet traces. This will enable a direct comparison of the theoretical models of queueing in the network, using both legacy simulators and the new models, with the actual packet behaviour. |
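To make the end-host view concrete, the sketch below simulates packets passing through several queues in series and records the end-to-end delay each packet experiences. It is a minimal sketch under strong assumptions that the proposal does not state: Poisson arrivals, exponential service times, one FIFO server per queue and unbounded buffers. The function name `simulate_tandem` and all rates are hypothetical.

```python
import random


def simulate_tandem(n_packets, arrival_rate, service_rates, seed=0):
    """Return the end-to-end delay of each packet through queues in series."""
    rng = random.Random(seed)

    # Poisson arrivals at the first queue (assumption).
    arrivals = []
    t = 0.0
    for _ in range(n_packets):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)

    entering = arrivals  # times at which packets enter the current stage
    for mu in service_rates:
        departures = []
        previous_departure = 0.0
        for a in entering:
            # A packet starts service once it has arrived and the server is free.
            start = max(a, previous_departure)
            previous_departure = start + rng.expovariate(mu)
            departures.append(previous_departure)
        entering = departures  # the output of one stage feeds the next

    return [d - a for a, d in zip(arrivals, entering)]


delays = simulate_tandem(10_000, arrival_rate=0.5, service_rates=[1.0, 0.8])
mean_delay = sum(delays) / len(delays)
```

The list of per-packet delays is what a single equivalent queue would have to reproduce. In this special case the classical product-form result gives a mean end-to-end delay of the sum of 1/(mu_i - lambda) over the stages, which offers a sanity check before moving to the non-exponential distributions seen in real traces.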
Skills Required: | A strong statistics background, with a good knowledge of queueing theory. |
Skills Desired: | |
Project Open to: | Part III (master's) students, PhD Students |
Deadline to register interest: | |
For Industrial Hosts: |
Counting motifs in bipartite graphs
Contact Name: | Benno Simmons |
Contact email: | benno.simmons@gmail.com |
Lab/Department: | Conservation Science Group, Department of Zoology |
Contact Address: | Department of Zoology, University of Cambridge, The David Attenborough Building, Pembroke Street, Cambridge, CB2 3QZ, UK |
Period of the Project: | 8 weeks between late June and September, but can be flexible |
Brief Description of Project: | In 2002, the concept of network motifs was introduced for unipartite graphs. Motifs are subgraphs defined by a particular pattern of interactions between vertices. Particular motifs may reflect particular functions in real-world networks, and motif analysis has been taken up widely in fields such as biology, for the analysis of food webs and gene regulatory networks. However, the approach has rarely been applied to bipartite graphs, and no widely accessible methods or software packages yet exist for doing so.
Bipartite graphs are widely used across a range of disciplines. In economics, trade networks can depict edges between countries and the products they export. In the social sciences, affiliation networks, such as those depicting edges between users and the movies they have watched, are common. In ecology, bipartite graphs have been very widely adopted. For example, communities of plants and their pollinators can be represented as bipartite graphs, with plants and pollinators as two sets of vertices, and edges representing their interactions. Other types of ecological interaction can be represented the same way, such as those between plants and the birds that disperse their seeds, or between hosts and their parasites.
Alongside my collaborators, I have been working to extend the concept of motifs to undirected bipartite graphs, and to develop code (in R, MATLAB and Python) so that these methods can be widely adopted. We have already developed methods to count the frequency of all possible undirected bipartite motifs containing 2, 3, 4 and 5 vertices in any bipartite graph. The next stage is to develop methods for counting the number of times each vertex occurs in each topologically unique position within each motif. Once this is completed, we want to extend our methods to counting motifs containing 6 vertices.
The student would likely be focussed on (i) developing methods to count the frequency of motifs containing 6 vertices in bipartite graphs and (ii) developing methods to count the number of times each vertex occurs in each topologically unique position within each of these 6-vertex motifs. Ideally, these methods would then be written into code (preferably in R or MATLAB). The student would then be free to pursue questions of interest. As an ecologist, I’m particularly interested in ecological networks (especially those characterising mutualistic interactions between species, such as plants and pollinators).
I’m especially interested in recurrent and statistically significant motifs, and what they can tell us about (i) the evolutionary processes involved in network assembly, and (ii) the robustness of graphs to simulated perturbations. Examining robustness can involve simple topological models, or more complex approaches, such as modelling population dynamics on networks using generalised Lotka-Volterra equations. Motifs can also help answer questions such as whether the way in which species are embedded in networks is a fundamental property of the species involved, or a function of context. Such approaches could help advance our understanding of ecological communities, and eventually inform conservation to ensure their protection in the future. If the student is keen to focus on other types of bipartite graph, this is also possible: the project could follow the interests of the student once the core methods have been developed. |
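For the smallest motifs, both kinds of count described above reduce to sums over vertex degrees, which gives a feel for the problem before tackling 6-vertex motifs. The sketch below is an illustration only and is not the group's existing code; the function names and the example web are hypothetical. It takes a biadjacency matrix (rows for one vertex set, columns for the other) and relies on the fact that a bipartite graph has no triangles, so every connected 3-vertex subgraph is a path through one central vertex.

```python
from math import comb


def small_motif_counts(biadjacency):
    """Count the 2- and 3-vertex motifs of an undirected bipartite graph.

    biadjacency[i][j] is 1 if row vertex i is linked to column vertex j.
    """
    row_degrees = [sum(row) for row in biadjacency]
    col_degrees = [sum(col) for col in zip(*biadjacency)]
    return {
        "edge": sum(row_degrees),
        # One row vertex linked to two column vertices.
        "one_row_two_cols": sum(comb(d, 2) for d in row_degrees),
        # Two row vertices linked to one shared column vertex.
        "two_rows_one_col": sum(comb(d, 2) for d in col_degrees),
    }


def leaf_position_counts(biadjacency):
    """For each column vertex, count its occurrences as a leaf of the
    one-row-two-columns motif."""
    row_degrees = [sum(row) for row in biadjacency]
    n_cols = len(biadjacency[0])
    return [
        sum(row[j] * (d - 1) for row, d in zip(biadjacency, row_degrees))
        for j in range(n_cols)
    ]


# Hypothetical web: rows could be plants, columns pollinators.
web = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
counts = small_motif_counts(web)      # edge: 5, both 3-vertex motifs: 2
leaves = leaf_position_counts(web)    # [1, 2, 1]
```

The leaf counts sum to twice the motif count, since each such motif has two leaf positions. Checks of this kind become more valuable as motif size grows and closed-form degree sums no longer suffice.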
Skills Required: | Graph theory, coding (R or MATLAB) |
Skills Desired: | |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 3 March |
For Industrial Hosts: |
Music and Mathematics: Formal Methods in Style Detection
Contact Name: | Prof Pablo Padilla |
Contact email: | pp432@cam.ac.uk |
Lab/Department: | Fitzwilliam College |
Contact Address: | Prof Pablo Padilla (Visiting Fellow in Mathematics, Fitzwilliam College)
Francis Knights (Fellow and DoS in Music, Fitzwilliam College) |
Period of the Project: | 4-8 weeks, variable |
Brief Description of Project: | The project (http://formal-methods-in-musicology.webnode.com/) focuses on developing mathematical methods to classify and establish authorship of musical material, here specifically two high-quality groups of 17th-century French keyboard music by members of the famous Couperin dynasty. From a practical perspective the aim is to develop computational tools that can classify musical works depending on their stylistic similarity. Additionally, the results should be of interest in algorithmic composition and generative music as well as in a more theoretical approach to the evolution of style and musical analysis.
Currently, we are exploring three approaches and we are looking for interested students to become involved with work on each of these:
1. Probabilistic and statistical models
2. Graph theoretical methods
3. Hierarchical signal processing and wavelets |
Skills Required: | Some of the following, depending on the project:
Probability, Statistics, clustering |
Skills Desired: | Contributions from graduate students with interests in any of these particular areas are welcomed. They will engage with the theoretical as well as the computational aspects of the project; some musical knowledge would be a benefit. |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 10 April |
Image analysis of ice cream microstructure
Contact Name: | Dr Carola Schönlieb (DAMTP), Rob Tovey (DAMTP), Peter Schuetz (Unilever) |
Contact email: | C.B.Schoenlieb@damtp.cam.ac.uk |
Lab/Department: | DAMTP |
Contact Address: | Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WA |
Period of the Project: | 8 weeks during the coming summer break. |
Brief Description of Project: | Ice cream is a complex multiphase material whose microstructure consists of ice crystals and air bubbles bound together by a “matrix phase”, which is an aqueous solution containing sugars, proteins and emulsified fat. Maintaining this structure is an essential part of delivering a high-quality ice cream to the consumer. The best way to analyse the ice cream microstructure is to image this frozen structure using cryo-SEM (scanning electron microscopy). An example of the obtained images is shown here. At present these images are mainly compared qualitatively, because the segmentation needed to extract quantitative information about the size distributions of ice crystals and air bubbles is done manually and is therefore extremely time-consuming if reliable statistical data are required. Previous attempts to automate the segmentation of our typical images using standard software and processes were unsuccessful, owing to the lack of differentiation between air and ice structures and to problems in robustly recognising the outlines of ice crystals. With this project, we want to explore new approaches that allow a robust segmentation, which could then be used as a routine method to extract quantitative information from the SEM images of ice cream microstructures. |
Skills Required: | Experience in MATLAB or Python programming |
Skills Desired: | Numerical analysis, functional analysis, discrete geometry, convex optimisation |
Project Open to: | Current Part IB, II and Part III students in Mathematics Tripos |
Deadline to register interest: | 15 June 2017 |
Application of mathematics in real world clinical management
Contact Name: | Dr Rameen Shakur |
Contact email: | rameen@camheartwear.com |
Lab/Department: | Cambridge Heartwear |
Contact Address: | Cambridge Heartwear, 23 Cambridge Science Park, Milton Road, Cambridge CB4 0FN. Telephone: 01223 437144 |
Period of the Project: | 8 weeks |
Brief Description of Project: | We would like to work with an enthusiastic student who is able to organise and analyse clinically rich real-world data sets relating to the management of a few active chronic diseases. This will involve in-depth analysis of a number of clinical parameters and an assessment of the potential for modelling prospective physiological changes. |
Skills Required: | Computer literate; able to organise multiple computer files of clinical data; self-motivated; free thinker |
Skills Desired: | Punctual; interested in health care and patient benefit |
Project Open to: | Undergraduates, Part III (master's) students, PhD Students |
Deadline to register interest: | 30 June |