
Summer Research Programmes

 

This is a list of CMP academic project proposals from summer 2018.

Deconvolution of immune cells in tumour tissue using gene expression data

Contact Name Alejandro Jiménez-Sánchez
Contact Email alejandro.jimenezsanchez@cruk.cam.ac.uk
Company Name Cancer Research UK Cambridge Institute
Address Robinson Way Cambridge CB2 0RE
Period of the Project 8 weeks (but student can start earlier if desired)
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March
Brief Description of the Project The tumour microenvironment is a mixture of heterogeneous cell types including cancer, immune, and stromal cells. Recent studies have shown that the interaction between tumour and non-tumour cells influences the fate of the tumour. For example, we have previously shown significant variability in tumour-immune microenvironments among different tumours within the same patient, leading to differing outcomes (Jiménez-Sánchez et al., Cell 2017). However, characterising the immune cell types in tumours from bulk gene expression data is still challenging. We therefore propose the following well-defined project: can we reliably estimate the relative proportions of the different immune cells present in a tumour using gene expression data? Multiple groups have tried to address this question; however, the current state-of-the-art methods each present at least one of the following issues: a) inaccuracy, b) cell bias, c) insufficient comprehensiveness. We have already derived consensus gene signatures that outperform previous methods when benchmarked on a small set of tumour tissue samples. However, we want to carry out a more thorough benchmark and, optionally, refine the consensus signatures and method. The specific aims of the project, in sequential order, are therefore: 1) compile the benchmark data used by the other methods, 2) benchmark our consensus method against each of the other methods using their own test data, 3) (optional) refine/improve the consensus method and signatures, 4) (optional) search for and compile independent test data for additional benchmarking. If aims 1 and 2 are completed and there is time, aims 3 and 4 could be explored; however, completing aims 1 and 2 would be enough to submit for publication.
Skills Required a) Strong programming skills (ideally Python and R) b) Unix experience
Skills Desired a) Web development skills (optional) b) Statistical modelling (optional) c) Bayesian statistics (optional) d) Machine learning (optional)
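
As a generic illustration of what estimating cell-type proportions from bulk expression data involves (this is not the consensus method described above), a bulk profile can be modelled as a non-negative mixture of per-cell-type signature profiles. The Python sketch below assumes a made-up signature matrix of 4 genes by 3 cell types and uses a simple multiplicative-update non-negative least-squares solver. The names `deconvolve`, `SIGNATURES` and `TRUE_PROPORTIONS`, and all numbers, are hypothetical.

```python
def deconvolve(signatures, bulk, iterations=500):
    """Estimate proportions p >= 0 such that signatures * p approximates bulk.

    signatures: list of rows (one per gene), each a list with one expression
                value per cell type. bulk: one bulk expression value per gene.
    Multiplicative updates keep every estimate non-negative.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # Precompute S^T S and S^T b once.
    sts = [[sum(signatures[g][i] * signatures[g][j] for g in range(n_genes))
            for j in range(n_types)] for i in range(n_types)]
    stb = [sum(signatures[g][i] * bulk[g] for g in range(n_genes))
           for i in range(n_types)]
    p = [1.0 / n_types] * n_types
    for _ in range(iterations):
        denom = [sum(sts[i][j] * p[j] for j in range(n_types))
                 for i in range(n_types)]
        p = [p[i] * stb[i] / denom[i] if denom[i] > 0 else 0.0
             for i in range(n_types)]
    total = sum(p)
    return [value / total for value in p]  # rescale so proportions sum to 1


# Hypothetical signature matrix: rows are genes, columns are cell types.
SIGNATURES = [[10, 1, 0], [1, 8, 1], [0, 2, 9], [5, 5, 5]]
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as a mixture of the signatures.
bulk = [sum(s * t for s, t in zip(row, TRUE_PROPORTIONS)) for row in SIGNATURES]
estimate = deconvolve(SIGNATURES, bulk)
print([round(value, 3) for value in estimate])
```

Real tumour data are noisy and the signatures themselves are uncertain, which is why benchmarking against independent test data, as in aims 1 and 2, matters.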

 

Analysis of pseudo-symmetry of proteins and protein complexes

Contact Name Edmund Kunji
Contact Email ek@mrc-mbu.cam.ac.uk
Company Name Medical Research Council Mitochondrial Biology Unit
Address Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 0XY, United Kingdom
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project In nature, there are proteins and protein complexes that are pseudo-symmetrical, as they consist of domains or subunits that are similar but not identical to each other. It is likely that they started out as fully symmetrical, but over the course of evolution became asymmetrical in adaptation to new functions. Thus, amino acid residues relevant to those new functions can be identified by analysing the pseudo-symmetry of these proteins and protein complexes. In this project, we will develop a new scoring method for pseudo-symmetry and apply it to protein families with two-fold, three-fold and inverted topologies to identify amino acid residues that are key to the function of these proteins using sequence information alone. The project is well-defined.
Skills Required Basic programming skills and statistical analysis
Skills Desired Perl/Bioperl and Python programming

 

Maximising nestedness in bipartite graphs

Contact Name Benno Simmons
Contact Email bis22@cam.ac.uk
Company Name Conservation Science Group, Department of Zoology
Address Department of Zoology, University of Cambridge, The David Attenborough Building, Pembroke Street, Cambridge, CB2 3QZ, UK
Period of the Project 8 weeks between late June and September, but can be flexible
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Bipartite graphs are graphs with two sets of vertices, where edges are between sets but not within sets. Bipartite graphs are widely used in a range of disciplines outside mathematics, such as in ecology, where they are used to represent interactions between species. For example, the two sets of vertices could represent plants and pollinators respectively, with edges representing which pollinators visit which plants. Alternatively, the vertices could correspond to hosts and their parasites, with edges showing which hosts are parasitized by which parasites. One pattern widely found in empirical ecological bipartite graphs is nestedness. A bipartite graph is nested when vertices with few edges have a subset of the edges of vertices with more edges. For example, imagine an ecosystem with three bees (X, Y and Z) and three plants (D, E and F). Let’s say bee X visits 3 plants; bee Y visits 2 plants; and bee Z visits 1 plant. The system would be perfectly nested if bee X visits D, E and F; bee Y visits D and E; and bee Z visits D. It is perfectly nested because X has all plants, Y has a subset of X’s plants, and Z has a subset of Y’s plants. Nestedness arises due to a variety of ecological and evolutionary processes, and is a subject of much study in ecology. Various measures exist to calculate the level of nestedness in a bipartite graph. However, because nestedness is affected by the number of vertices (species) and the number of edges (interactions between species) in a bipartite graph, it is necessary to normalise nestedness values so that they can be fairly compared between graphs. A recent paper by Song et al. (2017) proposed that values should be normalised by expressing the nestedness of a graph relative to the maximum possible nestedness that could be achieved in a graph with the same number of vertices and edges, i.e. nestedness_normalised = nestedness/max(nestedness). 
However, the methods proposed for calculating the maximum nestedness of a bipartite graph are very basic and extremely unlikely to find the true maximum nestedness (they used a greedy algorithm). Using a more sophisticated algorithm (simulated annealing), we were able to find higher levels of nestedness than the algorithm proposed by Song et al. However, simulated annealing is a heuristic, so we do not know if the solutions are optimal, or how far from optimality they are. The problem the student would work on is therefore: for a bipartite graph with a given number of vertices in each set, and a given number of edges, what is the maximum nestedness that can be achieved, given the constraint that each vertex must have at least one edge (i.e. each species must interact with at least one other)? This constraint is necessary for the graph to make ecological sense. Possible directions might include exact algorithms (e.g. branch and bound), but the student would be free to develop whatever methods they thought were best. These methods would then be written into code (preferably R or MATLAB). If the project is successful, the plan is to write this project up as a paper and submit it to a peer-reviewed journal (with the student as an author). The student would be based in the Conservation Science Group in the Department of Zoology, but we can be very flexible if the student wants to work remotely. As a supervisor, I will be very available for any discussions or questions the student wants to have. I hosted my first student from this programme last year, and it went very well – we have now written a paper based on the project (of which the student is an author), and are planning to submit it in January 2018. As well as being mathematically interesting, this project would advance the field of ecology (and other fields using bipartite networks), by improving our understanding of ecosystems, and could potentially inform biodiversity conservation. 
Additionally, if the student finishes developing the methods, there are a number of interesting questions the student would be free to pursue using the newly developed methods.
Skills Required Graph theory, optimisation, coding (R or MATLAB preferable)
Skills Desired  
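
To make the bee and plant example above concrete, here is a minimal Python sketch of one commonly used nestedness measure, a NODF-style paired overlap. The description does not say which measure the project would use, so treating NODF as the measure is an assumption, and the function names `nodf` and `normalised_nestedness` are hypothetical.

```python
from itertools import combinations


def nodf(visits):
    """NODF-style nestedness in [0, 1] for a bipartite graph.

    visits maps each vertex of one set (e.g. bees) to the set of vertices it
    is linked to in the other set (e.g. plants).
    """
    plants = set().union(*visits.values())
    visited_by = {p: {b for b, ps in visits.items() if p in ps} for p in plants}

    def paired_overlap(neighbours):
        total, pairs = 0.0, 0
        for a, b in combinations(neighbours.values(), 2):
            pairs += 1
            small, large = sorted((a, b), key=len)
            # A pair only contributes if the two degrees differ.
            if small and len(small) < len(large):
                total += len(small & large) / len(small)
        return total, pairs

    row_total, row_pairs = paired_overlap(visits)
    col_total, col_pairs = paired_overlap(visited_by)
    return (row_total + col_total) / (row_pairs + col_pairs)


def normalised_nestedness(value, max_value):
    """nestedness_normalised = nestedness / max(nestedness)."""
    return value / max_value


# The perfectly nested example from the description.
nested = {"X": {"D", "E", "F"}, "Y": {"D", "E"}, "Z": {"D"}}
# Same number of vertices, but every bee visits a different single plant.
unnested = {"X": {"D"}, "Y": {"E"}, "Z": {"F"}}
print(nodf(nested), nodf(unnested))
```

The open question in the project is the denominator of `normalised_nestedness`: finding the true maximum over all graphs with the same numbers of vertices and edges, in which every vertex keeps at least one edge.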

 

How to simulate coupled nuclear-electronic motion

Contact Name Tim Hele
Contact Email tjhh2@cam.ac.uk
Company Name Cavendish Laboratory
Address Kapitza Building room 28, Cavendish Laboratory, JJ Thomson Avenue, Cambridge, CB3 0HE
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3rd March
Brief Description of the Project This project aims to develop and extend a recently-invented algorithm for symplectic propagation of a nonseparable Hamiltonian that describes coupled nuclear-electronic motion. In particular, the aim is to describe coupled nuclear-electronic motion on multiple electronic surfaces and through degeneracies (conical intersections). This project combines both theory (algebraic derivation) and computation (coding the algorithm). Ideally this project would result in an off-the-shelf code, and while it is well-defined, there is huge scope for an ambitious student to extend this to compute quantum time-correlation functions. Reference: The Journal of Chemical Physics 148, 102326 (2018), https://doi.org/10.1063/1.5005557
Skills Required Hamiltonian dynamics, matrix algebra, partial differential equations, computer programming
Skills Desired Quantum mechanics, time-correlation functions, Python and/or Fortran

 

Entropy of Aerosols

Contact Name Adam Boies
Contact Email amb233@cam.ac.uk
Company Name Department of Engineering
Address amb233@cam.ac.uk
Period of the Project 8 weeks
Project Open to Part III (master's) students
Deadline to Register Interest  
Brief Description of the Project Develop a theoretical model for entropy generation within aerosols undergoing coagulation. This theory has only recently been developed and will take a classical thermodynamic and statistical thermodynamic approach. The project will take the form of a linear optimization project.
Skills Required Linear Programming, Optimization
Skills Desired  

 

A stochastic model for understanding PIN polarity in isolated cells

Contact Name Pau Formosa-Jordan
Contact Email pf324@cam.ac.uk
Company Name Sainsbury Laboratory
Address 47 Bateman Street, Cambridge CB2 1LR
Period of the Project From the 1st June until the 30th September
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Living cells often break symmetry and adopt a preferential direction, which is known as cell polarity. This process is fundamental in yeast, plant and animal cells in a wide variety of contexts [1]. In the case of plants, it has been shown that cell polarity of PIN proteins is key for a wide variety of patterns, ranging from the arrangement of leaves and flowers around the shoot [2], to vein formation in leaves [3]. Recently, at the Sainsbury Laboratory (SLCU) we have found that plant cell cultures expressing a fluorescently tagged PIN reporter can show different spatio-temporal stochastic fluctuations of PIN in the cell membrane, and that such cells can exhibit PIN polarity even when isolated, with no direct cell-to-cell interactions (unpublished). This project will consist of studying how dynamic stochastic fluctuations can make cells spontaneously polarise, in a model where PIN polarity can occur when cells are isolated (see deterministic models in [1, 4] or simplified versions of them, e.g. the models in [5]). Stochastic dynamics will be implemented through Chemical Langevin Equations [6] with the Organism and Tissue software, which have been developed at the Jönsson Lab. There will also be the possibility to study the different models analytically through linear stability analysis [7] and local perturbation analysis [5]. Ultimately, this project will help to develop a better understanding of the nature of the observed PIN spatio-temporal fluctuations in our cell cultures. This project will be developed at the SLCU, within the Jönsson, Locke and Meyerowitz groups.
1. Abley K, De Reuille PB, Strutt D, et al (2013) An intracellular partitioning-based framework for tissue cell polarity in plants and animals. Development 140:2061–2074. doi: 10.1242/dev.062984
2. Jönsson H, Heisler MG, Shapiro BE, et al (2006) An auxin-driven polarized transport model for phyllotaxis. Proc Natl Acad Sci U S A 103:1633–1638.
3. Rolland-Lagan A, Prusinkiewicz P (2005) Reviewing models of auxin canalization in the context of leaf vein pattern formation in Arabidopsis. Plant J 44:854–865.
4. Abley K, Sauret-Güeto S, Marée AFM, Coen E (2016) Formation of Polarity Convergences underlying Shoot Outgrowths. 1–60. doi: 10.7554/eLife.18165
5. Edelstein-Keshet L, Holmes WR, Zajac M, et al (2013) From simple to detailed models for cell polarization. 368:
6. Gillespie DT (2000) The chemical Langevin equation. 297:297–306. doi: 10.1063/1.481811
7. Fàbregas N, Formosa-Jordan P, Confraria A, et al (2015) Auxin Influx Carriers Control Vascular Patterning and Xylem Differentiation in Arabidopsis thaliana. PLoS Genet 11:e1005183. doi: 10.1371/journal.pgen.1005183
Skills Required Computer programming skills are desired, but not required.
Skills Desired Some knowledge of nonlinear dynamical systems is desired, but not required.
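
For readers unfamiliar with the Chemical Langevin Equation mentioned in the description, the following Python sketch integrates it with the Euler-Maruyama scheme for the simplest possible system, a single species that is produced at a constant rate and degraded at a rate proportional to its copy number. This is a textbook toy and not a PIN polarity model, it does not use the Organism or Tissue software, and the function name `simulate_cle` and all parameter values are assumptions made for illustration.

```python
import math
import random


def simulate_cle(production, degradation, x0, dt, steps, seed=0):
    """Euler-Maruyama integration of the chemical Langevin equation for
    0 -> X (rate production) and X -> 0 (rate degradation * X)."""
    rng = random.Random(seed)
    x = float(x0)
    path = [x]
    for _ in range(steps):
        birth = production          # propensity of the production reaction
        death = degradation * x     # propensity of the degradation reaction
        drift = (birth - death) * dt
        # One independent Gaussian noise term per reaction channel.
        noise = (math.sqrt(birth * dt) * rng.gauss(0.0, 1.0)
                 - math.sqrt(death * dt) * rng.gauss(0.0, 1.0))
        x = max(x + drift + noise, 0.0)  # keep the copy number non-negative
        path.append(x)
    return path


path = simulate_cle(production=100.0, degradation=1.0, x0=100.0,
                    dt=0.01, steps=20000)
mean_level = sum(path) / len(path)
print(round(mean_level, 1))  # fluctuates around production / degradation
```

In a polarity model the same construction would be applied to every reaction and transport term, with one noise term per channel.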

 

Predicting the effect of mutations on protein-protein interactions

Contact Name Nick Goldman
Contact Email goldman@ebi.ac.uk
Company Name EMBL-European Bioinformatics Institute (a University Partner Institute)
Address Wellcome Genome Campus, Hinxton, CB10 1SD
Period of the Project by mutual agreement (prefer longer internships!)
Project Open to Undergraduates, Part III (master's) students
Deadline to Register Interest  
Brief Description of the Project The physical interaction between proteins is central to almost all aspects of biology, being fundamental to how inter- and intra-cellular signals are processed and integrated, the recognition of foreign bodies by the immune system, and the assembly of supramolecular structures which dictate the physical and dynamic properties of the cell. Consequently, the effect of mutations on these interactions has implications for the molecular etiology of disease, for constraining the permissible substitutions that can accrue through the course of evolution, and for the design of protein-based therapeutics. This project is to develop a method of predicting how changes in protein sequence will affect the binding affinity of protein-protein interactions, trained from a large new data set of over 7000 experiments in which the affinities of wild-type and mutant proteins have been measured. The project will involve integrating features calculated from structure-based biophysical methods using machine learning techniques. The resultant models will then be validated using external test sets, cross-validation and bootstrap sampling, undertaken in such a way as to account for the underlying biases in the training data. Should time permit, the model may also be applied to pathological mutations in the OMIM database and the "mutations influencing interactions dataset" from the IntAct database. This project requires, as a minimum, competence in programming and using Linux and some familiarity with the basics of machine learning. While not essential, knowledge of molecular biology, structural biology or physical chemistry would be advantageous.
Skills Required scientific programming, Linux, some machine learning
Skills Desired molecular biology, structural biology, physical chemistry

 

Phylogenomics of recombinant bacteria

Contact Name Nick Goldman
Contact Email goldman@ebi.ac.uk
Company Name EMBL-European Bioinformatics Institute (a University Partner Institute)
Address Wellcome Genome Campus, Hinxton, CB10 1SD
Period of the Project by mutual agreement (longer internships preferred!)
Project Open to Undergraduates, Part III (master's) students
Deadline to Register Interest 30 April 2018
Brief Description of the Project Recombination helps bacteria survive and adapt to a changing environment, for example allowing bacteria to quickly acquire antimicrobial resistance. Recombination also confounds and complicates the reconstruction of bacterial evolution using phylogenetics, because it causes different genes to have different evolutionary histories. Within this project students will be able to pursue any of the following aims:
- Reconstructing the evolutionary and epidemiological history of highly recombining bacterial human pathogens (e.g. N. gonorrhoeae and H. pylori) using different phylogenetic methods (both bacteria-specific and not) to account for recombination, and using new large genomic datasets.
- Simulating recombinant bacterial evolution to test the efficiency of different evolutionary models and phylogenetic methods.
- Developing new models and algorithms for reconstructing the evolutionary history of highly recombinant bacteria from genomic datasets, in particular based on hidden Markov models.
Some of these tasks require an interest in the evolutionary history and epidemiology of bacteria, and the ability to work with large datasets and several different software distributions; other tasks require the ability to code (different languages might work, such as C++, Python, or Java), problem-solving skills, and some basic mathematical concepts and skills (preferably from probability and statistics).
Skills Required some of: interest in evolution and epidemiology of bacteria/large dataset handling/coding (C++/Python/Java/...)/problem solving/maths/probability/stats
Skills Desired others of: interest in evolution and epidemiology of bacteria/large dataset handling/coding (C++/Python/Java/...)/problem solving/maths/probability/stats

 

Developing stochastic models to explain the prevalence and distribution of a transmissible cancer

Contact Name Olivier Restif
Contact Email or226@cam.ac.uk
Company Name Department of Veterinary Medicine
Address Madingley Road, CB3 0ES
Period of the Project 8 weeks, flexible
Project Open to Undergraduates, Part III (master's) students
Deadline to Register Interest Still Open
Brief Description of the Project Co-supervisors: Dr Liz Murchison (epm27@cam.ac.uk) and Máire Lawlor (ml28@sanger.ac.uk), https://www.tcg.vet.cam.ac.uk/ The canine transmissible venereal tumour (CTVT) is a transmissible cancer that affects dogs. This disease, which manifests as genital tumours, is transmitted between animals by the transfer of living cancer cells, usually during mating. CTVT originally arose as a cancer in a single individual dog that lived several thousand years ago. Rather than dying together with this original host, CTVT survived by transmitting its cells to other hosts as a foreign graft. Today, CTVT affects dogs around the world, and is the oldest and most prolific cancer known in nature. CTVT persists at low clinical prevalence (~1 – 5%) in dog populations in most countries. However, the transmission dynamics that underlie this observation are not understood. Limited data on the natural clinical course of disease suggest long-term persistence in affected hosts and occasional immune-mediated tumour regression. The goal of the project will be to develop and analyse mathematical models representing alternative hypotheses for the transmission dynamics of CTVT within dog populations. Using probability theory and computer simulations, the student will analyse the behaviour of stochastic models to determine conditions that allow the long-term persistence of CTVT at low prevalence. In particular, we will consider the effects of different sources of heterogeneity among dogs (i.e. dog genetic heterogeneity, tumour heterogeneity), and the impact of chemotherapy treatment. The student will have the opportunity to work with epidemiological data and be part of an exciting collaboration between biologists and mathematicians. Overall, this project promises to broaden our understanding of the interaction between CTVT and its host, both at the individual level and across populations. 
This has implications for improving our knowledge of the infectious disease dynamics of this common canine pathogen, but also, importantly, may provide frameworks for understanding how CTVT may interact with the host immune system. The mechanisms whereby transmissible cancers interact with their hosts are of broad interest, both in veterinary and human oncology. Suggested reading 1. Murchison EP, Wedge DC et al 2014 Transmissible dog cancer genome reveals the origin and history of an ancient cell lineage. Science. 343:437-40 2. Strakova A, Ní Leathlobhair M et al, 2016 Mitochondrial genetic diversity, selection and recombination in a canine transmissible cancer. eLife. 5, e14552. 3. Restif O, DTS Hayman, JRC Pulliam et al. 2012. Model-guided fieldwork: practical guidelines for multi-disciplinary research on wildlife ecological and epidemiological dynamics. Ecology Letters 15:1083-94.
Skills Required Stochastic modelling, scientific programming
Skills Desired Familiar with R
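
As an indication of the kind of stochastic model the description refers to, here is a minimal Python sketch of a Gillespie simulation of a textbook SIS model, in which infected dogs transmit the tumour and can return to the susceptible class through regression or treatment. It is not the supervisors' model: the SIS structure, the function name `simulate_sis` and every parameter value are assumptions chosen only for illustration.

```python
import random


def simulate_sis(n_dogs, infected0, beta, gamma, t_end, seed=0):
    """Gillespie simulation of a stochastic SIS model.

    beta: transmission rate; gamma: per-dog rate of regression or cure.
    Returns a list of (time, number_infected), one entry per event.
    """
    rng = random.Random(seed)
    t, infected = 0.0, infected0
    history = [(t, infected)]
    while infected > 0:
        susceptible = n_dogs - infected
        infection_rate = beta * susceptible * infected / n_dogs
        recovery_rate = gamma * infected
        total = infection_rate + recovery_rate
        if total == 0:
            break  # nothing can happen any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total < infection_rate:
            infected += 1
        else:
            infected -= 1
        history.append((t, infected))
    return history


history = simulate_sis(n_dogs=1000, infected0=30, beta=0.12, gamma=0.1,
                       t_end=50.0)
final_prevalence = history[-1][1] / 1000
print(len(history), final_prevalence)
```

Repeating such runs over many seeds gives the distribution of prevalence and the probability of extinction; heterogeneity among dogs or tumours would enter as additional classes with their own rates.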

 

Understanding bicycle chains for Team GB

Contact Name Anthony Purnell
Contact Email tonypurnell@btinternet.com
Company Name Engineering
Address The Wood, 4c Millington Road, Cambridge, CB3 9HP
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest Happy to wait till May
Brief Description of the Project This will be a continuation of the CCA project we are doing with Peter Taylor
Skills Required Mechanics and applied mathematics competence and interest would be ideal.
Skills Desired MATLAB programming, interest in comparing with experimental results. Understanding of elasticity and geometry would be helpful.

 

Development of machine learning based approaches for identifying new drug targets

Contact Name Namshik Han
Contact Email n.han@milner.cam.ac.uk
Company Name Milner Therapeutics Institute
Address Milner Therapeutics Institute, Tennis Court Road, Cambridge, CB2 1QN
Period of the Project 8 weeks: July and August
Project Open to Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project The aim of the Project is to combine detailed molecular studies with machine learning approaches to understand how genetic and epigenetic factors give rise to unique gene expression patterns in response to human diseases. To address this, we attempt to utilise the vast wealth of biological data produced by modern large-scale genomic projects. Modern experimental techniques allow direct genome-wide measurement of various molecular features, and these measurements are deposited in large-scale quantitative databases. However, the complex interactions of multiple genetic and epigenetic factors are hard to determine experimentally. In the absence of such biological experiments, it becomes necessary to utilise systematic analysis methods, including machine learning. The current state of the art takes existing knowledge and asks "How does this data relate to what we already know?" By applying machine learning approaches, the Project will integrate and explore the data to discover new biological knowledge and generate testable hypotheses. The Project is open-ended.
Skills Required Working knowledge of machine learning, programming skills in MATLAB or R
Skills Desired Interested in Computational Biology and/or DNA biology

 

Developing novel methods for interrogating tree ring anatomy for use in modelling carbon sequestration

Contact Name Dr Andrew D. Friend
Contact Email adf10@cam.ac.uk
Company Name Geography
Address Downing Place, Cambridge CB2 3EN
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Tree wood anatomy records environmental, particularly climate, variations at annual and sub-annual timescales. Current methods exploit observations of ring width and maximum density in order to reconstruct past temperatures, and are critical in putting current global warming in context. These methods use sophisticated mathematical approaches that have evolved to overcome problems of variability and weak signals in the raw data. Our objectives differ from those of traditional "dendro" research: rather than extracting climate signals from observed tree growth patterns, we want to understand wood development and carbon uptake and sequestration by trees, in other words how environmental signals relate to tree anatomy, rather than the other way round. To this end we are developing process-based tree growth models. It seems that the methods used in traditional "dendroclimatological" research result in a great loss of the information contained in raw tree ring data, information that we would like to learn how to exploit. We hope that you can help 1) in assessing what underpins current dendroclimatological methodologies and, especially, 2) in developing novel tree ring anatomy data analysis methods that will exploit the raw data in new ways that are better suited to our needs. Our project is rather exploratory. We have collaborations in the US (Harvard) and Switzerland, where raw data are being collected that we wish to exploit. This work has the potential to revolutionise understanding of controls on tree growth and the global carbon cycle, and hence contribute to reducing uncertainties in future climate change. The desired output is an R or Python script that can be used to analyse observational raw data in a way that makes it more valuable for comparisons between process-based model output and observations. 
The last collaboration with a student at the Institute of Mathematics was very successful and resulted in a peer-reviewed publication (https://doi.org/10.3389/fpls.2017.00182). We look forward to hearing from you!
Skills Required Data analysis, possibly using R or Python.
Skills Desired  

 

Nonlinear optimization method for the estimation of neural activation patterns in users of cochlear implants

Contact Name Dr Robert Carlyon
Contact Email Bob.Carlyon@mrc-cbu.cam.ac.uk
Company Name MRC Cognition and Brain Sciences Unit
Address 15 Chaucer Rd, Cambridge CB2 7EF, United Kingdom
Period of the Project 8 weeks between late June and 30 September
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project A cochlear implant (CI) is an auditory prosthesis that provides a sensation of hearing to more than half a million deaf or severely hearing-impaired individuals. A CI uses an electrode array placed in the inner ear to electrically stimulate frequency-specific regions of auditory nerve fibres, as occurs in the normal hearing ear when listening to sound. However, sometimes stimulation sites along the array lie in neural “dead regions”, or stimulate neurons at an adjacent turn of the cochlea, thus distorting the ideal pattern of excitation. These distortions are often harmful to sound perception and may make it difficult for the user to understand speech. The aim of the proposed research project is to develop and evaluate an objective method that can be applied to CI users so as to identify these “distortions” and to guide methods for re-programming the CI so as to minimize their negative effects. The project is currently ongoing and builds on measurements of neural action potentials in CI users, which will be used during the development. The student will be given the specific task of proposing and evaluating a new method for non-linear optimization within the current software framework. The student will build on, and compare against, a method previously published by our lab (Cosentino et al., 2016) and will be supervised by the group leader and two post-docs. This is an exciting opportunity for an interested student to apply their theoretical knowledge to biomedical research and to experience being part of a research group at the MRC CBU. While the proposed task is well defined, creative and innovative approaches to improve the estimation performance and robustness of the algorithm are welcomed.
Cosentino, S., Gaudrain, E., Deeks, J. M., & Carlyon, R. P. (2016). Multistage nonlinear optimization to recover neural activation patterns from evoked compound action potentials of cochlear implant users. IEEE Transactions on Biomedical Engineering, 63(4), 833-840.
Skills Required Knowledge of or interest in mathematical optimization techniques; basic MATLAB and/or Python programming skills
Skills Desired Machine learning techniques, Matrix factorisation, Gaussian processes, Bayesian inference

 

Hunting for the ancestor of the mitochondrial carrier family

Contact Name Edmund Kunji
Contact Email ek@mrc-mbu.cam.ac.uk
Company Name Medical Research Council Mitochondrial Biology Unit
Address Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 0XY, United Kingdom
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Mitochondrial carriers are transport proteins of the inner membrane of mitochondria, the powerhouses of the human cell. They transport a variety of different chemical compounds, such as keto acids, amino acids, fatty acids, inorganic ions, nucleotides and vitamins, across the lipid membrane that surrounds these organelles. The protein structures of the carriers consist of three domains that are not identical to each other in amino acid composition. The three domains move in unison to achieve the translocation of these compounds across the membrane, but the details of this mechanism have not been worked out. The carriers are eukaryotic in origin, as to date no bacterial or archaeal ancestor of these proteins has been found. By using sequence information of this large protein family, we would like to reconstruct the most likely ancestor of the carriers and to determine its basic function.
Skills Required Basic programming skills and statistical analysis
Skills Desired Perl/Bioperl and Python programming

 

Application of compressed sensing to biomolecular NMR spectroscopy

Contact Name Mark Bostock
Contact Email mjb218@cam.ac.uk
Company Name Biochemistry Department, Nietlispach laboratory
Address Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA
Period of the Project 8 weeks (flexibility regarding length and timing)
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project Reconstruction of incompletely sampled NMR data is an area of considerable interest within biomolecular NMR spectroscopy, enabling improvements in resolution with significant time savings. It is also attracting considerable interest from mathematicians, with the launch this year of an international competition for the reconstruction of undersampled NMR data, supported by Stanford Professor of Statistics David Donoho. Compressed sensing (CS) reconstruction is a particularly popular method within NMR spectroscopy. We have a software package and accompanying GUI for CS reconstruction in use and are looking to extend this further. Principally we are interested in (i) new and improved algorithms for biomolecular NMR, (ii) incorporation of prior information in CS reconstructions, and (iii) understanding the optimal sampling requirements for different experiments. The project could incorporate one or more of the above proposals. Proposal (iii) would require the further development of code to generate suitable sampling patterns for NMR data, along with an accompanying GUI for ease of use by practising NMR spectroscopists.
Skills Required Familiarity with Linux and good programming expertise, preferably in Python, although we will consider students with good general coding experience as well. Interest in information theory.
Skills Desired Familiarity with compressed sensing theory.
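
As a rough illustration of the kind of reconstruction described above, here is a minimal Python sketch of compressed sensing recovery by iterative soft thresholding (ISTA). It is not the laboratory's software package or algorithm. The function names (dft, soft_threshold, ista_reconstruct), the 32-point synthetic signal with two peaks, the random 50% sampling schedule and the values of lam and n_iter are all assumptions chosen for the example. Real NMR data would need a fast FFT and proper processing.

```python
import cmath
import math
import random


def dft(values, inverse=False):
    """Naive unitary discrete Fourier transform (O(n^2), fine for a small demo)."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        for k in range(n)
    ]


def soft_threshold(z, lam):
    """Shrink the magnitude of a complex number by lam, keeping its phase."""
    mag = abs(z)
    return z * (1 - lam / mag) if mag > lam else 0j


def ista_reconstruct(samples, sampled_idx, n, lam=0.02, n_iter=200):
    """Estimate a sparse spectrum from time-domain points measured at sampled_idx."""
    spectrum = [0j] * n
    for _ in range(n_iter):
        # Predict the time-domain signal from the current spectrum estimate.
        model = dft(spectrum, inverse=True)
        # Mismatch at the measured points only; unmeasured points stay zero.
        residual = [0j] * n
        for t, y in zip(sampled_idx, samples):
            residual[t] = y - model[t]
        # Gradient step (step size 1, valid because the transform is unitary),
        # followed by soft thresholding to promote a sparse spectrum.
        step = dft(residual)
        spectrum = [soft_threshold(x + g, lam) for x, g in zip(spectrum, step)]
    return spectrum


# Synthetic example (all values are assumptions): a spectrum with two peaks.
n = 32
true_spectrum = [0j] * n
true_spectrum[5] = 1.0 + 0j
true_spectrum[20] = 0.6 + 0j
fid = dft(true_spectrum, inverse=True)  # fully sampled time-domain signal

random.seed(0)
sampled_idx = sorted(random.sample(range(n), 16))  # keep half of the points
samples = [fid[t] for t in sampled_idx]

recon = ista_reconstruct(samples, sampled_idx, n)
for k, value in enumerate(recon):
    print(k, round(abs(value), 3))
```

Comparing the magnitudes of recon with those of a zero-filled transform of the same samples is one way to see what the thresholding step does to sampling artefacts. A different sampling schedule, as in proposal (iii), can be tried by changing sampled_idx.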

Counting edge positions in bipartite graphs

Contact Name Benno Simmons
Contact Email bis22@cam.ac.uk
Company Name Department of Zoology
Address Department of Zoology, University of Cambridge, The David Attenborough Building, Pembroke Street, Cambridge, CB2 3QZ, UK
Period of the Project 8 weeks between late June and September, but can be flexible
Project Open to Undergraduates, Part III (master's) students, PhD Students
Deadline to Register Interest 3 March
Brief Description of the Project In my research, I use undirected bipartite graphs to describe interactions between species. For example, communities of plants and their pollinators can be represented as bipartite graphs, with plants and pollinators as two sets of vertices, and edges representing their interactions. Other types of ecological interaction can be represented the same way, such as those between plants and the birds that disperse their seeds, or between hosts and their parasites. Bipartite graphs can be decomposed into their constituent subgraphs called ‘motifs’. Motifs are subgraphs defined by a particular pattern of edges between a given number of vertices, with all vertices having at least one edge. Particular motifs may reflect particular functions in real-world networks, so by using them to analyse empirical ecological graphs we can gain insight into how ecosystems are structured and how they function. All bipartite motifs up to six vertices can be viewed in Figure A27 (page 34) of the following link: http://www.ecography.org/sites/ecography.org/files/appendix/ecog-00913.pdf Last year, a CMP (then called PMP) student helped develop methods to count: (i) the number of times each motif up to six vertices (the motifs listed in Figure A27 at the link above) occurred in a bipartite graph and (ii) the number of times each vertex occurred in each topologically unique position within each motif (unique positions are given by the numbers next to vertices in Figure A27 at the link above). The methods were written into code and are being released as an R software package with an accompanying peer-reviewed paper with the student as an author. This project builds on this previous work. The student would develop methods to count the number of times each edge in a bipartite graph occurs in each topologically unique position within each of the motifs in Figure A27 at the link above. 
For example, the first, second, third and fourth motifs in Figure A27 each contain one edge position, while the fifth motif contains three edge positions and the sixth motif contains two. The student will then code these methods in R, and I will incorporate them into the existing R software package. These new methods will be written up as a new paper and submitted to a peer-reviewed journal with the student as an author. The student would be based in the Conservation Science Group in the Department of Zoology, but we can be very flexible if the student wants to work remotely. As a supervisor, I will be readily available for any discussions or questions the student wants to have. As well as being mathematically interesting, this project would advance the field of ecology (and other fields using bipartite networks) by improving our understanding of ecosystems, and could potentially inform biodiversity conservation. These methods are at the cutting edge of network ecology. Additionally, if the student finishes developing the methods, there are a number of interesting questions the student would be free to pursue using them. The student may not initially be familiar with R, but this should not be a barrier. The student last year was able to learn the basics very quickly, as R has a simple syntax (similar to MATLAB), and the code for this kind of project is likely to involve mathematical operations such as matrix multiplication, which are very straightforward in R. I will be able to incorporate the new methods into the software, so the student will not have to worry about this aspect of the R coding.
Skills Required Graph theory, coding
Skills Desired  
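
To make the idea of per-edge counting concrete, the following is a small Python sketch (the project itself would use R). It is an illustration only, not the method of the existing R package. For every edge of a toy bipartite graph it counts how often that edge appears in three simple subgraphs: two-edge paths centred on a row vertex, two-edge paths centred on a column vertex, and four-cycles. It does not distinguish the edge positions of Figure A27. The example matrix, the function name count_edge_occurrences and the dictionary keys are hypothetical.

```python
def count_edge_occurrences(m):
    """For each edge (i, j) of a 0/1 biadjacency matrix m, count small subgraphs containing it."""
    n_rows, n_cols = len(m), len(m[0])
    row_deg = [sum(row) for row in m]
    col_deg = [sum(m[i][j] for i in range(n_rows)) for j in range(n_cols)]
    counts = {}
    for i in range(n_rows):
        for j in range(n_cols):
            if not m[i][j]:
                continue  # (i, j) is not an edge
            # Four-cycles through (i, j): pick another row i2 and another
            # column j2 such that the other three edges are all present.
            four_cycles = sum(
                m[i][j2] * m[i2][j] * m[i2][j2]
                for i2 in range(n_rows) if i2 != i
                for j2 in range(n_cols) if j2 != j
            )
            counts[(i, j)] = {
                # Paths j - i - j2: choose one other neighbour of row vertex i.
                "row_centred_path": row_deg[i] - 1,
                # Paths i - j - i2: choose one other neighbour of column vertex j.
                "col_centred_path": col_deg[j] - 1,
                "four_cycle": four_cycles,
            }
    return counts


# Toy network: rows could be plants, columns pollinators (values are made up).
web = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
counts = count_edge_occurrences(web)
for edge, c in sorted(counts.items()):
    print(edge, c)
```

The two path counts come directly from vertex degrees, which is the kind of quantity that matrix operations on the biadjacency matrix give cheaply. The four-cycle count is written as a brute-force sum to keep the sketch easy to check by hand.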

 

Automatically segment vestibular schwannomas

Contact Name Dr Roushanak Rahmat and Dr Stephen Price
Contact Email rr556@cam.ac.uk
Company Name Clinical Neurosciences
Address Department of Clinical Neurosciences, Division of Neurosurgery, Cambridge Biomedical Campus, Cambridge CB2 0QQ.
Period of the Project 8 weeks
Project Open to Undergraduates, Part III (master's) students
Deadline to Register Interest  
Brief Description of the Project The aim is to automatically segment vestibular schwannomas in order to measure changes in volume over time. These tumours have the advantage that they are uniformly enhancing masses, and there are at least 100 patients on long-term follow-up with multiple images. We have similar issues with incidentally found meningiomas and with monitoring low-grade gliomas.
Skills Required Medical image analysis, image processing
Skills Desired