
Summer Research Programmes

 

This is a list of CMP academic project proposals from summer 2016.

 

Symmetry and asymmetry in transport proteins

Contact Name Edmund Kunji
Contact email ek@mrc-mbu.cam.ac.uk
Lab/Department Medical Research Council
Address Wellcome Trust / MRC Building Cambridge Biomedical Campus
Period of the Project 2 months
Brief Description of Project Transport proteins transport metabolites, vitamins and ions across biological membranes. Many of these proteins have pseudo-symmetry, meaning that they consist of domains that are similar but not identical to each other. It is likely that they were originally perfectly symmetrical, but became pseudo-symmetrical because the proteins acquired new functions, which required alterations of their functional parts. Thus, it should be possible to find the functional elements of these proteins by determining the deviations from symmetry. See, for example: The mechanism of transport by mitochondrial carriers based on analysis of symmetry. Robinson AJ, Overy C, Kunji ER. Proc Natl Acad Sci U S A. 2008 Nov 18;105(46):17766-71. doi:10.1073/pnas.0809580105.
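As a rough illustration of "determining the deviations from symmetry", the sketch below scores each column of a set of pre-aligned repeats by how many repeats depart from the column consensus. The sequences, the function name `symmetry_deviations` and the scoring rule are all assumptions made for illustration; they are not taken from the cited paper.

```python
from collections import Counter

def symmetry_deviations(repeats):
    """Per aligned column, the fraction of repeats that differ from the column consensus."""
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("repeats must be pre-aligned to equal length")
    scores = []
    for column in zip(*repeats):
        # Count of the most frequent residue in this column.
        consensus_count = Counter(column).most_common(1)[0][1]
        scores.append(1.0 - consensus_count / len(column))
    return scores

# Hypothetical, made-up aligned repeats (not real carrier sequences).
repeats = ["PLDVVKTR", "PFDVVKTR", "PLDTVKRR"]
scores = symmetry_deviations(repeats)

# Columns where at least one repeat deviates are candidate functional positions.
asymmetric_positions = [i for i, s in enumerate(scores) if s > 0]
print(asymmetric_positions)
```

A real analysis would need a proper alignment of the repeats and a substitution-aware score; this only shows the shape of the calculation.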
Skills Required Programming skills, but they can be learned on the job.
Skills Desired Experience in Perl and Bioperl.
Any Other Information  


The Entropy Generation of Pollution

Contact Name Adam Boies
Contact CRSid amb233@cam.ac.uk
Lab/Department Boies
Address Department of Engineering ISO-45 Trumpington Street Cambridge, UK CB2 1PZ
Period of the Project Summer 2016
Brief Description of Project Pollution consists of particles in the atmosphere that evolve over time: Brownian motion causes particles to collide and coalesce, so the particle size distribution changes as the pollution ages. This project aims to quantify the evolution of particles, as described by the Smoluchowski coagulation equation, in terms of the entropy generated during the process. As the process evolves, information is lost, obscuring the initial state of the pollution when it is measured at later times. The degree to which information is lost (and entropy generated) has never been studied, and is thus a fruitful area for research.
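To make the idea concrete, here is a minimal sketch of the discrete Smoluchowski coagulation equation with a constant kernel, integrated with explicit Euler steps, together with the Shannon entropy of the normalised size distribution. The constant kernel, the size cut-off and the choice of Shannon entropy as the information measure are assumptions for illustration only; the project may define entropy generation differently.

```python
import math

def coagulation_step(n, kernel, dt):
    """One explicit Euler step; n[k] is the concentration of particles made of k+1 monomers."""
    total = sum(n)
    updated = []
    for k in range(len(n)):
        # Gain: pairs with (i+1) + (j+1) = k+1 monomers, i.e. j = k-1-i.
        gain = 0.5 * kernel * sum(n[i] * n[k - 1 - i] for i in range(k))
        # Loss: a size-(k+1) particle collides with any particle.
        # Products larger than the cut-off size are simply dropped (truncation).
        loss = kernel * n[k] * total
        updated.append(n[k] + dt * (gain - loss))
    return updated

def shannon_entropy(n):
    """Shannon entropy (in nats) of the normalised size distribution."""
    total = sum(n)
    return -sum((x / total) * math.log(x / total) for x in n if x > 0)

# Start from a monodisperse population: all particles are single monomers.
n = [1.0] + [0.0] * 49
entropy_start = shannon_entropy(n)
for _ in range(200):
    n = coagulation_step(n, kernel=1.0, dt=0.01)
entropy_end = shannon_entropy(n)
mass = sum((k + 1) * x for k, x in enumerate(n))
print(entropy_start, entropy_end, sum(n), mass)
```

The entropy starts at zero for the monodisperse state and grows as the distribution broadens, while total mass stays (almost exactly) constant.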
Skills Required The student project will pair computer models with theoretical equations to match the statistical thermodynamic and classical thermodynamic approaches to entropy generation. As such, a basic ability to work with Mathematica code and to manipulate integrodifferential equations is desired.
Skills Desired  
Any Other Information  


Application of Compressed Sensing to Nuclear Magnetic Resonance (NMR) Spectroscopy

Contact Name Dr. Daniel Nietlispach
Contact CRSid dn206@cam.ac.uk
Lab/Department Biochemistry
Address Department of Biochemistry University of Cambridge 80 Tennis Court Road Old Addenbrooke's Site Cambridge CB2 1GA
Period of the Project 8 weeks (negotiable)
Brief Description of Project NMR spectroscopy is a widely used technique in structural biology, enabling atomic-resolution structures of proteins and other biomolecules and insights into their function, such as protein motion. Over recent years we have been actively developing new methodologies enabling more efficient data processing. Generally these are termed 'non-uniform sampling' (NUS) methods, and they typically require non-Fourier-transform reconstruction techniques to convert irregularly sampled time-domain data into the frequency domain. We have been working on the development and implementation of methodologies known as 'compressed sensing' (CS), based on l1-norm minimisation. CS arose in the literature of information theory [1], [2] and has been applied widely, for example in MRI [3] and NMR [4], [5], as well as in other areas such as image compression, astronomy and tomography [6]. The area is revolutionising the NMR field, allowing us to obtain information that was previously inaccessible and increasing the range of challenging biomolecules that NMR can study. The main benefits of combining NUS recording with CS data reconstruction are increases in signal-to-noise and spectral resolution, both typically the limiting factors in NMR spectroscopy. Our current interests are the following:
1. Algorithm development. CS is an actively developing area within applied maths. New algorithms are regularly released with improvements in speed and reconstruction accuracy. We are interested in implementing some of these algorithms for NMR data processing and assessing any improvements over the existing algorithms. This would require a literature search to identify new algorithms and then coding the algorithm to work for NMR data reconstruction.
2. Reducing the sampling requirements with prior information. Prior information is often available in NMR studies from existing experiments. Repeat measurements frequently look at spectral changes (difference experiments) to track biological processes. This prior information should allow a substantial reduction in sampling requirements. This would involve working with, for example, protein dynamics data and developing the existing algorithm to use available prior information.
3. Software development. We have a Python programme used by a number of researchers within the Biochemistry department. We would like to develop the code/GUI to make this widely available to the NMR community. This would require putting our code together as a robust software package that can easily be operated by users who have little background knowledge.
References:
[1] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[2] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[3] M. Lustig, D. L. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, Dec. 2007.
[4] D. J. Holland, M. J. Bostock, L. F. Gladden, and D. Nietlispach, “Fast multidimensional NMR spectroscopy using compressed sensing,” Angew. Chemie Int. Ed., vol. 50, no. 29, pp. 6548–6551, Jun. 2011.
[5] K. Kazimierczuk and V. Y. Orekhov, “Accelerated NMR spectroscopy by using compressed sensing,” Angew. Chemie Int. Ed., vol. 50, no. 24, pp. 5556–5559, Apr. 2011.
[6] D. J. Holland and L. F. Gladden, “Less is More: How Compressed Sensing is Transforming Metrology in Chemistry,” Angew. Chemie Int. Ed., vol. 53, pp. 13330–13340, 2014.
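The following is a minimal sketch of l1-norm reconstruction from non-uniformly sampled data using iterative soft thresholding, one common family of CS algorithms. It is not the group's software: the synthetic three-peak spectrum, the function names, the threshold and the iteration count are all assumptions, and real NMR signals decay and contain noise, which this idealised example ignores.

```python
import numpy as np

def soft_threshold(x, threshold):
    """Shrink complex values towards zero by `threshold` in magnitude."""
    magnitude = np.abs(x)
    scale = np.maximum(magnitude - threshold, 0.0) / np.maximum(magnitude, 1e-12)
    return x * scale

def ist_reconstruct(samples, mask, threshold=0.01, iterations=500):
    """Estimate a sparse spectrum from time-domain points where mask == 1."""
    spectrum = np.zeros(mask.size, dtype=complex)
    for _ in range(iterations):
        # Mismatch between measured and predicted time-domain data, sampled points only.
        residual = mask * (samples - np.fft.ifft(spectrum, norm="ortho"))
        # Gradient step back in the frequency domain, then l1 shrinkage.
        spectrum = soft_threshold(spectrum + np.fft.fft(residual, norm="ortho"), threshold)
    return spectrum

rng = np.random.default_rng(0)
n_points = 256
true_spectrum = np.zeros(n_points, dtype=complex)
true_spectrum[[20, 75, 180]] = [1.0, 0.7, 0.5]   # three synthetic peaks

# Fully sampled time-domain signal, of which only a quarter is "recorded".
fid = np.fft.ifft(true_spectrum, norm="ortho")
mask = np.zeros(n_points)
mask[rng.choice(n_points, size=64, replace=False)] = 1.0

recovered = ist_reconstruct(mask * fid, mask)
peaks = sorted(int(i) for i in np.argsort(np.abs(recovered))[-3:])
print(peaks)
```

The orthonormal FFT convention keeps the step size at one; with NumPy's default normalisation the gradient step would need rescaling.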
Skills Required Good programming experience, particularly with Python. Interest in information theory.
Skills Desired Familiarity with compressed sensing theory. Experience creating GUIs e.g. with Tk or Qt.
Any Other Information  


Mathematical Estimation Beyond The Grave

Contact Name John Robb
Contact CRSid jer39@cam.ac.uk
Lab/Department Archaeology
Address Division of Archaeology Department of Archaeology and Anthropology Downing Street Cambridge CB2 3DZ
Period of the Project May-August 2016
Brief Description of Project Archaeologists often excavate ancient tombs which contain a disorderly jumble of human bones -- this is common in situations ranging from Neolithic long barrows in England to Bronze Age underground chambers in the Near East, Native American mounds in North America, cult caves, Maya tombs and many other places. How many people were buried in such a tomb? It is very hard to estimate. The standard archaeological method is to calculate the Minimum Number of Individuals (MNI): you simply count up all the bones, and if the most common one is the left femur and there are 35 of them, there must have been at least 35 people buried in the tomb to yield this number of bones. But everybody in the field knows that the MNI does not tell you with any accuracy how many people were likely to have been buried in the tomb, and with large and highly fragmented collections it probably underestimates the tomb's population by an order of magnitude. This project seeks the help of a mathematician to try to develop an alternative measure, the Most Likely Number of Individuals: given a mathematical distribution of bone counts, can we quantify an estimate or range which is likely to actually represent how many bodies were deposited in the tomb? Preliminary thoughts suggest a number of directions we might try, including simulation, probability distributions, and set theory -- all integrated with contextual information about the site.
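A small sketch of the MNI calculation described above, followed by a toy simulation showing how far the MNI can fall below the true number of burials when only a fraction of bones survives. The bone counts other than the 35 left femora, the independent-recovery model and the 10% recovery probability are assumptions chosen purely for illustration.

```python
import random

def minimum_number_of_individuals(bone_counts):
    """MNI: the count of the most frequent skeletal element."""
    return max(bone_counts.values())

def simulate_mni(true_individuals, recovery_prob, elements, rng):
    """MNI observed when each bone of each individual survives independently."""
    counts = {
        element: sum(rng.random() < recovery_prob for _ in range(true_individuals))
        for element in elements
    }
    return minimum_number_of_individuals(counts)

# The worked example from the text: 35 left femora is the largest count.
observed = {"left femur": 35, "right femur": 28, "skull": 30}
mni = minimum_number_of_individuals(observed)

# Toy simulation: 200 burials, each bone recovered with probability 0.1.
rng = random.Random(42)
elements = ["left femur", "right femur", "skull", "mandible"]
simulated = [simulate_mni(200, 0.1, elements, rng) for _ in range(500)]
mean_mni = sum(simulated) / len(simulated)
print(mni, mean_mni)
```

Inverting such a simulation (asking which true population sizes make the observed counts plausible) is one possible route towards a "most likely" estimate.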
Skills Required Basic concepts of statistical probability and set theory; some interest in simulation modelling might be helpful.
Skills Desired Ability to communicate with non-mathematicians; enthusiasm.
Any Other Information  


Missing mutations in cancer analyses

Contact Name Andy Lynch
Contact CRSid andy.lynch@cruk.cam.ac.uk
Lab/Department Tavaré Lab/CRUK CI
Address Cancer Research UK Cambridge Institute University of Cambridge Li Ka Shing Centre Robinson Way Cambridge CB2 0RE
Period of the Project 8-10 weeks (negotiable before starting the project)
Brief Description of Project One strand of cancer research involves comparing the genomic sequence of cancer cells to that of healthy tissue from the same individual in order to catalogue the somatic mutations that have occurred in the cancer. The principle is that among thousands of ‘passenger’ mutations there will be some that have driven the cancer, and if we repeat the process across many patients, these will appear more commonly than would be expected by chance. This catalogue of mutations is also the starting point for many other analyses, including examining heterogeneity of cells within an individual tumour, inferring the history of the tumour, and saying something about the processes that have acted upon the tissue to cause these mutations. Generating the catalogue of mutations is, however, an error-prone process, and while several tools have been published that use probability models to best ensure that only genuine mutations are included in the catalogue, the question of estimating what has been missed has been neglected. While we cannot say which specific mutations have been missed, we may be able to characterize them more generally. It should be possible to estimate their numbers, the genomic contexts in which they are likely to be found, and so forth. The aim of this project is to devise methods to characterize the missed mutations for a patient and implement them in a tool that can be shared with the cancer research community. Time allowing, there will be the opportunity to apply the tool to large cancer data sets and to investigate the stability of its performance in relation to a number of potential biasing factors. Such a tool would be of value to many cancer centres around the world and would help to avoid fallacious conclusions driven by variation in the power to detect mutations from patient to patient.
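One very simple way to think about "estimating what has been missed" is through detection power. The sketch below assumes a toy model in which a mutation is called when at least `min_alt_reads` of `depth` reads carry the variant, with reads drawn binomially at the variant allele fraction; the number missed then follows from the number observed. This model, the function names and the example numbers are assumptions, not the project's method, and real callers are considerably more complex.

```python
from math import comb

def detection_power(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant reads) under a binomial read model."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below

def estimate_missed(observed, power):
    """Expected number of missed mutations if each is detected with probability `power`."""
    if power <= 0:
        raise ValueError("power must be positive")
    return observed * (1.0 - power) / power

# Hypothetical numbers: 30x depth, 20% allele fraction, 4 supporting reads required.
power = detection_power(depth=30, allele_fraction=0.2, min_alt_reads=4)
missed = estimate_missed(observed=1200, power=power)
print(round(power, 3), round(missed))
```

Because power varies with depth and tumour purity, the same calculation gives different answers from patient to patient, which is the source of the bias the description warns about.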
Skills Required Statistics, probability, some computer programming (preferably R and/or python), an interest in biology.
Skills Desired Some prior knowledge of genetics/genomics or cancer biology would be helpful, but is not essential.


Formalised Mathematics

Contact Name Prof Lawrence Paulson
Contact CRSid lp15@cam.ac.uk
Lab/Department Computer Laboratory
Address University of Cambridge 15 JJ Thomson Avenue Cambridge CB3 0FD
Period of the Project summer 2016 (flexible)
Brief Description of Project Interactive proof assistants such as Isabelle and Coq have gained prominence recently because of their use in formalising significant results in mathematics such as the Kepler conjecture and the odd order theorem. In the former case, the original mathematical proof was distrusted because of its reliance on a lengthy computation, as in the more familiar four colour theorem. The latter result was formalised to demonstrate the power of the technology. However, the value of proof assistants to mathematicians is limited by the lack of libraries of fundamental theorems, results that most mathematicians would take for granted. The project is to formalise results of the candidate's choice in Isabelle, extending the existing libraries. The candidate will receive training in the use of an interactive proof assistant while contributing to the development of this technology. Topics suitable for formalisation include number theory (e.g. quadratic reciprocity), financial mathematics, complex analysis, approximation theory, etc.
Skills Required Knowledge of the requisite mathematics, familiarity with elementary logic and basic computer literacy
Skills Desired Prior familiarity with any proof assistant would be valuable.


Statistical Identification of Mutation Hotspots in Protein Domains From Cancer Genomics Data

Contact Name Martin Miller
Contact CRSid martin.miller@cruk.cam.ac.uk
Lab/Department miller-lab.org, Cancer Research UK, Cambridge Institute, University of Cambridge
Address Martin Miller, Cancer Research UK, Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
Period of the Project 8 weeks over summer 2016
Brief Description of Project In cancer genomics, recurrence of mutations in independent tumour samples is a strong indicator of functional impact. However, rare functional mutations can escape detection by recurrence analysis owing to lack of statistical power. We have recently developed an approach in which we enhance statistical power by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. Domain mutation analysis also sharpens the functional interpretation of the impact of mutations, as domains more succinctly embody function than entire genes. Using cancer genomics datasets from thousands of tumour samples across dozens of tumour types, we have analysed somatic missense mutations in protein domains and discovered new domain mutation hotspots. By associating mutations in infrequently altered genes with mutations in frequently altered paralogous genes that are known to contribute to cancer, we have found new clues to the functional role of rare mutations in cancer, and we have shared our findings through an interactive web resource: www.mutationaligner.org (Miller et al 2015 Cell Systems, Gauthier et al 2016 Nucleic Acids Research). This project involves further development of the statistical framework to detect mutation hotspots in large-scale cancer genomics datasets approaching a million somatic mutations in total. Mutations are not random events but occur with different likelihood at genomic positions depending on nucleotide context, cancer aetiology, selective pressure, etc. The student will design and implement enhanced statistical techniques to detect significantly mutated domain hotspots accounting for: 1) the probability of observing a given somatic mutation given the sample-specific nucleotide mutation frequencies, 2) the likelihood of observing a specific amino acid change given the specific nucleotide triplet encoding the mutated amino acid.
Successful implementation of this probabilistic framework will enhance the sensitivity of our approach to detect events that are selected for during cancer development and thereby help distinguish between causative driver mutations in cancer and irrelevant bystander mutations.
M. L. Miller, E. Reznik, N. P. Gauthier, B. A. Aksoy, A. Korkut, J. Gao, G. Ciriello, N. Schultz & C. Sander. "Pan-Cancer Analysis of Mutation Hotspots in Protein Domains". Cell Systems 1(3), 197–209 (2015)
N. P. Gauthier, E. Reznik, J. Gao, O. Sumer, N. Schultz, C. Sander & M. L. Miller. "MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer". Nucleic Acids Research, 44(D1):D986-91 (2016)
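A hedged sketch of what a position-level hotspot test could look like: mutation counts along an aligned domain are compared with a per-position background weight using a one-sided binomial test with Bonferroni correction. The background weights stand in for the context-dependent probabilities listed in points 1) and 2); the counts, the weights and the function names are invented for illustration, and this is not the MutationAligner method.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def hotspot_pvalues(counts, background):
    """Bonferroni-corrected p-value per position, given relative background weights."""
    total_mutations = sum(counts)
    total_weight = sum(background)
    n_tests = len(counts)
    return [
        min(1.0, n_tests * binomial_tail(total_mutations, c, w / total_weight))
        for c, w in zip(counts, background)
    ]

# Hypothetical mutation counts at 8 aligned domain positions.
counts = [1, 0, 12, 1, 2, 0, 1, 1]
# Uniform background here; context-specific mutation probabilities would replace this.
background = [1.0] * len(counts)

pvalues = hotspot_pvalues(counts, background)
hotspot = min(range(len(pvalues)), key=pvalues.__getitem__)
print(hotspot, pvalues[hotspot])
```

Replacing the uniform weights with sample- and codon-specific probabilities changes only the `background` list, which is why the test is written around it.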
Skills Required Statistics, general maths, programming skills (python or matlab preferred)
Skills Desired Interest in cancer genomics or bioinformatics. Interest in permutation/resampling methods


Representational Similarity and Deep Learning

Contact Name Dr Nikolaus Kriegeskorte
Contact CRSid Nikolaus.Kriegeskorte@mrc-cbu.cam.ac.uk
Lab/Department MRC Cognition and Brain Sciences Unit
Address 15 Chaucer Road Cambridge CB2 7EF
Period of the Project Summer (flexible)
Brief Description of Project Our lab uses a technique called Representational Similarity Analysis (RSA) to compare representations in visual areas of the brain to those in computational models. In RSA, we record brain activity (e.g. by fMRI) while showing a person images of objects. For each pair of images, we calculate the dissimilarity of the evoked brain activity patterns. We then compare the matrix capturing all pairwise dissimilarities to analogous matrices based on candidate computational models. Recently, we showed that representations in deep convolutional neural networks, but not other computational vision models, may explain the representation of objects in the brain (Khaligh-Razavi & Kriegeskorte, 2014). We are looking for a mathematician to work with us on one or both of the following projects: Project 1: Distance correlation and representational geometry Distance correlation (DC) is a measure of statistical dependence between two random variables introduced by Szekely et al. (2007). In RSA, we assess the similarity between model and brain data by calculating the correlation between distance matrices derived from each source. However, two representations are not necessarily independent when the correlation between their dissimilarity matrices is zero (e.g. they may have non-linear dependencies if using Pearson correlation). An appealing property of DC is that it is zero if and only if the variables are statistically independent. To evaluate how DC might be used in our research, we would like to answer the following questions: 1. how does DC relate to the correlation between distance matrices, e.g. how does double-centring the distance matrices give DC its special properties? 2. is DC problematic (e.g. positively biased) when the sample size is small, and can this be corrected? 3. can DC be calculated from distance matrices constructed using a non-Euclidean distance metric? 
Project 2: Deep neural networks and the renormalisation group The renormalisation group (RG) is a mathematical framework used in theoretical and particle physics that allows the investigation of systems viewed from different distance scales. Deep learning describes methods used in machine learning that have gained much attention from both academia and industry, as they were recently used to solve some highly abstract problems in computer vision, sometimes with super-human performance (LeCun, Bengio & Hinton, 2015). Furthermore, deep neural networks display internal representations that are surprisingly similar to those in biological brains, and are thus of vital interest to neuroscientists (Kriegeskorte, 2015). Recently, RG has been compared to deep convolutional neural networks, as non-linear transformations in deep networks have been interpreted as iterative coarse-graining of the input (Mehta & Schwab, 2014). RG offers a similar iterative coarse-graining technique that compresses a configuration of binary random variables (spins) to a smaller configuration with fewer variables (Holling, 2015). We would like to investigate how to employ the large body of research on RG to: 1. understand the theoretical underpinnings of deep learning 2. investigate how RG can help us improve deep neural network models of vision
References:
Holling (2015). "Renormalizing spin systems using deep learning techniques" http://www.thp.uni-koeln.de/~strack/Philipps_homepage/Theses_files/Bache...
Khaligh-Razavi & Kriegeskorte (2014). "Deep supervised, but not unsupervised, models may explain IT cortical representation". PLoS Computational Biology. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.10...
Kriegeskorte (2015). "Deep neural networks: a new framework for modelling biological vision and brain information processing". Annual Review of Vision Science. http://www.biorxiv.org/content/biorxiv/early/2015/10/26/029876.full.pdf
LeCun, Bengio & Hinton (2015). "Deep learning". Nature. http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
Mehta & Schwab (2014). "An exact mapping between the Variational Renormalization Group and Deep Learning" http://arxiv.org/pdf/1410.3831v1.pdf
Szekely, Rizzo & Bakirov (2007). "Measuring and testing dependence by correlation of distances". Annals of Statistics. http://arxiv.org/pdf/0803.4101.pdf
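For Project 1, the sample distance correlation can be written in a few lines, which makes the role of double-centring explicit. The sketch below uses the standard (biased) sample estimator with Euclidean distances; the test data and function names are assumptions for illustration.

```python
import numpy as np

def double_centre(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices, then double-centring.
    a = double_centre(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1))
    b = double_centre(np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1))
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# A purely non-linear dependence: Pearson correlation is ~0, DC is not.
x = np.linspace(-1.0, 1.0, 101)
y = x**2
pearson = float(np.corrcoef(x, y)[0, 1])
dc_nonlinear = distance_correlation(x, y)
dc_linear = distance_correlation(x, 2 * x + 1)
print(pearson, dc_nonlinear, dc_linear)
```

Because this estimator averages products of centred distances over all pairs, it is positively biased for small samples, which is the concern raised in question 2.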
Skills Required  
Skills Desired Familiarity with neural networks (for project 2) Computer programming, e.g. in Matlab, R or Python.


Gene regulatory networks

Contact Name Philip Wigge
Contact CRSid philip.wigge@slcu.cam.ac.uk
Lab/Department Sainsbury Laboratory
Address SLCU 47 Bateman St CB2 1LR
Period of the Project 8 weeks or more
Brief Description of Project Plants integrate information about light and temperature in order to calculate when to grow and how much; however, not much is known about how plants integrate these signals at a molecular level. In this project, you will analyse large amounts of experimental data about a set of genes that control the circadian clock (the day/night cycle) and are strong candidates for sensors of temperature information, to try to understand the mechanisms behind these genes' activities. This topic is especially relevant given that global warming is contributing to changes in global weather patterns, which have a huge effect on plants, in both natural and agricultural contexts.
Skills Required familiarity with the command line and some scripting skills (either R or Matlab)
Skills Desired R programming preferred; background in statistics


Project Choice Under Misaligned Preferences

Contact Name Aris Oraiopoulos
Contact CRSid no245@cam.ac.uk
Lab/Department Judge Business School
Address Trumpington St.
Period of the Project 8 weeks
Brief Description of Project Organizations face two key challenges when developing complex products. First, at the selection stage, they need to select the most promising ideas. Second, at the execution stage, they need to ensure that the employees are motivated enough to develop them in the most effective way. Both the selection and the execution stage are subject to various biases which make this process inefficient (e.g., managers select projects that fit their own preferences but are not aligned with the organization's strategy). This study is motivated by the recent changes in the organizational and reward structure of a major pharmaceutical company. The objective of this project is to determine the optimal reward scheme (incentives) for the different divisions of the company.
Skills Required Optimization theory, probability theory, and Mathematica, Matlab or Maple are required.
Skills Desired game theory


Dynamics of cell growth vs growth hormone in tissues of living plants

Contact Name Alexander Jones
Contact CRSid alexander.jones@slcu.cam.ac.uk
Lab/Department Sainsbury Laboratory Cambridge University
Address SLCU Bateman St Cambridge CB2 1LR
Period of the Project July 2016 - September 2016
Brief Description of Project The plant hormone gibberellin (GA) is a powerful growth regulator that also controls key developmental transitions such as germination, flowering and fruiting. The Jones group is using novel FRET biosensors for GA and other hormones [1] to reveal their patterns and dynamics in living plants. The GPS1 sensor facilitates GA measurements in vivo and when combined with measurement of cellular growth rates will allow a student to systematically investigate the quantitative relationship between GA accumulations and cellular growth in multiple plant tissues. 1. Jones, A.M., et al., Abscisic acid dynamics in roots detected with genetically encoded FRET sensors. eLife, 2014. 3: p. e01741. The student would be trained to: 1. grow plants; 2. use our top of the range confocal microscopes to collect 3D time-lapse images of GA levels in growing cells; 3. use appropriate image processing software for analysis. This will take a few weeks, while the student will work closely with a postdoc to collect the first data sets. Next the student will participate in the development of 3D computational models of GA levels in comparison with cellular growth rates. These models will lead to quantitative predictions of how GA contributes to plant growth and development. Then the student may wish to study the effect of chemical perturbations that should alter the distribution of GA in the relevant tissues (e.g. addition of exogenous GA or GA biosynthetic inhibitors) to determine if the models predict the changes in growth resulting from the perturbations. Or, instead, there is the option to study the effect of genetic mutations in GA metabolism that would alter the distribution of GA in the relevant tissues.
Skills Required Basic knowledge of biology and programming in e.g. matlab/python/c++ would be an advantage, but there are no strict prerequisites for this project.
Skills Desired An interest in plant development and microscopy would be a great start!


Analyzing models of stochastic gene expression

Contact Name Henrik Jönsson
Contact CRSid henrik.jonsson@slcu.cam.ac.uk
Lab/Department Sainsbury Laboratory
Address Sainsbury Laboratory University of Cambridge Bateman street Cambridge CB2 1LR
Period of the Project within June-August
Brief Description of Project During the early stages of vascular development in leaves, broad domains of MP result in ATHB8 expression in a narrower domain. We have extensively explored the regulatory networks driving this using deterministic ODE models. However, data from our experimental collaborators show that there is a large amount of variation in expression levels between cells, although general trends do emerge. The aim of this project would be to study stochastic models of the gene regulatory network: building a framework for assessing the performance of the model against the data, implementing an optimization procedure to predict model parameters, and exploring whether simple network motifs can describe our data adequately.
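As an indication of what a stochastic model of gene expression involves, the sketch below runs Gillespie's stochastic simulation algorithm for the simplest possible case, a single gene product with constant production and first-order degradation. It does not model MP or ATHB8; the rates, the function name and the choice of a birth-death process are assumptions for illustration.

```python
import random

def gillespie_birth_death(production, degradation, t_end, rng, initial=0):
    """Simulate molecule counts for constant production and per-molecule degradation."""
    t, n = 0.0, initial
    times, counts = [t], [n]
    while True:
        birth_rate = production
        death_rate = degradation * n
        total_rate = birth_rate + death_rate
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total_rate < birth_rate:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

# Simulate 200 independent "cells" and look at the spread of final counts.
rng = random.Random(1)
final_counts = [
    gillespie_birth_death(production=20.0, degradation=1.0, t_end=10.0, rng=rng)[1][-1]
    for _ in range(200)
]
mean_count = sum(final_counts) / len(final_counts)
print(mean_count, min(final_counts), max(final_counts))
```

The deterministic ODE for this system settles at production/degradation = 20 molecules; the simulated cells scatter around that value, which is the kind of cell-to-cell variation a stochastic network model would be fitted against.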
Skills Required  
Skills Desired  


Automatic quantification of shoot apical meristem phenotypes

Contact Name Henrik Jönsson
Contact CRSid henrik.jonsson@slcu.cam.ac.uk
Lab/Department Sainsbury Laboratory
Address Sainsbury Laboratory University of Cambridge Bateman street Cambridge CB2 1LR
Period of the Project within June-August
Brief Description of Project The aerial parts of plants are generated by a pool of stem cells located in a tissue called the shoot apical meristem (SAM). The arrangement of the organs, which is also referred to as phyllotaxis, is established from reiterative morphogenesis in the SAM. A correct understanding of organogenesis in the meristem requires a quantitative analysis of the morphology of the SAM. In recent years, tools have been developed to digitize 3D confocal images of meristems. This project aims to study the variability of shoot meristem traits in the plant model Arabidopsis. This work will involve developing algorithms and tools to quantify the morphology of the meristem in 3D.
Skills Required  
Skills Desired  


Remodelling the economics of science: crowding out

Contact Name Terence Kealey
Contact Email terence.kealey@buckingham.ac.uk
Department/Lab Department of Economics, University of Buckingham
Address 21 Lyndewode Road Cambridge CB1 2HN
Period of the Project 1-3 months, depending on mutual convenience
Brief Description of Project In 1605 Francis Bacon proposed that science was a public good, but the accumulated empirical evidence that science does not in practice behave as a public good has led to calls, some from Cambridge, for a new economics of science (1). We recently proposed a new model, by which science is modelled as a contribution game in which players’ rewards are proportionate to their contributions to knowledge rather than to their copying of others’ research (2). We would like now to extend that into modelling crowding out. At the aggregate, national, level the evidence for government funding for R&D crowding out private funding is strong. One mechanism may be that the better the scientist, the more likely they will seek to work in a publicly-funded institution such as a university. The contribution model does not currently distinguish between the private and public sectors. All scientists use and contribute to the common pool of science S, which we define as $S = \int_{\ddot{a}}^{\hat{a}} a f(a)\,da = \varphi(\ddot{a})$, where $f(a)$ is the frequency distribution of ability and $\hat{a}$ and $\ddot{a}$ are respectively the highest and lowest ability levels of contributing scientists. We’d now like to develop a two-sector approach to (i) accommodate the differences between the public and private sectors, (ii) distinguish scientific ability from the entrepreneurial talent that commercialises science, and (iii) consider a joint frequency distribution g(a,b) whereby scientific ability ‘a’ and entrepreneurial talent ‘b’ are separately identified. The allocation of talent between two sectors could then be considered on the assumption that incentive contracts (money as opposed to reputation, say) differed, and the consequences then linked to characteristics of the joint frequency distribution g(a,b). We anticipate that this project would lead to a publication in a journal of the standing of Research Policy.
(1) Dasgupta, P & David, P (1994) Toward a new economics of science. Research Policy 23, 487-521.
(2) Kealey, T & Ricketts, M (2014) Modelling science as a contribution good. Research Policy 43, 1014-1024.
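To show how the pool of science S depends on the lowest contributing ability level, the sketch below evaluates the integral of a·f(a) between the lowest and highest ability levels numerically. The uniform ability distribution on [0, 1] and the function name are assumptions made for illustration; the model itself does not specify f.

```python
def science_pool(density, lowest, highest, steps=10_000):
    """Midpoint-rule estimate of the integral of a * density(a) from lowest to highest."""
    width = (highest - lowest) / steps
    total = 0.0
    for i in range(steps):
        a = lowest + (i + 0.5) * width
        total += a * density(a)
    return total * width

def uniform_density(a):
    """Assumed ability distribution: uniform on [0, 1]."""
    return 1.0 if 0.0 <= a <= 1.0 else 0.0

# With a uniform density the closed form is (highest**2 - lowest**2) / 2.
pool_all = science_pool(uniform_density, lowest=0.0, highest=1.0)
pool_top = science_pool(uniform_density, lowest=0.25, highest=1.0)
print(pool_all, pool_top)
```

Raising the lowest contributing ability from 0 to 0.25 removes only the least productive contributors, so the pool shrinks from 0.5 to about 0.469.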
Skills Required We are looking for someone who is interested in applying maths to social and economic issues and who is familiar with game theory and calculus.
Skills Desired  


Statistical Detection of Chromosome Conformation Capture Interactions

Contact Name Chris Wallace
Contact Email cew54@cam.ac.uk
Department/Lab MRC Biostatistics Unit & University of Cambridge Dept Medicine
Address Chris Wallace MRC Biostatistics Unit & University of Cambridge Dept Medicine, Cambridge Biomedical Campus. http://chr1swallace.github.io
Period of the Project Summer 2016
Brief Description of Project Chromatin conformation capture has been used to examine the folded structure of DNA, allowing us to understand which regulatory regions contact which gene promoters and better interpret the results of genomewide association studies. However, these require quantities of DNA that are infeasible if we want to study primary human cells. Capture Hi-C techniques have made the study of primary human cells possible by adding a step in which sequencing libraries are filtered for regions which, for example, overlap gene promoters [1,2]. However, the analysis of these data is not straightforward, and different statistical methods have been proposed to distinguish random Brownian motion from genuine interactions [3,4,5]. This project will evaluate these methods in a selection of real datasets, each of which has, importantly, validation data where the additional experiments have been conducted on the same material with complementary capture designs. This will allow methods to be evaluated in terms of concordance between the repeated experiments, and to understand how concordance varies as a function of factors such as distance around a bait, density of bait, read depth etc. A comprehensive understanding of the comparative strengths and weaknesses of the different approaches is a fundamental pre-requisite for improving them to realise the full potential of this exciting experimental technique, and it is expected that this comparative analysis can form the basis of a first author paper for a strong and motivated student.
[1] Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Borbala Mifsud, Filipe Tavares-Cadete, Alice N Young, Robert Sugar, Stefan Schoenfelder, Lauren Ferreira, Steven W Wingett, Simon Andrews, William Grey, Philip A Ewels, Bram Herman, Scott Happe, Andy Higgs, Emily LeProust, George A Follows, Peter Fraser, Nicholas M Luscombe & Cameron S Osborne Nature Genetics 47, 598–606 (2015) http://dx.doi.org/10.1038/ng.3286
[2] Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci Paul Martin, Amanda McGovern, Gisela Orozco, Kate Duffus, Annie Yarwood, Stefan Schoenfelder, Nicholas J. Cooper, Anne Barton, Chris Wallace, Peter Fraser, Jane Worthington & Steve Eyre http://dx.doi.org/10.1038/ncomms10069
[3] CHiCAGO: Robust Detection of DNA Looping Interactions in Capture Hi-C data Jonathan Cairns, Paula Freire-Pritchett, Steven W. Wingett, Andrew Dimond, Vincent Plagnol, Daniel Zerbino, Stefan Schoenfelder, Biola-Maria Javierre, Cameron Osborne, Peter Fraser, Mikhail Spivakov bioRxiv doi: http://dx.doi.org/10.1101/028068
[4] Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C Nicola H. Dryden, Laura R. Broome, Frank Dudbridge, Nichola Johnson, Nick Orr, Stefan Schoenfelder, Takashi Nagano, Simon Andrews, Steven Wingett, Iwanka Kozarewa, Ioannis Assiotis, Kerry Fenwick, Sarah L. Maguire, James Campbell, Rachael Natrajan, Maryou Lambros, Eleni Perrakis, Alan Ashworth, Peter Fraser, and Olivia Fletcher Genome Res. November 2014 24: 1854-1868; Published in Advance August 13, 2014, http://dx.doi.org/10.1101/gr.175034.114
[5] https://github.com/nservant/HiC-Pro
Skills Required Knowledge of a scripting language such as bash, Ruby or Python is essential, together with knowledge of software to perform statistical and exploratory graphical analysis of data such as R.
Skills Desired  
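Illustrative sketch: the concordance evaluation described in this project could be prototyped along the following lines. This is only a minimal sketch under stated assumptions. It assumes each method's output has been reduced to a list of (bait, other end, distance) calls. The function name `concordance_by_distance`, the use of the Jaccard index as the concordance measure, and the distance bin edges are all hypothetical choices made for illustration and are not specified by the proposal.

```python
from bisect import bisect_right
from collections import defaultdict


def concordance_by_distance(calls_a, calls_b, bin_edges):
    """Jaccard concordance of two call sets, split by bait-to-other-end distance.

    calls_a, calls_b: iterables of (bait_id, other_end_id, distance_bp) tuples,
        one per interaction called significant in each experiment
        (hypothetical input format).
    bin_edges: sorted distance thresholds in bp. Bin 0 holds distances below
        bin_edges[0]; bin i holds distances in [bin_edges[i-1], bin_edges[i]).
    Returns {bin_index: Jaccard index} for every bin with at least one call.
    """
    binned_a = defaultdict(set)
    binned_b = defaultdict(set)
    for binned, calls in ((binned_a, calls_a), (binned_b, calls_b)):
        for bait, other, dist in calls:
            binned[bisect_right(bin_edges, dist)].add((bait, other))

    result = {}
    for b in sorted(set(binned_a) | set(binned_b)):
        shared = binned_a[b] & binned_b[b]
        union = binned_a[b] | binned_b[b]
        result[b] = len(shared) / len(union)
    return result


# Toy usage with made-up calls from two repeated experiments.
rep1 = [("b1", "f1", 50_000), ("b1", "f2", 200_000), ("b2", "f3", 300_000)]
rep2 = [("b1", "f1", 50_000), ("b2", "f3", 300_000), ("b2", "f4", 400_000)]
by_bin = concordance_by_distance(rep1, rep2, [100_000, 1_000_000])
```

The same binning idea could be applied to other factors named in the description, such as read depth or bait density, by swapping the quantity passed in the third tuple position.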

[Return to List]

Bayesian variational methods for characterising flexible fibre endoscopes

Contact Name Sarah Bohndiek
Contact Email seb53@cam.ac.uk
Department/Lab Physics / Engineering
Address Sarah Bohndiek (supervisor): seb53@cam.ac.uk, Department of Physics
George Gordon (daily supervisor): gsdg2@cam.ac.uk, Department of Engineering (CAPE)
Period of the Project 8 weeks
Brief Description of Project Flexible fibre endoscopes are widely used for diagnosing and treating diseases of the gastrointestinal and respiratory tracts, particularly cancers. Recent work has seen the development of ultra-thin lensless endoscopes, of order 200 µm in diameter, that can be used as minimally invasive and highly sensitive imaging tools. To overcome the image distortion that occurs in these waveguides as they bend, the impulse response function of the fibres must be periodically re-characterised. Together, these impulse responses form a transmission matrix that can be used to reconstruct images taken inside the body. However, this characterisation process can be a bottleneck, as it needs to be done fast enough for video-rate imaging. Recently, Bayesian variational methods have been used as a computationally efficient means of reconstructing transmission matrices of generalised scattering media. This project will seek to take this work and apply it to the simpler case of measuring transmission matrices of medical imaging fibres. The project would involve adapting software algorithms from the relevant literature and then evaluating their performance on existing experimental data. The project would advance computational and software skills, and provide an introduction to algorithms such as expectation maximisation, which are widely used in fields such as machine learning.
Skills Required Working knowledge of Matlab
Skills Desired  
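Illustrative sketch: the role of the transmission matrix in the description above can be shown with a toy example. This is a plain linear-algebra baseline, not the Bayesian variational method the project refers to. It assumes a noiseless, square, invertible fibre response with directly measurable complex fields. The names `characterise`, `reconstruct` and `toy_fibre`, and the 2-mode matrix values, are hypothetical and chosen only for illustration.

```python
def characterise(fibre, n_modes):
    """Build the transmission matrix column by column.

    fibre: callable mapping an input field (list of complex) to the output
        field; it stands in for the physical measurement (hypothetical).
    Column j is the fibre's response to a unit impulse in input mode j.
    """
    columns = []
    for j in range(n_modes):
        impulse = [1 + 0j if k == j else 0j for k in range(n_modes)]
        columns.append(fibre(impulse))
    # Transpose the list of columns into row-major form: T[i][j].
    return [[columns[j][i] for j in range(n_modes)] for i in range(n_modes)]


def reconstruct(T, measured):
    """Recover the input field x from measured = T x.

    Uses Gaussian elimination with partial pivoting and assumes T is
    square and invertible.
    """
    n = len(T)
    aug = [list(row) + [measured[i]] for i, row in enumerate(T)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Toy 2-mode fibre with made-up complex coupling coefficients.
TRUE_T = [[0.8 + 0.1j, 0.2j], [-0.1 + 0j, 0.9 - 0.2j]]


def toy_fibre(field):
    return [sum(TRUE_T[i][j] * field[j] for j in range(2)) for i in range(2)]


T = characterise(toy_fibre, 2)
scene = [1.0 + 0j, 0.5j]
recovered = reconstruct(T, toy_fibre(scene))
```

In this toy setting the characterisation needs one measurement per input mode, which hints at why repeated re-characterisation can become the bottleneck for video-rate imaging as the number of modes grows.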

[Return to List]