
Summer Research Programmes

 

This is a list of Research in CMS projects from summer 2019.


Goodness-of-Fit Testing by Entropy Estimation

Contact Name Prof. Richard Samworth and Dr. Tom Berrett
Contact Email rjs57@cam.ac.uk, tbb26@cam.ac.uk
Department Statslab
Brief Description of the Project Available here.

Piecewise Smooth Regression

Contact Name Dr. Florian Pein
Contact Email Project no longer available
Department Statslab
Period of the Project Any time during summer
Brief Description of the Project

Finding abrupt changes in the mean of a time series, known as change-point detection or piecewise constant regression, is currently one of the most active research fields in statistics. However, in many applications (for instance in genetics) the assumption of a piecewise constant mean is violated, and the mean is better described by a smooth function with a few abrupt changes. In other words, one should aim to estimate a piecewise smooth function. Existing approaches adapt estimators designed for smooth regression functions and are largely limited to the case of a single abrupt change.

In this project we will approach the problem from a change-point perspective. To this end, we will first study an approach that focuses on a single change and then extend it to handle multiple changes by combining it with wild binary segmentation, a commonly used piecewise constant regression approach. A possible outcome of the project would be to develop a methodology for piecewise smooth regression, implement it in an R package, and demonstrate its performance through computer simulations. Finally, we will apply the methodology to real data, such as whole-genome sequencing data, to detect copy number variations. There is also the possibility of submitting a write-up of the work to a statistical journal for publication.
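For readers unfamiliar with wild binary segmentation, here is a minimal Python sketch of the standard piecewise constant version that the project plans to build on. It is not the project's piecewise smooth methodology, and the project itself targets R. The sketch assumes Gaussian noise. The CUSUM contrast is the usual one for a single mean change. The number of random intervals (200), the MAD-based noise estimate and the threshold constant 1.3 are illustrative assumptions, not tuned recommendations. The function names `cusum`, `mad_sigma` and `wbs` are hypothetical.

```python
import numpy as np


def cusum(x, s, e):
    """Absolute CUSUM contrasts of x[s..e] (inclusive) for every split b = s, ..., e - 1."""
    seg = x[s:e + 1]
    n = len(seg)
    left_len = np.arange(1, n)          # b - s + 1 points left of the split
    right_len = n - left_len            # e - b points right of the split
    left_sum = np.cumsum(seg)[:-1]
    right_sum = seg.sum() - left_sum
    stat = (np.sqrt(right_len / (n * left_len)) * left_sum
            - np.sqrt(left_len / (n * right_len)) * right_sum)
    return np.abs(stat)


def mad_sigma(x):
    """Robust noise-level estimate from first differences (assumes Gaussian noise)."""
    return np.median(np.abs(np.diff(x))) / (0.6745 * np.sqrt(2))


def wbs(x, threshold, n_intervals=200, seed=None):
    """Wild binary segmentation for a piecewise constant mean.

    Returns sorted indices b such that the mean changes between x[b] and x[b + 1].
    """
    rng = np.random.default_rng(seed)
    T = len(x)
    # Draw the random intervals once; endpoints are inclusive.
    a = rng.integers(0, T, size=n_intervals)
    c = rng.integers(0, T, size=n_intervals)
    starts, ends = np.minimum(a, c), np.maximum(a, c)
    change_points = []

    def recurse(s, e):
        if e - s < 1:                   # fewer than two points: nothing to split
            return
        # Candidates: random intervals lying inside [s, e], plus [s, e] itself.
        inside = (starts >= s) & (ends <= e) & (ends - starts >= 1)
        candidates = [(s, e)] + list(zip(starts[inside], ends[inside]))
        best_stat, best_b = -np.inf, None
        for sm, em in candidates:
            stats = cusum(x, sm, em)
            k = int(np.argmax(stats))
            if stats[k] > best_stat:
                best_stat, best_b = stats[k], int(sm) + k
        if best_stat > threshold:       # significant change: record it and split
            change_points.append(best_b)
            recurse(s, best_b)
            recurse(best_b + 1, e)

    recurse(0, T - 1)
    return sorted(change_points)


# Usage: simulated data with mean shifts after indices 99, 199 and 299.
rng = np.random.default_rng(0)
signal = np.repeat([0.0, 3.0, -1.0, 2.0], 100)
x = signal + rng.normal(size=signal.size)
# Threshold C * sigma * sqrt(2 log T); C = 1.3 is an illustrative choice.
threshold = 1.3 * mad_sigma(x) * np.sqrt(2 * np.log(len(x)))
cps = wbs(x, threshold, seed=1)
print(cps)
```

Each recursive step scans many random sub-intervals as well as the current segment. A change sitting near other changes can therefore still be isolated in a short interval, and this is what distinguishes wild binary segmentation from plain binary segmentation. A piecewise smooth extension would replace `cusum` with a contrast that tolerates a smoothly varying mean.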

Skills Required Good programming skills, in particular in R; fundamental statistical knowledge of tests and estimation procedures, e.g. Part IB Statistics.
Skills Desired Statistical background in estimation and test theory, e.g. Part II Principles of Statistics; additionally, any knowledge of non-parametric statistics or time series analysis, or experience with computer simulations, would be helpful.

Applications of neural networks to nonparametric statistics

Contact Name Prof. Richard Samworth and Mr. Oliver Feng
Contact Email rjs57@cam.ac.uk, oyf20@cam.ac.uk
Department Statslab
Brief Description of the Project

Artificial neural networks have become a mainstay of machine learning and artificial intelligence over the past few decades, in part due to their outstanding predictive performance in application areas as diverse as computer vision, natural language processing and information filtering. For accessible introductions to network architectures and training algorithms, see [1]-[3]. Although the basic tasks carried out by neural networks (such as regression and classification) are statistical in nature, it is only recently that statisticians have sought to explain the somewhat magical properties of deep networks (with many hidden layers) from a rigorous mathematical standpoint. For example, [4] carries out a theoretical analysis of estimators based on deep neural networks with ReLU (rectified linear unit) activation function in multivariate nonparametric regression; see also [5] and [6]. Moreover, the flexible and versatile framework of GANs (generative adversarial networks) has inspired new approaches to robust estimation of a normal mean [7] and density estimation [8], both of which come with statistical guarantees.
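To make "estimators based on neural networks with ReLU activation in nonparametric regression" concrete, here is a toy Python sketch. It fits a one-hidden-layer ReLU network to one-dimensional data by least squares using plain full-batch gradient descent. This is an illustration only and not the sparse deep-network estimator analysed in [4]. The single hidden layer, width 50, learning rate, number of steps and sine regression function are all assumptions chosen for readability. The names `fit_relu_net` and `predict` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_relu_net(x, y, width=50, lr=0.01, n_steps=4000, seed=0):
    """Fit f(x) = sum_j v_j * relu(w_j * x + b_j) + c to 1-d data by full-batch
    gradient descent on the mean squared error. Returns (params, loss history)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=width)                             # input weights
    b = rng.normal(size=width)                             # hidden biases
    v = rng.normal(scale=1 / np.sqrt(width), size=width)   # output weights
    c = 0.0                                                # output bias
    n = len(x)
    losses = []
    for _ in range(n_steps):
        z = np.outer(x, w) + b          # (n, width) pre-activations
        h = relu(z)                     # hidden-layer activations
        r = h @ v + c - y               # residuals
        losses.append(np.mean(r ** 2))
        g = 2 * r / n                   # d(MSE)/d(prediction)
        gz = np.outer(g, v) * (z > 0)   # back-propagate through the ReLU (uses current v)
        # Gradient step on every parameter.
        v -= lr * (h.T @ g)
        c -= lr * g.sum()
        w -= lr * (gz.T @ x)
        b -= lr * gz.sum(axis=0)
    return (w, b, v, c), losses


def predict(params, x):
    w, b, v, c = params
    return relu(np.outer(x, w) + b) @ v + c


# Usage: noisy observations of a smooth regression function on [0, 1].
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
params, losses = fit_relu_net(x, y)
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The fitted function is piecewise linear, with kinks at the points where some w_j * x + b_j changes sign. Varying `width`, or adding further hidden layers, is a natural starting point for comparing architectures in a simulation study.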

The initial aim of this project is to conduct a thorough literature review and perform a simulation study to compare the performance of networks with different architectures in some familiar statistical contexts. There are then several follow-up directions the student may wish to pursue. One possibility would be to develop theory and/or methodology based on the aforementioned deep learning techniques in a specific application area within statistics (see [5] for example). Alternatively, it may be of interest to explore the algorithmic and computational issues associated with the training and regularisation of neural networks with different optimisation landscapes.

Programming experience is desirable but certainly not essential.

References: [1] Wang, T. (2018). Lecture 8: Artificial neural networks. Available at http://www.statslab.cam.ac.uk/~tw389/teaching/SLP18/Lecture08.pdf.

[2] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd edition. Springer-Verlag.

[3] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.

[4] Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with ReLU activation function. Available at https://arxiv.org/pdf/1708.06633.pdf.

[5] Farrell, M. H., Liang, T. and Misra, S. (2018). Deep Neural Networks for Estimation and Inference: Application to Causal Effects and Other Semiparametric Estimands. Available at https://arxiv.org/pdf/1809.09953.pdf.

[6] Belkin, M., Rakhlin, A. and Tsybakov, A. B. (2018). Does data interpolation contradict statistical optimality? Available at https://arxiv.org/pdf/1806.09471.pdf.

[7] Gao, C., Liu, J. and Zhu, W. (2018). Robust Estimation and Generative Adversarial Nets. Available at https://arxiv.org/abs/1810.02030.

[8] Liang, T. (2018). On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results. Available at https://arxiv.org/pdf/1811.03179.pdf.