This is a list of potential summer research projects in the CMS. Projects are aimed at undergraduate students in the Faculty of Mathematics. Interested students should contact project supervisors directly to discuss the project. Some support for students doing summer research projects is available through our bursary scheme; the deadline for application is March 15. To remove a listing, please contact us here.

- ~~Goodness-of-Fit Testing by Entropy Estimation~~
- ~~Piecewise Smooth Regression~~
- ~~Applications of neural networks to nonparametric statistics~~

## Goodness-of-Fit Testing by Entropy Estimation

Contact Name | Prof. Richard Samworth and Dr. Tom Berrett |

Contact Email | rjs57@cam.ac.uk, tbb26@cam.ac.uk |

Department | Statslab |

Brief Description of the Project | Available here. |

## Piecewise Smooth Regression

Contact Name | Dr. Florian Pein |

Contact Email | Project no longer available |

Department | Statslab |

Period of the Project | Any time during summer |

Brief Description of the Project |
Finding abrupt changes in the mean of a time series, known as change-point or piecewise constant regression, is currently one of the most active research fields in statistics. However, in many applications, for instance in genetics, the assumption of a piecewise constant mean is violated, and the mean is better described by a function that is smooth apart from a few abrupt changes. In other words, one should aim to estimate a piecewise smooth function. Existing work has aimed to modify estimators for smooth regression functions, and is largely limited to the case of a single abrupt change. In this project we will approach the problem from a change-point perspective. To this end, we will first study an approach that focuses on a single change and then extend it to handle multiple changes by combining it with a commonly used piecewise constant regression approach called wild binary segmentation. A possible outcome of the project would be to develop a methodology for piecewise smooth regression, implement it in an R package, and demonstrate its performance through computer simulations. Finally, we will apply the methodology to real data, such as whole genome sequencing data, to detect copy number variations. There is also a possibility of submitting a write-up of the work to a statistical journal for publication. |

Skills Required | Good programming skills, in particular in R; fundamental statistical knowledge of tests and estimation procedures, e.g. Part IB Statistics. |

Skills Desired | Statistical background in estimation and test theory, e.g. Part II Principles of Statistics; in addition, any knowledge of nonparametric statistics or time series analysis, or experience with computer simulations, would be helpful. |
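For orientation, here is a minimal sketch of wild binary segmentation in its standard piecewise constant form, the building block named in the project description. It is not the piecewise smooth methodology the project aims to develop. It is written in plain Python rather than R so that it runs without extra packages. The function names, the number of random intervals and the threshold rule in the demo are illustrative assumptions, not part of the project.

```python
import math
import random


def cusum_max(prefix, s, e):
    """Return (largest |CUSUM|, best split b) for x[s..e], inclusive indices.

    A split at b separates x[s..b] from x[b+1..e].
    """
    n = e - s + 1
    best_val, best_b = 0.0, None
    for b in range(s, e):
        n_left, n_right = b - s + 1, e - b
        mean_left = (prefix[b + 1] - prefix[s]) / n_left
        mean_right = (prefix[e + 1] - prefix[b + 1]) / n_right
        # Scaled difference of means on either side of the split.
        val = math.sqrt(n_left * n_right / n) * abs(mean_left - mean_right)
        if val > best_val:
            best_val, best_b = val, b
    return best_val, best_b


def wild_binary_segmentation(x, threshold, n_intervals=200, seed=0):
    """Estimate change-points of a piecewise constant mean.

    Returns a sorted list of indices b such that the mean is estimated
    to change between x[b] and x[b + 1].
    """
    rng = random.Random(seed)
    n = len(x)
    prefix = [0.0]  # prefix[i] = x[0] + ... + x[i-1]
    for v in x:
        prefix.append(prefix[-1] + v)

    # Draw random sub-intervals once and reuse them at every recursion step.
    intervals = [tuple(sorted(rng.sample(range(n), 2))) for _ in range(n_intervals)]
    found = []

    def recurse(s, e):
        if e - s < 1:
            return
        # Candidates: the whole segment plus every drawn interval inside it.
        candidates = [(s, e)] + [(lo, hi) for lo, hi in intervals if s <= lo and hi <= e]
        best_val, best_b = 0.0, None
        for lo, hi in candidates:
            val, split = cusum_max(prefix, lo, hi)
            if val > best_val:
                best_val, best_b = val, split
        # Accept the best split only if its CUSUM exceeds the threshold.
        if best_b is not None and best_val > threshold:
            found.append(best_b)
            recurse(s, best_b)
            recurse(best_b + 1, e)

    recurse(0, n - 1)
    return sorted(found)


if __name__ == "__main__":
    # Demo on simulated data: three constant levels plus Gaussian noise.
    noise = random.Random(1)
    sigma = 0.5
    signal = [0.0] * 100 + [2.0] * 100 + [0.5] * 100
    x = [mu + noise.gauss(0.0, sigma) for mu in signal]
    # Assumed threshold of the form C * sigma * sqrt(2 log n), with C = 1.3.
    thr = 1.3 * sigma * math.sqrt(2.0 * math.log(len(x)))
    print(wild_binary_segmentation(x, thr))
```

The random intervals are what distinguish this from ordinary binary segmentation: a short interval containing a single change can give a larger CUSUM statistic than the full segment, where several changes may partly cancel. In practice the noise level would have to be estimated and the threshold calibrated, which this sketch does not attempt.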

## Applications of neural networks to nonparametric statistics

Contact Name | Prof. Richard Samworth and Mr. Oliver Feng |

Contact Email | rjs57@cam.ac.uk, oyf20@cam.ac.uk |

Department | Statslab |

Brief Description of the Project |
Artificial neural networks have become a mainstay of machine learning and artificial intelligence over the past few decades, in part due to their outstanding predictive performance in application areas as diverse as computer vision, natural language processing and information filtering. For accessible introductions to network architectures and training algorithms, see [1]-[3]. Although the basic tasks carried out by neural networks (such as regression and classification) are statistical in nature, it is only recently that statisticians have sought to explain the somewhat magical properties of deep networks (with many hidden layers) from a rigorous mathematical standpoint. For example, [4] carries out a theoretical analysis of estimators based on deep neural networks with ReLU (rectified linear unit) activation function in multivariate nonparametric regression; see also [5] and [6]. Moreover, the flexible and versatile framework of GANs (generative adversarial networks) has inspired new approaches to robust estimation of a normal mean [7] and density estimation [8], both of which come with statistical guarantees.

The initial aim of this project is to conduct a thorough literature review and perform a simulation study to compare the performance of networks with different architectures in some familiar statistical contexts. There are then several follow-up directions the student may wish to pursue. One possibility would be to develop theory and/or methodology based on the aforementioned deep learning techniques in a specific application area within statistics (see [5] for example). Alternatively, it may be of interest to explore the algorithmic and computational issues associated with the training and regularisation of neural networks with different optimisation landscapes. Programming experience is desirable but certainly not essential.

References:

[1] Wang, T. (2018). Lecture 8: Artificial neural networks. Available at http://www.statslab.cam.ac.uk/~tw389/teaching/SLP18/Lecture08.pdf

[2] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd edition. Springer-Verlag.

[3] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.

[4] Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with ReLU activation function. Available at https://arxiv.org/pdf/1708.06633.pdf

[5] Farrell, M. H., Liang, T. and Misra, S. (2018). Deep Neural Networks for Estimation and Inference: Application to Causal Effects and Other Semiparametric Estimands. Available at https://arxiv.org/pdf/1809.09953.pdf

[6] Belkin, M., Rakhlin, A. and Tsybakov, A. B. (2018). Does data interpolation contradict statistical optimality? Available at https://arxiv.org/pdf/1806.09471.pdf

[7] Gao, C., Liu, J. and Zhu, W. (2018). Robust Estimation and Generative Adversarial Nets. Available at https://arxiv.org/abs/1810.02030

[8] Liang, T. (2018). On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results. Available at https://arxiv.org/pdf/1811.03179.pdf |