The Faculty of Mathematics has just launched a new institute researching the mathematics of information. Led by Carola-Bibiane Schönlieb, the Cantab Capital Institute for the Mathematics of Information (CCIMI) will explore fundamental mathematical theory and methodology for understanding, analysing, processing and simulating data.

## Taming big data

The world around us is changing – the internet and mobile telephony, medical imaging and satellite navigation, social networks, and the entire panoply of personal computers, laptops, tablets and smartphones are now ubiquitous in our daily lives of work and leisure. We are surrounded by technology that collects, transmits, manipulates and ultimately needs to understand information on a scale that is hard to comprehend. In 2016, for instance, in every single minute Google translated over 69 million words, over 400 hours of video were uploaded to YouTube and more than 200,000 photos were shared on Facebook Messenger.

The need to understand this *big data*, as the mass (and sometimes mess) of data that arises in the modern world is called, comes up in all sorts of different contexts: from the biomedical sciences to finance, the internet, software and hardware development and security, and image processing, to name just a few.

Mathematics has long been called the “language of the universe”, underpinning the development of modern science and technology. There is every reason to believe that mathematics, too, will provide the language of data: both the data itself (the values of the qualitative or quantitative variables) and the information (the content and meaning) it contains.

"In fact, it is not the data itself that is so important, but rather the information contained within it," says Schönlieb. "Using fundamental techniques from the mathematical sciences, it is possible to understand the limitations of what can be found from the data, and whether this information can be found in the next few seconds, minutes, hours, or if we have to run an algorithm forever without ever providing an answer. We can also use maths and stats to understand how certain or uncertain we should be about conclusions we draw from data."

The CCIMI grew out of a donation from Cantab Capital Partners and is based at the Centre for Mathematical Sciences, as a collaboration between the Department of Applied Mathematics and Theoretical Physics and the Department of Pure Mathematics and Mathematical Statistics. The Institute hosts research activities on the development of theory and methodology for analysing, processing and understanding information in data. It already includes more than 30 affiliated faculty and a cohort of 6 PhD students per year, and is currently expanding its research group.

## Joining mathematical forces

The exciting thing about the mathematics of information is that it is nurtured by a broad variety of different mathematical areas.

Take *image denoising* – any electronic signal (such as a sound or an image) can contain random fluctuations originating from the device producing it. For example, the image below on the left is a noisy image of a cat. The main aim of denoising is to distinguish random noise from the actual image contents, so as to create an image such as the one below on the right.

*A noisy image of a cat on the left, and the de-noised image on the right. (Photo courtesy of Matthias Ehrhardt)*

To do so, we first have to characterise both the noise and the image contents. Noise is often random and to describe it we need probability and statistics. Image contents, on the other hand, can be described most of the time by geometric structures of different scales and different types (a house, a chair, the silhouette of a person), and maybe some repetitive, less orderly patterns (textures – such as water, grass, hair), and colour information. As we formalise our description of the contents of the image we might encounter geometry, differential equations, and harmonic analysis.

Then, after we have characterised the noise and the image contents, telling them apart often boils down to an optimisation problem. These problems usually have a very large number of unknowns (around the number of pixels in your image). Solving such problems requires maximising or minimising some value subject to some constraints, and draws on the mathematics of numerical analysis. (You can find out more about some of these techniques in the *Plus* magazine article *Restoring profanity*.)
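To make the optimisation step concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the institute's method: it works on a one-dimensional signal rather than an image, assumes Gaussian noise, and uses a simple quadratic smoothness penalty. It minimises an energy made of a data-fit term and that penalty, using plain gradient descent. The names `energy`, `denoise`, `lam` and `steps` are hypothetical and chosen only for this example.

```python
import random


def energy(u, f, lam):
    """Data-fit term plus a smoothness penalty on neighbouring differences."""
    fit = sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    smooth = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    return 0.5 * fit + 0.5 * lam * smooth


def denoise(f, lam=5.0, steps=500):
    """Minimise the energy by gradient descent, starting from the noisy signal f."""
    n = len(f)
    u = list(f)
    # The curvature of the energy is at most 1 + 4 * lam, so this step size
    # guarantees that the energy never increases from one iteration to the next.
    step = 1.0 / (1.0 + 4.0 * lam)
    for _ in range(steps):
        # Gradient of the data-fit term.
        grad = [u[i] - f[i] for i in range(n)]
        # Gradient of the smoothness term: each neighbouring pair pulls together.
        for i in range(n - 1):
            d = lam * (u[i + 1] - u[i])
            grad[i] -= d
            grad[i + 1] += d
        u = [u[i] - step * grad[i] for i in range(n)]
    return u


if __name__ == "__main__":
    random.seed(0)
    # Assumed test data: a clean step signal corrupted by Gaussian noise.
    clean = [0.0] * 50 + [1.0] * 50
    noisy = [c + random.gauss(0.0, 0.2) for c in clean]
    restored = denoise(noisy)
    print("energy before:", energy(noisy, noisy, 5.0))
    print("energy after: ", energy(restored, noisy, 5.0))
```

In this sketch there is one unknown per sample, which mirrors the point above that image problems have roughly one unknown per pixel. A quadratic penalty also blurs sharp edges, so practical methods typically use edge-preserving penalties such as total variation.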

There are many more examples where different mathematical areas pop up – statistics, analysis, numerics and optimisation, geometry, topology, physics, and many more. To see how diverse the areas of mathematics involved are, explore the projects showcased on the CCIMI website.

*Those interested in working with the institute can find more information on the CCIMI website.*