A research collaboration led by Dr Sergii Strelchuk from the Department of Applied Mathematics and Theoretical Physics (DAMTP) is one of 12 projects worldwide to be selected for the Wellcome Leap Quantum for Bio (Q4Bio) Supported Challenge Program. The project has been awarded up to $3.5 million in funding to explore the potential of quantum computing for improvements in human health.
The Q4Bio Supported Challenge Program focuses on identifying and developing new applications and quantum computing algorithms for health and biomedicine that could benefit from quantum hardware advances expected to emerge in the next 3-5 years.
The research collaboration's winning project focuses on using quantum computing in one of the most exciting new areas in biomedical science: pangenomics. The team aims to develop quantum computing algorithms with the potential to speed up the production and analysis of pangenomes – new representations of DNA sequences that capture population diversity.
Pangenomics - revolutionising future science through diversity
Since the initial sequencing of the human genome over 20 years ago, genomics has revolutionised science and medicine. Our genetic code can provide insights into our health, help to diagnose disease or guide medical treatments. However, the reference human genome sequence, which most subsequently sequenced human DNA is compared to, is based on data from only a few people, and doesn't represent human diversity. Less than one per cent of the 6.4 billion letters of DNA code differs from one human to the next, but those genetic differences are what make each of us unique.
Scientists have been working to address this problem for over a decade, and in 2023 the first human pangenome reference was produced. A pangenome is a collection of many different genome sequences that capture the genetic diversity in a population.
This project brings together a world-leading interdisciplinary team including researchers from Cambridge and the University of Warwick, the Wellcome Sanger Institute, and the EMBL-EBI, with skills across quantum computation, genomics, genomic data, and advanced algorithms. Their aim is to tackle one of the most challenging computational problems in genomic science: building, augmenting and analysing pangenomic datasets for large population samples.
The potential benefits of this work are huge. Comparing a specific human genome against the human pangenome (instead of the existing human reference genome) gives better insights into its unique composition. This could open the door to advances in personalised medicine, for example developing tailored treatment options to improve ways of tackling cancers. The health applications go beyond human genomes, as pangenomes could potentially be produced for all species, including pathogens such as SARS-CoV-2. So pangenomics could help track and alert us to the evolution of drug-resistant bacteria, or the evolution of new viruses that might pose a threat of pandemics.
Harnessing the power of quantum computing
Quantum computing is based on the counter-intuitive reality of quantum physics. It exploits the properties of particles in quantum states - particularly superposition and quantum entanglement. Classical computing stores information as bits, which are binary: either 0 or 1. However, a quantum computer works with particles that can be in a superposition of different states simultaneously. Rather than bits, information in a quantum computer is represented by qubits (quantum bits), which can take on the value 0, or 1, or both 0 and 1 at the same time. The potential power of quantum computing further increases enormously because particles can be entangled, so the system can also include the quantum correlations between different qubits. As the number of qubits increases the correlations grow exponentially - for n qubits there are 2^n correlations. In theory, quantum computers could therefore outperform even the fastest supercomputers. (You can read an introduction to quantum computing in this article from the Faculty’s Plus outreach project, written with input from DAMTP’s Professor Richard Jozsa.)
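To make the exponential growth concrete, here is a minimal sketch in plain Python of how a classical program can describe a handful of qubits. It is an illustration only, not the project's method: the function names (`zero_state`, `hadamard`, `cnot`, `probabilities`) are invented for this example. The state of n qubits is stored as a list of 2^n complex amplitudes, and two standard gates are used to put two qubits into an entangled state.

```python
import math

def zero_state(n):
    """State of n qubits all set to 0: a list of 2**n amplitudes, only the first nonzero."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    return state

def hadamard(state, target):
    """Put qubit `target` into an equal superposition (qubit 0 is the lowest bit of the index)."""
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if not (i >> target) & 1:            # handle each (bit=0, bit=1) pair once
            j = i | (1 << target)
            a, b = state[i], state[j]
            out[i] = s * (a + b)
            out[j] = s * (a - b)
    return out

def cnot(state, control, target):
    """Flip qubit `target` wherever qubit `control` is 1; this is what entangles the two."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            out[i], out[j] = state[j], state[i]
    return out

def probabilities(state):
    """Chance of reading each bit pattern when the qubits are measured."""
    return [abs(a) ** 2 for a in state]

# Two qubits: superpose the first, then entangle it with the second.
bell = cnot(hadamard(zero_state(2), 0), 0, 1)
print(probabilities(bell))     # weight only on patterns 00 and 11
print(len(zero_state(10)))     # ten qubits already need 2**10 amplitudes
```

In the entangled state the two qubits are always read out as both 0 or both 1, never one of each, which is the kind of correlation the paragraph above refers to.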
However, there is a hitch. Quantum computers are inherently sensitive to noise and decoherence, so scaling them up presents an immense technological challenge. While there have been exciting proof-of-concept experiments and demonstrations, today’s quantum computers remain limited in size and computational power, which restricts their practical application. But one area where expected advances in quantum computer hardware could potentially deliver advantages in the relatively near-term future is in human health applications requiring large-scale biological data.
Pangenomics demands high levels of computational power, and this is where quantum computing could offer real benefits. While the existing human reference genome structure is linear, pangenome data can be represented and analysed as a network, called a sequence graph, that stores the shared structure of genetic relationships between the reference genomes. Comparing individual genomes to the pangenome then involves matching sequences to map a route through the graph – a bit like finding the best route on the Tube map. The team hopes to develop quantum computing approaches with the potential to speed up both the key processes of mapping data to graph nodes, and finding good routes through the graph.
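The idea of a sequence graph and a route through it can be sketched in a few lines of Python. Everything here is a toy made up for illustration: the six-node graph, the DNA fragments and the function names are hypothetical, and the brute-force search over every route is a classical stand-in that only works on tiny graphs. It is not the approach used by real pangenome tools or by the project team.

```python
# Toy sequence graph: each node holds a DNA fragment, each edge says which fragment may follow.
# Nodes 2 and 3 are alternative single letters; node 5 is an optional insertion.
NODES = {1: "ACG", 2: "T", 3: "A", 4: "GGC", 5: "TT", 6: "CA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [6], 6: []}

def all_paths(node, sink):
    """Yield every route from `node` to `sink` (fine for a toy graph without cycles)."""
    if node == sink:
        yield [node]
        return
    for nxt in EDGES[node]:
        for rest in all_paths(nxt, sink):
            yield [node] + rest

def spell(path):
    """Join the fragments along a route into the genome sequence it represents."""
    return "".join(NODES[n] for n in path)

def edit_distance(a, b):
    """Number of single-letter insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def best_route(sequence, source=1, sink=6):
    """Route whose spelled-out sequence is closest to the individual's sequence."""
    return min(all_paths(source, sink), key=lambda p: edit_distance(spell(p), sequence))

# An individual's sequence with one letter that matches no route exactly.
print(best_route("ACGAGGCTTCT"))
```

Even in this toy, two branch points give four possible routes. Each additional independent branch point doubles the count, which gives a feel for why finding good routes through a population-scale graph is computationally demanding.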
Dr Sergii Strelchuk, Royal Society University Research Fellow in DAMTP, and Associate Professor and Co-director of Warwick Quantum Research Centre in the Department of Computer Science, University of Warwick, is the Principal Investigator of the project. "The structure of many challenging problems in computational genomics and pangenomics in particular make them suitable candidates for speedups promised by quantum computing," he explains. "We are on a thrilling journey to develop and deploy quantum algorithms tailored to genomic data to gain new insights which are unattainable using classical algorithms."
Next steps towards a giant leap
The Wellcome Leap Q4Bio Challenge is based on the premise that the early days of any new computational method will advance and benefit most from the co-development of applications, software, and hardware – allowing early optimisations with not-yet-generalisable, early systems. Building on state-of-the-art computational genomics methods, the team aim to develop, simulate and then implement new quantum algorithms, using real data. The algorithms and methods will be tested and refined initially in existing, powerful High Performance Compute (HPC) environments, which will be used to simulate the expected quantum computing hardware. The team plan to test algorithms first using small stretches of DNA sequence, working up to processing relatively small genome sequences like SARS-CoV-2, before moving to the much larger human genome.
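A rough calculation shows why simulating quantum hardware calls for HPC resources. The sketch below assumes the simplest kind of simulation, in which the full state of n qubits is held in memory as 2^n complex numbers of 16 bytes each. That is an assumption made for illustration: the article does not say which simulation method the team uses, and the function names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number stored as two 64-bit floats

def statevector_bytes(n_qubits):
    """Memory needed to hold all 2**n amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.0f} {unit}"
        value /= 1024

for n in (10, 20, 30, 40, 50):
    print(f"{n} qubits: {human_readable(statevector_bytes(n))}")
```

Under this assumption, every extra qubit doubles the memory required, so a simulation that fits on a laptop at 30 qubits needs a large cluster at 40 and is out of reach of a single machine well before 50.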
Excitingly, this project sits right at the frontiers of new research in both biomedical science and quantum computing. "On the one hand, we're starting from scratch because we don't even know yet how to represent a pangenome in a quantum computing environment. If you compare it to the first moon landings, this project is the equivalent of designing a rocket and training the astronauts," says David Yuan, Project Lead at EMBL-EBI. "On the other hand, we've got solid foundations, building on decades of systematically annotated genomic data generated by researchers worldwide and made available by EMBL-EBI. The fact that we're using this knowledge to develop the next generation of tools for the life sciences is a testament to the importance of open data and collaborative science."
An initial human pangenome collection was released in May 2023, and is an ongoing work in progress. As institutions around the world actively collaborate on building this reference tool to capture our rich human diversity, the Q4Bio challenge project will help explore how quantum approaches could take pangenomics to the next level of impact.
"We've only just scratched the surface of both quantum computing and pangenomics," says David Holland, Principal Systems Administrator at the Wellcome Sanger Institute, who is working to create the High Performance Compute environment to simulate a quantum computer. "So to bring these two worlds together is incredibly exciting. We don't know exactly what's coming, but we see great opportunities for major new advances. We are doing things today that we hope will make tomorrow better."
You can read more about the work on developing the human pangenome in this EMBL-EBI news announcement, and discover more about the Wellcome Leap Q4Bio programme aims and background, including the quantum computing context, here.