‘We’ve seen an explosion in computing power’: Using AI, machine learning and data to unlock the mysteries of disease

By combining deep expertise in artificial intelligence (AI) and machine learning with scaled access to rich troves of genetic, genomic and health data, scientists are unlocking new discoveries about disease that could change the face of healthcare. 

Advanced technology is helping researchers predict how diseases may progress in certain patients, who is most likely to benefit from a particular treatment or vaccine, and which potential new medicines or vaccines are most likely to make a real difference to patients’ lives. 

Using AI and machine learning, scientists can analyse vast quantities of biological data at speed to spot patterns that give insights into the inner workings of the human body and the factors that affect disease.  

“In the last 10 to 15 years, we’ve seen a revolution,” says Danielle Belgrave, Vice President of AI and ML at GSK – and a former scientist at Google DeepMind. 

“Not only have we seen an explosion in the scale of data that we can generate, but we’ve seen an explosion in computing power.”  

Combined with massive amounts of newly available data, this new computing power is creating unprecedented opportunities for discovering next-generation treatments for respiratory illnesses, cancer, Alzheimer’s, immune disorders, and a range of other devastating diseases.

Increasing success 

For GSK scientists, integrating AI and machine learning techniques is already proving fruitful.  

“Machine learning has helped GSK predict future patient responses to a potential medicine in the clinic,” says Patrick Schwab, Senior Director of Machine Learning and AI at GSK.  

“It also has the potential to help predict who is at risk of developing a specific disease, allowing patients to get screened, diagnosed and treated, and potentially preventing severe long-term outcomes.”

In-house algorithms are being used to analyse genetic and genomic data and identify which patients with lupus – an autoimmune disease that can cause organ damage – are least likely to respond to traditional treatments. Those patients could then avoid the potentially serious side effects of those treatments and work with a healthcare professional to seek out suitable alternatives. 

Advanced technology allows GSK researchers to predict future responses to a potential therapy for chronic hepatitis B too – a viral infection that attacks the liver and affects more than 250 million people worldwide.  

Machine learning models are also used to help predict who is at risk of developing hepatitis B and assess who is likely to be a carrier without their knowledge. This could lead to more patients getting screened, diagnosed and treated, potentially preventing the serious long-term consequences of living with the disease without treatment. According to the World Health Organization (WHO), only 10% of people living with hepatitis B are aware of their infection, and only 5% are on treatment, leaving the rest at risk of developing liver scarring, liver cancer, and even liver failure. 

Reimagining drug development 

Findings made by analysing mountains of data at speed allow researchers to narrow their focus on disease areas that present the biggest unmet need – where new medications or vaccines could have the largest impact for patients. 

As a result, integrating AI and machine learning techniques into scientific development is helping to reduce research and development costs by cutting the time and resources previously needed to make similar progress. This frees up funding and talent to pursue novel medicines – a move Morgan Stanley believes could lead to a 15% increase in the number of therapies that are approved for market. 

A key part of reimagining the traditionally linear drug development process at GSK involves integrating teams of biological scientists and technology experts to approach questions from new angles, as well as creating a single platform from which researchers can access pooled data resources.  

Another is using AI and machine learning to build a reverse translation process with a feedback loop, also known as lab-in-the-loop. In this process, tech engineers collaborate with geneticists and biologists to develop pipelines using a method called active learning. This helps scientists decide which experiments to prioritise from a vast list of possible hypotheses by testing targets identified by the algorithms. The results are then used to determine additional research needs – for example, which patients are more likely to respond to an investigational treatment in a clinical trial. 
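For readers curious how an active learning loop works in practice, here is a minimal sketch in Python. It is a toy illustration only and not GSK's pipeline: the candidate pool, the `run_experiment` stand-in for a lab experiment, the hidden threshold of 0.62 and the experiment budget are all hypothetical. The idea it shows is the one described above: the algorithm picks the experiment it is least sure about, the result is fed back in, and the next pick is made from the updated picture.

```python
import random


def run_experiment(x, true_threshold=0.62):
    """Hypothetical stand-in for a lab experiment: True if the candidate 'responds'.

    The threshold is hidden from the learner; it only sees the True/False result.
    """
    return x > true_threshold


def active_learning(candidates, budget):
    """Run up to `budget` experiments, always testing the most uncertain candidate."""
    lo, hi = 0.0, 1.0  # current belief: the response threshold lies between lo and hi
    untested = list(candidates)
    tested = []
    for _ in range(budget):
        if not untested:
            break
        midpoint = (lo + hi) / 2
        # Uncertainty sampling: the candidate nearest the midpoint is the hardest to call.
        pick = min(untested, key=lambda x: abs(x - midpoint))
        untested.remove(pick)
        responded = run_experiment(pick)
        tested.append((pick, responded))
        # Feedback step: the result narrows the belief before the next pick.
        if responded:
            hi = min(hi, pick)
        else:
            lo = max(lo, pick)
    return lo, hi, tested


random.seed(0)
pool = [random.random() for _ in range(1000)]  # 1,000 hypothetical candidates
lo, hi, tested = active_learning(pool, budget=12)
print(f"Threshold estimated between {lo:.3f} and {hi:.3f} after {len(tested)} experiments")
```

In this toy setting, a dozen well-chosen experiments narrow down an answer that would otherwise take hundreds of tests, which is the appeal of letting an algorithm help prioritise what to run next. Real pipelines deal with far richer data and noisier results.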

“AI and ML is a powerful tool that has enhanced a lot of the technology that we use today,” Belgrave says.  

“It is exciting to see these tools having impact in the healthcare domain, allowing us to make novel scientific discoveries as well as understanding which patients are most likely to benefit from new medicines and vaccines.” 

The data revolution 

None of this would be possible, of course, without access to huge amounts of data. 

Biopharmas with long histories in disease research, like GSK, have access to their own unique archives of clinical data built up over decades. Many are also teaming up with genetic, genomic and health databanks to mine for information they can use to augment their findings. 

UK Biobank, a database which contains massive amounts of genetic and health data from half a million British volunteers, is one of the datasets that GSK is using. In November 2023, UK Biobank announced the release of the world’s largest set of genome sequencing data – a project that is expected to drive the discovery of new treatments and cures that could transform patients’ lives. 

Together with the biobank’s existing records, including electronic health records, this genomic data will shed new light on the role genes play in causing an array of diseases. It could also be used to identify people who might be more susceptible to those diseases, potentially paving the way for targeted drug development and new precision treatments. 

“UK Biobank provides the global research community with the most detailed picture of human health that exists, giving scientists an unprecedented toolbox for discovery,” says UK Biobank’s Chief Scientist, Professor Naomi Allen. 

“Researchers can use these data to understand why some people develop certain diseases and others don’t – and to find better ways to diagnose disease early and develop new treatments at a scale that’s never been done before.” 

Separately, as part of an initiative called the Alliance for Genomic Discovery, GSK and several of its peers are funding the whole-genome sequencing of DNA from 250,000 patients. By relying more heavily on samples from underrepresented ancestries, the alliance aims to expand the diversity of genomic data and learn more about the underlying genetic causes of many diseases. 

Other ambitious data-gathering efforts are advancing rapidly as well.  

In the US, researchers have discovered more than 275 million previously unreported genetic variants thanks to a National Institutes of Health program known as ‘All of Us’. The platform plans to collect data from at least 1 million people in a bid to reveal new information that could positively impact human health. 

“In recent years, we’ve gone from working with data from thousands of individuals to working with data from millions of individuals – and now we’re looking at hundreds of millions of genetic variants and tens of thousands of human phenotypes,” Robert Scott, Vice President of Human Genetics and Genomics at GSK, concludes.