Wellcome Sanger Institute researchers develop machine learning tool to predict success of prime editing of human genome
Wellcome Sanger Institute researchers have developed a new tool to predict the chances of successfully inserting a gene-edited sequence of DNA into the genome of a cell, using a technique called prime editing.
This method is an evolution of CRISPR-Cas9 gene editing technology, and has great potential to treat genetic diseases, from cancer to cystic fibrosis, but so far the factors determining the success of edits have not been well understood.
In a new study published in Nature Biotechnology, Sanger Institute researchers explain how they introduced thousands of DNA sequences into the genome using prime editors and then trained a machine learning algorithm to help researchers design the best fix for a given genetic flaw.
First author Jonas Koeppel said: “The variables involved in successful prime edits of the genome are many, but we’re beginning to discover what factors improve the chances of success.
“Length of sequence is one of these factors, but it’s not as simple as the longer the sequence the more difficult it is to insert. We also found that one type of DNA repair prevented the insertion of short sequences, whereas another type of repair prevented the insertion of long sequences.”
CRISPR-Cas9 was developed in 2012 as the first easily programmable gene editing technology - a kind of ‘molecular scissors’ that enable researchers to cut DNA at any position in the genome so they can remove, add or alter sections of the DNA sequence.
It is used to research which genes are important for conditions from cancer to rare diseases, and to develop treatments that correct or ‘turn off’ harmful mutations or genes.
Expanding on CRISPR-Cas9, base editors were an innovation referred to as ‘molecular pencils’ for their ability to substitute single bases of DNA.
Then in 2019, the latest gene editing tools were created, called prime editors - also dubbed ‘molecular word processors’ for their ability to perform search and replace operations directly on the genome with a high degree of precision.
Researchers hope the technologies can ultimately help correct harmful mutations in people’s genes.
So far more than 16,000 small deletion variants, in which a small number of DNA bases have been removed from the genome, have been causally linked to disease.
In cystic fibrosis, 70 per cent of cases are known to be caused by the deletion of just three DNA bases.
A further breakthrough emerged in 2022, when base-edited T-cells were successfully used to treat a patient’s leukaemia after chemotherapy and a bone marrow transplant had failed.
Sanger Institute researchers designed 3,604 DNA sequences of between one and 69 DNA bases in length for the new study, and inserted them into three different human cell lines, using different prime editor delivery systems in various DNA repair contexts.
They carried out genome sequencing on these cells after a week to see whether the edits had been successful, and assessed each sequence to determine common factors affecting the success rate - or insertion efficiency.
The length of sequence proved to be a key factor, as was the type of DNA repair mechanism involved.
The researchers used machine learning to detect patterns that determined insertion success. After training the algorithm on the existing data, they applied it to new data it had not seen, and it was able to predict insertion efficiency there too.
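The train-then-test procedure described above can be illustrated with a minimal sketch. It is not the study's model: the article does not name the algorithm, so a one-feature least-squares line stands in for it here, and the record layout (pairs of a numeric feature such as insert length and a measured efficiency) and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def split_holdout(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Keep at least two test records so a correlation can be computed.
    n_test = max(2, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def evaluate_on_holdout(records, test_fraction=0.2, seed=0):
    """Fit on the training split only, then score predictions on unseen data."""
    train, test = split_holdout(records, test_fraction, seed)
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    predicted = [slope * x + intercept for x, _ in test]
    observed = [y for _, y in test]
    return pearson_r(predicted, observed)
```

The key design point is that the held-out records never influence the fit, so the correlation between predicted and observed values reflects how well the model generalises rather than how well it memorised the training data.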
Juliane Weller, a first author of the study from the Wellcome Sanger Institute, said: “Put simply, several different combinations of three DNA letters can encode for the same amino acid in a protein. That’s why there are hundreds of ways to edit a gene to achieve the same outcome at the protein level.
“By feeding these potential gene edits into a machine learning algorithm, we have created a model to rank them on how likely they are to work. We hope this will remove much of the trial and error involved in prime editing and speed up progress considerably.”
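To make the point about redundant DNA "letters" concrete, the sketch below enumerates every DNA sequence that encodes a given short run of amino acids under the standard genetic code. The function names are illustrative, and `rank_candidates` takes any user-supplied scoring function as a placeholder: it is not the model from the study, which would supply the real scores.

```python
from itertools import product
from math import prod

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of codons that encode it.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(AA_TO_CODONS[aa]) for aa in peptide)


def synonymous_sequences(peptide):
    """Yield every DNA sequence that translates to the peptide."""
    for combo in product(*(AA_TO_CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def rank_candidates(candidates, score_fn):
    """Sort candidates from highest to lowest score.

    score_fn is a hypothetical stand-in for a trained predictor.
    """
    return sorted(candidates, key=score_fn, reverse=True)
```

For example, leucine, serine and arginine each have six codons, so `count_encodings("LSR")` gives 6 × 6 × 6 = 216 candidate sequences for just three amino acids, which is how the number of possible edits quickly reaches the hundreds.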
The team now needs to make models for all known human genetic diseases to better understand if, and how, they can be fixed using prime editing.
The work will involve other research groups at the Sanger Institute and its collaborators.
Dr Leopold Parts, senior author of the study from the Wellcome Sanger Institute, said: “The potential of prime editing to improve human health is vast, but first we need to understand the easiest, most efficient and safest ways to make these edits. It’s all about understanding the rules of the game, which the data and tool resulting from this study will help us to do.”