In a groundbreaking advancement for genetic engineering and precision medicine, researchers have successfully used artificial intelligence to design synthetic DNA molecules that can control gene expression in healthy mammalian cells. The study, published in the journal Cell, represents the first reported instance of generative AI creating functional DNA regulatory sequences not found in nature.
A team at the Centre for Genomic Regulation (CRG) in Barcelona developed an AI system capable of designing custom DNA fragments that can activate or suppress genes in specific cell types with unprecedented precision. This achievement marks a significant milestone in the emerging field of generative biology.
AI-Generated DNA Sequences Demonstrate Precise Control
The researchers trained their AI model to predict which combinations of DNA nucleotides (A, T, C, G) would create regulatory elements with specific gene expression patterns in different cell types. As proof of concept, they instructed the AI to design synthetic DNA fragments that would activate a gene coding for a fluorescent protein in certain cells while leaving gene expression patterns unaltered in others.
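For readers curious how a sequence of A, C, G and T letters can be fed to a predictive model at all, the sketch below shows one-hot encoding, a common general-purpose representation for DNA. The article does not say how the CRG model encodes its input, so this is an illustration of the general idea, not the study's method; the function name `one_hot` is chosen here for illustration.

```python
# Illustrative only: a standard way to turn DNA letters into numbers.
# The article does not describe the encoding used in the study.
NUCLEOTIDES = "ACGT"  # column order of the encoding


def one_hot(seq):
    """Return one row per base, with a single 1 in that base's column."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        row[index[base]] = 1  # raises KeyError for letters outside ACGT
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("GATTACA", one_hot("GATTACA")):
        print(base, row)
```

A fragment of about 250 letters, like those described in the study, would become a table of about 250 rows and 4 columns under this scheme.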
When these approximately 250-letter DNA fragments were synthesized and delivered into mouse blood cells via viral vectors, they integrated into the genome at random locations and functioned exactly as predicted.
"The potential applications are vast. It's like writing software but for biology, giving us new ways of giving instructions to a cell and guiding how they develop and behave with unprecedented accuracy," said Dr. Robert Frömel, first author of the study.
Creating Ultra-Selective Genetic Switches
Gene expression is controlled by regulatory elements like enhancers, tiny fragments of DNA that switch genes on or off. Traditional approaches to correcting faulty gene expression have been limited to using naturally occurring enhancers found in the genome.
The AI-generated enhancers can help engineer ultra-selective switches that nature has not yet invented. These synthetic regulatory elements can be designed with the precise on/off patterns required in specific types of cells, a level of fine-tuning that is crucial for creating therapies that avoid unintended effects in healthy cells.
Dr. Lars Velten, corresponding author of the study, explained: "To create a language model for biology, you have to understand the language cells speak. We set out to decipher these grammar rules for enhancers so that we can create entirely new words and sentences."
Building the AI Through Extensive Experimentation
Developing the AI model required massive amounts of high-quality biological data, which the team generated through thousands of experiments with laboratory models of blood formation. Over five years, they built more than 64,000 synthetic enhancers, each carefully designed to test different arrangements and strengths of binding sites for 38 different transcription factors, the proteins that bind DNA and control gene expression.
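To give a feel for how a library that varies "arrangements and strengths" of binding sites can grow so large, the following sketch enumerates a tiny combinatorial design. Everything in it is hypothetical: the factor names `TF_A` and `TF_B`, the placeholder motif strings, the spacings, the filler base and the 40-letter length are invented for illustration and are not taken from the study.

```python
from itertools import product

# Hypothetical placeholders, not real binding sites from the study.
MOTIFS = {
    "TF_A": {"strong": "AGATAA", "weak": "AGATTA"},
    "TF_B": {"strong": "GGAAGT", "weak": "GGATGT"},
}
SPACINGS = [5, 10]   # filler bases between the two sites (assumed values)
LENGTH = 40          # total fragment length in this toy example
FILLER = "C"         # neutral padding base (an assumption)


def build_enhancer(sites, spacing, length=LENGTH):
    """Join sites with `spacing` filler bases, then pad to `length`."""
    body = (FILLER * spacing).join(sites)
    if len(body) > length:
        raise ValueError("sites and spacing do not fit in the fragment")
    return body + FILLER * (length - len(body))


def design_library():
    """Enumerate every ordered factor pair, strength pair and spacing."""
    library = []
    factor_pairs = product(MOTIFS, repeat=2)
    strength_pairs = product(("strong", "weak"), repeat=2)
    for (tf1, tf2), (s1, s2), spacing in product(
        factor_pairs, strength_pairs, SPACINGS
    ):
        sites = [MOTIFS[tf1][s1], MOTIFS[tf2][s2]]
        library.append(
            {
                "factors": (tf1, tf2),
                "strengths": (s1, s2),
                "spacing": spacing,
                "sequence": build_enhancer(sites, spacing),
            }
        )
    return library


if __name__ == "__main__":
    library = design_library()
    print(len(library), "designs")
    print(library[0])
```

Even with two factors, two strengths and two spacings, the toy design already yields dozens of sequences, which suggests how 38 factors could lead to a library of tens of thousands.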
Unlike previous studies that typically used cancer cell lines, the researchers worked with healthy cells to better represent human biology. This approach helped uncover subtle mechanisms that shape the immune system and blood cell production.
The team tracked how active each synthetic enhancer became across seven stages of blood-cell development and discovered that many enhancers activate genes in one type of cell while repressing genes in another. Most enhancers worked like volume dials, turning gene activity up or down, but certain combinations acted as on/off switches through what the scientists termed "negative synergy."
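The difference between a "volume dial" and a switch driven by negative synergy can be made concrete with a toy calculation. The sketch below assumes one simple way to formalize the idea: if two factors acted independently, their effects would add up, and negative synergy means the measured combined activity falls well below that sum. The article does not give the study's actual definition, and the numbers, the tolerance and the function names are all invented for illustration.

```python
def synergy(observed_combined, effect_a, effect_b):
    """Observed combined activity minus the additive expectation."""
    return observed_combined - (effect_a + effect_b)


def classify(observed_combined, effect_a, effect_b, tolerance=0.5):
    """Label a factor pair by how far it departs from simple addition."""
    score = synergy(observed_combined, effect_a, effect_b)
    if score < -tolerance:
        return "negative synergy"
    if score > tolerance:
        return "positive synergy"
    return "additive"


if __name__ == "__main__":
    # Made-up activity values on an arbitrary scale.
    # Volume-dial behaviour: the pair is roughly the sum of its parts.
    print(classify(observed_combined=3.4, effect_a=2.0, effect_b=1.5))
    # Switch-like behaviour: two activators together give almost no activity.
    print(classify(observed_combined=0.2, effect_a=2.0, effect_b=1.5))
```

In this toy picture, a pair showing negative synergy is active when either factor is present alone but nearly silent when both are present, which is the kind of behaviour that could turn a dial into an on/off switch.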
Implications for Precision Medicine and Gene Therapy
This technology could revolutionize gene therapy by enabling developers to boost or dampen gene activity only in specific cells or tissues that require adjustment. Such precision could make treatments more effective while reducing side effects.
Until now, advances in generative biology have largely benefited protein design, helping scientists create new enzymes and antibodies. However, many human diseases stem from faulty gene expression that is specific to certain cell types. For these conditions, there might never be a perfect protein drug candidate, making this new approach particularly valuable.
The researchers note that their work has only scratched the surface. Both humans and mice have an estimated 1,600 transcription factors regulating their genomes, and the team has explored just 38 of these in their initial study.
The research was funded by an ERC Starting Grant from the European Union and a grant from the Spanish National Agency for Research, with contributions from researchers at the Barcelona Collaboratorium, a joint initiative between the CRG and EMBL-Barcelona.