Two new artificial intelligence systems aim to speed up protein engineering, with potential consequences for how pharmaceutical companies develop new medicines and diagnostic tools. Both address one of biotechnology's hardest problems: designing proteins that fold into the precise three-dimensional shapes required for therapeutic function.
MapDiff Framework Advances Inverse Protein Folding
Researchers from the University of Sheffield and AstraZeneca have developed MapDiff, a machine learning framework that significantly outperforms existing state-of-the-art methods for inverse protein folding. Published in Nature Machine Intelligence, the study demonstrates how AI can more accurately predict amino acid sequences that will fold into desired protein structures.
"This work represents a significant step forward in using AI to design proteins with desired structures," said Professor Haiping Lu of the University of Sheffield, the study's corresponding author. "By learning how to generate amino acid sequences that are likely to fold into specific 3D structures, our method opens new possibilities for designing new therapeutic proteins, which can be used in various therapeutic applications."
Inverse protein folding starts from a target 3D structure and asks which amino acid sequences would fold into it. The problem is particularly challenging because even small changes in a protein's sequence can have unpredictable effects on its structure, and for medicines to work properly, proteins must fold into very specific 3D shapes. The MapDiff approach works like a guide that predicts the most important folds in the protein structure, making the design process more accurate.
Peizhen Bai, Senior Machine Learning Scientist at AstraZeneca who developed the AI during his PhD at Sheffield, explained the motivation: "I was motivated by the potential of AI to accelerate biological discovery. I'm proud that our method, MapDiff, helps design protein sequences that are more likely to fold into desired 3D structures — a key step towards advancing next-generation therapeutics."
AI Pipeline Generates Thousands of Ready-to-Use Proteins
Meanwhile, researchers from Chongqing University and Zhejiang University have created an AI tool that generated 7,245 new proteins entirely through computer design. Published in Frontiers of Computer Science, this system addresses the practical challenges of protein manufacturing and stability.
The team collected over 1,300 existing protein structures from a public database, and the AI generated five new versions of each. The system then screened these proteins for key features such as stability and target-binding ability, selecting only the best-performing candidates.
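The generate-then-screen workflow described above can be illustrated with a minimal Python sketch. This is not the team's code: the `Candidate` fields, the 0.5 thresholds, and the random scores standing in for a generative model and property predictors are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One designed protein with hypothetical predicted scores in [0, 1]."""
    template_id: str
    variant: int
    solubility: float
    thermostability: float
    binding: float


def generate_variants(template_id, n_variants, rng):
    # Placeholder for a generative design model: random numbers stand in
    # for the predicted properties of each new variant (an assumption).
    return [
        Candidate(template_id, i, rng.random(), rng.random(), rng.random())
        for i in range(n_variants)
    ]


def screen(candidates, min_solubility=0.5, min_thermostability=0.5, min_binding=0.5):
    # Keep only candidates that clear every (hypothetical) threshold.
    return [
        c for c in candidates
        if c.solubility >= min_solubility
        and c.thermostability >= min_thermostability
        and c.binding >= min_binding
    ]


def run_pipeline(template_ids, n_variants=5, seed=0):
    # Generate n_variants designs per template, then screen the whole pool.
    rng = random.Random(seed)
    generated = [
        c for t in template_ids for c in generate_variants(t, n_variants, rng)
    ]
    return generated, screen(generated)


templates = [f"template_{i}" for i in range(3)]  # made-up template names
generated, kept = run_pipeline(templates)
print(len(generated), "generated;", len(kept), "passed screening")
```

Generating several variants per template and filtering afterwards keeps the two stages independent, so a stricter or additional screening criterion can be applied without regenerating any designs.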
"Our AI pipeline lets us design high-performance proteins quickly and at low cost," said Professor Weiwei Xue. "This gives smaller labs and companies a chance to compete and innovate."
Enhanced Stability and Manufacturing Readiness
The Chongqing-Zhejiang team's approach specifically addresses manufacturing challenges that often plague protein-based therapeutics. More than 70% of the new proteins were predicted to dissolve cleanly in liquids such as water or buffer solutions, a property essential for use in laboratory tests, injectable drugs, or diagnostic test strips. Approximately 60% were predicted to remain stable at high temperatures, which is beneficial during shipping, storage, or sterilization.
The proteins were based on 55 different structure types, including parts of antibodies already used in approved drugs. This variety is intended to let the proteins target many kinds of disease markers. Nearly half were predicted to bind even more tightly than older designs, which matters for accuracy in diagnostic tests and efficacy in treatments.
Accelerating Drug Development Timelines
Both AI systems promise to dramatically reduce development timelines. The Chinese team's tool helps cut months of laboratory work down to weeks by predicting which proteins will maintain their shape when heated and dissolve properly in solutions, making them less likely to clog laboratory equipment or break down during storage.
The MapDiff framework complements other recent advances such as AlphaFold. AlphaFold predicts a protein's 3D structure from its amino acid sequence, whereas MapDiff works in the opposite direction, starting with the protein fold and retrieving potential amino acid sequences. Together, these tools could accelerate the design of key proteins needed for vaccines, gene therapies, and other therapeutic modalities.
Industry Collaboration and Future Impact
The Sheffield-AstraZeneca collaboration builds on previous work that developed DrugBAN, an AI system that predicts whether candidate drugs will bind with their intended target proteins. That research became one of the most cited papers from Nature Machine Intelligence in 2023, demonstrating the growing impact of AI in pharmaceutical research.
For pharmaceutical and biotechnology companies, these advances promise faster development pipelines and reduced costs. For hospitals and laboratories, the technology could lead to more reliable and affordable diagnostic tests. Together, the two systems could let researchers explore biology faster and more efficiently than traditional methods allow.