For decades, scientists have struggled to understand the purpose of the 98% of human DNA that doesn't directly code for proteins—often called genomic "dark matter." On June 25, 2025, Google DeepMind unveiled a potential solution: AlphaGenome, an artificial intelligence system designed to interpret this mysterious non-coding DNA.
Earlier models forced a trade-off: they either analyzed only short DNA segments or gave up single-base precision. AlphaGenome can process sequences up to one million letters (base pairs) long while still making predictions at the level of individual nucleotides. That combination lets researchers examine how distant regulatory elements influence gene activity, a critical factor in understanding disease mechanisms.
"This is one of the most fundamental problems not just in biology—in all of science," said Pushmeet Kohli, DeepMind's head of AI for science. The model predicts thousands of molecular properties, including where genes begin and end in different tissues, how RNA is spliced, and which proteins bind to specific DNA regions.
In benchmark tests, AlphaGenome outperformed specialized tools on 22 of 24 sequence prediction tasks and matched or exceeded competing models on 24 of 26 variant-effect evaluations. When analyzing mutations found in leukemia patients, the model correctly predicted that non-coding variants activate the cancer-related TAL1 gene by creating a new binding site for the MYB protein. That result reproduces a known disease mechanism that had previously been confirmed only through laboratory studies.
"For the first time, we have a single model that unifies long-range context, base-level precision, and state-of-the-art performance across a whole spectrum of genomic tasks," said Dr. Caleb Lareau of Memorial Sloan Kettering Cancer Center, who had early access to the system.
While powerful, AlphaGenome has limitations. It struggles to capture the influence of very distant regulatory elements (those more than 100,000 base pairs away), and it cannot predict an individual's health outcomes or traits. DeepMind is making the model available through an API for non-commercial research and says it plans a full release later. Researchers expect it to accelerate disease studies by letting them run as virtual experiments the tests that previously required extensive laboratory work.