The acceleration is here: how AI agents are already making real scientific discoveries
AI is no longer just a tool for writing emails or generating images. It’s now actively helping scientists find candidate treatments for cancer, blindness, and liver disease, and understand how antibiotic resistance spreads, far faster and more cheaply than the same literature review and analysis would take by hand.
In the same week, two separate papers were published in Nature, one from Google and one from an independent team, showing AI agents autonomously proposing scientific hypotheses, ranking ideas, analyzing messy lab data, and guiding experiments. Together, they point to a new era of “AI co-scientists” that can accelerate discovery across biology and medicine.
From chatbots to AI co-scientists
Most people think of AI as a chatbot that answers questions or summarizes documents. These new systems are very different. They are multi-agent AI frameworks designed specifically for scientific discovery, where multiple specialized AI agents work together like a virtual research lab.
The first system, built by Google, is called Co-Scientist. The second, called Robin, is a closed-loop system that doesn’t just think about ideas—it also analyzes raw experimental data and uses it to refine its hypotheses.
These systems don’t replace human scientists. Instead, they act as ultra-fast, tireless collaborators that can read thousands of papers, generate new ideas, and help design and interpret experiments at a speed humans simply can’t match. If you’ve been following advances like new long-thinking GPT models, this is that same trend—but pointed directly at real-world science.
Inside Google’s Co-Scientist: an AI-powered virtual lab
Co-Scientist is not a single large model answering prompts. It’s an ecosystem of AI agents, each with a specific role, working together to generate and refine scientific ideas. You can think of it as a virtual research group.
The core agents and how they work together
Supervisor agent: This is the coordinator. A human scientist gives it a high-level goal—such as “find new treatments for this type of leukemia”—and the supervisor breaks that goal into tasks and routes them to the right agents. It doesn’t invent ideas itself; it keeps the whole system organized and on track.
Generation agent: This is the brainstormer. It searches the scientific literature, pulls in relevant data, and proposes initial hypotheses or solution ideas based on what it finds.
Reflection agent: This is the harsh critic. Its job is to attack the generation agent’s ideas—checking for factual errors, logical flaws, and whether the idea is actually new rather than a rephrasing of known work. This back-and-forth between generator and critic is what pushes the system toward stronger, more original hypotheses.
Proximity agent: This agent clusters similar ideas in a high-dimensional space. If the system keeps generating slightly different versions of the same concept, the proximity agent flags that redundancy so compute and attention can shift to genuinely new directions.
Evolution agent: This agent refines surviving ideas. It can merge two promising hypotheses, fill in missing logical steps, or tweak concepts based on the reflection agent’s critiques.
Ranking agent with Elo tournaments: This is one of the most innovative parts. Co-Scientist runs “tournaments of ideas” using an Elo rating system like the one used in chess and other competitive games. Pairs of hypotheses are pitted against each other in simulated debates: each “argues” why it’s more plausible, novel, or testable than the other. A separate AI judge decides the winner, and both Elo scores are updated. Over hundreds or thousands of these matchups, the strongest ideas rise to the top.
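To make the tournament idea concrete, here is a minimal sketch of an Elo-style ranking loop. It uses the standard Elo update formula; the K-factor of 32, the starting rating of 1200, the random pairing, and the toy judge are all assumptions for illustration, not details taken from the Co-Scientist paper.

```python
import random

K = 32  # update step size; a common Elo default, assumed here


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = K):
    """Return the new (A, B) ratings after one pairwise comparison."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def run_tournament(ideas, judge, rounds=1000, start=1200.0, seed=0):
    """Rate ideas by repeated random pairings; judge(a, b) is True if a wins."""
    rng = random.Random(seed)
    ratings = {idea: start for idea in ideas}
    for _ in range(rounds):
        a, b = rng.sample(ideas, 2)
        ratings[a], ratings[b] = update(ratings[a], ratings[b], judge(a, b))
    # Highest-rated idea first
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


def toy_judge(a: str, b: str) -> bool:
    """Hypothetical stand-in for the AI judge: prefers the longer string."""
    return len(a) > len(b)


ranking = run_tournament(["idea A", "idea BB", "idea CCC"], toy_judge, rounds=200)
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating pool stays constant; only the relative order changes as matchups accumulate.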
Does this actually beat human experts?
To test Co-Scientist, researchers gave it 15 extremely challenging unsolved biomedical problems written by PhD-level scientists. They also collected best-guess solutions from human experts and from other state-of-the-art AI models.
Independent expert judges then evaluated all the ideas in a blind test: they didn’t know which came from humans and which came from AI. Co-Scientist’s hypotheses were consistently rated higher in novelty, plausibility, and potential impact than those from both the human experts and the other models.
That means this isn’t just a hallucination machine. It’s generating ideas that domain experts, without knowing their origin, rate above the ones written by human specialists.
AI discovers new treatment paths for leukemia
Strong ideas are one thing. The real question is: do they work in the lab?
One of the most striking Co-Scientist case studies focused on acute myeloid leukemia (AML), a fast and aggressive blood cancer. Standard chemotherapy can often kill the bulk of cancer cells initially, but the disease frequently returns. That’s because of leukemia stem cells—dormant “root” cells that survive treatment and later cause relapse.
There are currently no great therapies that selectively target these stem cells. So researchers gave Co-Scientist a dataset of 2,300 FDA-approved drugs (used for other conditions) and asked: which of these could be repurposed to treat AML, especially by targeting those hard-to-kill stem cells?
Repurposing existing drugs for AML
Co-Scientist surfaced several promising candidates, including drugs like binimetinib, which is approved for skin cancer. When tested on leukemia cells in the lab, binimetinib showed very strong potency, with an extremely low IC50 (a standard measure of how much drug is needed to inhibit cell survival by half). Low IC50 means high potency—very little drug is needed to have a big effect.
This alone is a meaningful discovery: an existing cancer drug identified by AI as a powerful candidate against a different, difficult-to-treat cancer.
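For readers who want to see what a low IC50 means numerically, here is a small sketch using the Hill equation, a standard dose-response model. The model choice and every number below are assumptions for illustration; the article does not report the measured IC50 values.

```python
def viability(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Fraction of cells surviving at drug concentration conc (same units as ic50)."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)


# Hypothetical IC50 values in micromolar, chosen only to show the contrast.
potent_ic50 = 0.01
weak_ic50 = 10.0

dose = 0.1  # micromolar, the same dose applied to both hypothetical drugs
potent_survival = viability(dose, potent_ic50)  # most cells are inhibited
weak_survival = viability(dose, weak_ic50)      # almost no effect at this dose
```

By definition, viability is exactly one half when the dose equals the IC50, so a drug with a smaller IC50 reaches that halfway point with less compound.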
A completely unexpected leukemia target
The researchers then asked Co-Scientist for ideas that had no prior published link to leukemia or cancer at all, meaning truly novel directions. After extensive internal debate and ranking, the system proposed a drug called KIRA6.
KIRA6 inhibits an enzyme (IRE1α) involved in managing cellular stress and clearing misfolded or damaged proteins. Co-Scientist reasoned that rapidly dividing cancer cells are under extreme internal stress and rely heavily on this stress-response pathway to survive. Normal cells, by contrast, are less stressed and less dependent on it.
The AI hypothesized that blocking this pathway with KIRA6 would selectively push leukemia cells over the edge while sparing healthy cells. When scientists tested this in the lab, KIRA6 turned out to be 18 times more effective at killing leukemia stem cells than normal cells, which is exactly the type of selective effect you’d want in a therapy aimed at preventing relapse.
This was not a re-discovery of a known mechanism. It was a genuinely new therapeutic idea that human researchers hadn’t published before.
Finding powerful drug combinations
Co-Scientist didn’t stop at single drugs. It also explored combinations, which are notoriously hard to optimize because the number of possible multi-drug combos explodes into the thousands or millions.
For AML, the system proposed a three-drug combination: JQ1, Olaparib, and MSA-2. Lab tests confirmed that this trio worked synergistically—together they were much more effective at stopping the cancer than any single drug alone.
Because AI can synthesize vast amounts of literature and reason about interactions, it can narrow the search space to a small set of highly promising combinations, saving years of trial-and-error in the lab.
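To get a feel for how quickly the combination space grows, here is a quick count using the 2,300-drug library size mentioned earlier. Treating that library as the pool for combinations is an assumption made purely to illustrate scale; the article does not say which pool the three-drug combination was drawn from.

```python
from math import comb

n_drugs = 2300  # library size quoted earlier in the article

pairs = comb(n_drugs, 2)    # unordered two-drug combinations
triples = comb(n_drugs, 3)  # unordered three-drug combinations

print(f"{pairs:,} pairs")      # 2,643,850 pairs
print(f"{triples:,} triples")  # 2,025,189,100 triples
```

Even before considering doses or schedules, testing every triple in the lab is out of reach, which is why narrowing the candidates ahead of time matters.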
AI finds new uses for existing drugs in liver disease
The same Co-Scientist framework was also applied to liver fibrosis, a condition where chronic inflammation leads to excessive scar tissue and, eventually, liver failure.
Researchers asked the system to identify new epigenetic targets—molecular “switches” that control which genes are turned on or off in liver cells and drive fibrosis. Once you know which switches matter, you can look for drugs that flip them back toward a healthy state.
Co-Scientist generated and refined many hypotheses, then converged on new epigenetic targets and proposed drugs to hit them, including vorinostat. Vorinostat is already FDA-approved, but for a rare lymphoma, not liver disease.
When tested, vorinostat reduced liver scarring in lab models without being toxic to human liver cells. Again, AI uncovered a hidden second life for an existing drug, dramatically shortening the path from idea to potential therapy.
Cracking how antibiotic resistance spreads
Antimicrobial resistance (AMR) is one of the biggest global health threats of this century. Bacteria, fungi, and parasites evolve to resist our antibiotics, making once-treatable infections dangerous again.
We know that resistance genes often travel via mobile genetic elements—DNA “packages” that move between bacteria. But one mystery has been how these packages can sometimes jump so easily between very different bacterial species.
Researchers gave Co-Scientist this puzzle: here’s a strange genetic element, figure out how this gene transfer works. In just two days of autonomous literature review and reasoning, the AI proposed a top-ranked hypothesis: these mobile elements hijack diverse phage tails—virus-like structures that dock onto bacteria—to expand the range of species they can infect.
This mechanism neatly explains the rapid spread of resistance across species. Remarkably, the AI’s hypothesis matched the unpublished findings of an independent lab group that had spent months experimentally uncovering the same mechanism. Co-Scientist inferred the answer purely from reading and reasoning over existing literature.
Robin: a closed-loop AI system that runs the full scientific cycle
Co-Scientist excels at reading, reasoning, and proposing ideas. The second Nature paper takes the next step: an AI system that also analyzes raw experimental data and uses it to refine its hypotheses in real time.
This system is called Robin. It’s a multi-agent framework that forms a closed loop: literature review → hypothesis → experiment design → data analysis → refined hypothesis → next experiment. Human scientists still run the physical experiments, but Robin handles the cognitive heavy lifting around them.
The three key agents in Robin
Crow: This agent performs fast, accurate literature reviews. Given a disease or question, it scans hundreds of papers and summarizes how the disease works and which lab tests are relevant.
Falcon: Once a specific drug or treatment is in focus, Falcon does a deep dive. It compiles a detailed report on how the drug works, its safety profile, and potential risks.
Finch: This is the game-changer. Finch is the data analysis agent that takes raw lab outputs—often messy, noisy, and unstructured—and writes its own code to clean, analyze, and visualize the data.
How Finch avoids hallucinations and bias
Interpreting experimental data is hard and subjective. Two human scientists can look at the same dataset and disagree. To make Finch more reliable, Robin doesn’t rely on a single AI run.
Instead, when new data comes in, Robin launches eight independent Finch agents in parallel. Each one:
• Cleans the data in its own way
• Writes and runs its own analysis code (for example, in Python)
• Produces its own interpretation and plots
Robin then applies a consensus mechanism: only conclusions that at least half of the Finch agents agree on are accepted. This reduces the risk of a single hallucinated or biased interpretation shaping the outcome.
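The “at least half must agree” rule can be sketched in a few lines. This is a simplified illustration, not Robin’s actual implementation: it assumes each agent’s conclusions have already been normalized to comparable strings, and the example conclusions are made up.

```python
from collections import Counter


def consensus(conclusions_per_agent, min_fraction=0.5):
    """Keep conclusions reported by at least min_fraction of the agents."""
    n_agents = len(conclusions_per_agent)
    counts = Counter()
    for conclusions in conclusions_per_agent:
        counts.update(set(conclusions))  # one vote per agent per conclusion
    return {c for c, n in counts.items() if n >= min_fraction * n_agents}


# Made-up outputs from eight independent analysis runs.
runs = [
    ["drug X increases uptake", "plate 3 is an outlier"],
    ["drug X increases uptake"],
    ["drug X increases uptake", "drug Z has no effect"],
    ["drug X increases uptake", "drug Z has no effect"],
    ["drug Z has no effect"],
    ["drug X increases uptake", "drug Z has no effect"],
    ["drug X increases uptake"],
    ["plate 3 is an outlier"],
]

accepted = consensus(runs)
```

Here the first conclusion appears in six of eight runs and the second in four, so both pass; the outlier claim appears in only two runs and is dropped. In practice the hard part is deciding when two differently worded conclusions count as the same one.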
The results from Finch are then fed back to Crow to generate new hypotheses, and the loop continues. This is the scientific method—hypothesize, test, analyze, refine—automated and accelerated.
Robin discovers new treatments for age-related blindness
To prove Robin works end-to-end, researchers gave it a major unsolved problem: dry age-related macular degeneration (AMD), the leading cause of irreversible vision loss in the developed world. Current treatments are limited, and there’s a huge need for better options.
Round 1: finding a promising pathway
Crow scanned hundreds of papers and Robin hypothesized that a key process to target is RPE phagocytosis. The retinal pigment epithelium (RPE) is a layer of cells at the back of the eye whose job includes “eating” and clearing cellular waste. If this garbage disposal system fails, toxic waste builds up and vision is lost.
Robin’s first idea: find drugs that boost RPE phagocytosis. It proposed 30 existing, safe drugs that might enhance this process and even suggested how to test them.
Human scientists ran the experiment in the lab, measuring how each drug affected RPE phagocytosis, then fed the raw data back into Robin.
Round 2: analyzing data and uncovering a mechanism
Finch spun up eight parallel agents, wrote code, cleaned the data, and performed statistical analysis. The consensus: a compound called Y-27632 significantly boosted RPE phagocytosis. The cells were clearing more waste.
Robin then asked a deeper question: how is Y-27632 doing this? It suggested RNA sequencing to see which genes changed activity after treatment. The team ran the experiment and fed the massive RNA dataset back into Finch.
Finch generated a volcano plot—a graph where each dot is a gene, and the most dramatically changed genes “erupt” to the top. One gene, ABCA1, stood out as being strongly upregulated.
ABCA1 is known for pumping cholesterol and fats out of cells, and it interacts with APOE, a major genetic risk factor for AMD. That means Robin didn’t just find a drug that worked; it connected that drug to a deep genetic mechanism directly tied to the disease’s root cause.
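For readers unfamiliar with volcano plots, the sketch below shows how each gene’s position is computed: the x-axis is the log2 fold change in expression and the y-axis is the negative log10 of the p-value. The gene names, expression values, and p-values are invented placeholders, and the cutoffs (a twofold change, p below 0.05) are common conventions assumed here rather than the thresholds used in the paper.

```python
import math

# Made-up (gene, mean_control, mean_treated, p_value) rows for illustration only.
rows = [
    ("GENE_A", 100.0, 105.0, 0.60),
    ("GENE_B", 50.0, 400.0, 1e-8),
    ("GENE_C", 200.0, 40.0, 1e-4),
    ("GENE_D", 80.0, 170.0, 0.20),
]


def volcano_point(control, treated, p_value):
    """Return the (x, y) position of one gene on a volcano plot."""
    log2_fc = math.log2(treated / control)  # x-axis: size of the change
    neg_log10_p = -math.log10(p_value)      # y-axis: statistical significance
    return log2_fc, neg_log10_p


def classify(log2_fc, neg_log10_p, fc_cut=1.0, p_cut=0.05):
    """Label a gene as up, down, or not significant using both cutoffs."""
    if neg_log10_p < -math.log10(p_cut) or abs(log2_fc) < fc_cut:
        return "not significant"
    return "up" if log2_fc > 0 else "down"


calls = {gene: classify(*volcano_point(c, t, p)) for gene, c, t, p in rows}
```

A gene only “erupts” to the top corners when it has both a large fold change and a small p-value, which is why GENE_D, with a twofold change but weak statistical support, is not flagged. Real RNA-seq pipelines typically also adjust p-values for multiple testing.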
Round 3: finding even better drugs
Armed with the ABCA1 insight, Robin went back to the literature to find drugs that might hit this pathway more effectively and safely. It proposed two new candidates: ripasudil and KL001.
Ripasudil: Robin found that ripasudil is already approved in Japan as an eye drop, meaning its safety in the eye is well understood. When tested on human cells with AMD-like conditions, ripasudil outperformed the original Y-27632: it boosted waste clearance more strongly and was less toxic.
KL001: This one is even more surprising. KL001 is a circadian clock modulator—it affects the cell’s internal timekeeping system. No one had previously proposed using a circadian drug to treat AMD.
Robin, tracing complex gene interaction networks, hypothesized that RPE phagocytosis is tied to the cell’s internal clock. If the “garbage collection” process is off-schedule, waste accumulates. By stabilizing the circadian rhythm with KL001, the AI predicted, you could restore proper cleaning.
Lab tests confirmed that KL001 enhanced the waste-clearing activity in human eye cells, pointing to a completely new treatment avenue for macular degeneration that hadn’t been explored before.
400 hours of work compressed into 2 hours
The Robin paper includes a time-on-task analysis that highlights just how dramatic this acceleration is.
To replicate what Robin did manually, a human scientist would need to:
• Read and synthesize 551 specialized papers
• Generate hypotheses from them
• Design and plan the experiments
• Write, debug, and run all the data analysis code
• Interpret the results and iterate
The estimated time for a human: around 400 hours of intense, focused work, or roughly ten weeks of full-time effort at 40 hours a week.
Robin did the same intellectual work in under 2 hours, including multiple rounds of analysis, at a compute cost of about $10.76.
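The figures quoted above translate into a simple lower bound on the speedup. This back-of-the-envelope sketch uses only the numbers stated in the article and treats “under 2 hours” as exactly 2 hours, so the real ratio would be at least this large.

```python
human_hours = 400       # human estimate quoted in the article
robin_hours = 2         # upper bound quoted in the article ("under 2 hours")
compute_cost_usd = 10.76

min_speedup = human_hours / robin_hours               # at least 200x
cost_per_human_hour = compute_cost_usd / human_hours  # compute dollars per replaced hour

print(f"speedup >= {min_speedup:.0f}x")
print(f"compute cost per replaced human hour: ${cost_per_human_hour:.3f}")
```

This counts only the cognitive work; the time humans spent running the physical experiments is outside the comparison.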
When you combine this with the rapid progress in general-purpose models—like the latest long-context systems covered in our look at DeepSeek V4’s 1M-token context window—you can see where this is heading: AI that can read essentially all known literature on a topic and turn it into actionable, testable science in a single afternoon.
What this means for the future of science
These two Nature papers mark a turning point. AI is no longer just assisting with paperwork or automating simple tasks. It’s now:
• Proposing novel hypotheses that human experts hadn’t considered
• Ranking and refining ideas through structured debate and tournaments
• Identifying new uses for existing drugs across diseases
• Interpreting messy, real-world lab data with consensus mechanisms
• Guiding multi-round experimental programs in days instead of months
In just one week, we saw AI agents contribute to potential advances in:
• Cancer (acute myeloid leukemia and leukemia stem cells)
• Liver fibrosis
• Antimicrobial resistance
• Age-related macular degeneration and blindness
The pace of discovery is clearly accelerating. As these systems improve, we can expect an explosion of new hypotheses, faster validation cycles, and a growing list of AI-augmented breakthroughs across medicine, biology, and beyond.
The key takeaway: we’ve entered an era where AI isn’t just a tool for productivity—it’s becoming a core part of how we expand human knowledge itself.