Do We Still Need Scientists?

The question sounds provocative. It shouldn’t anymore: it is being asked seriously in scientific institutions, in funding agencies, and in PhD programs around the world.

The evidence prompting it is real. Sakana AI’s “AI Scientist” system autonomously generates research ideas, writes code, runs experiments, and produces complete manuscripts; its second version earned peer-review scores high enough to cross the acceptance threshold, the first fully AI-generated paper to do so. In the summer of 2025, theoretical physicist Alex Lupsasca gave GPT-5 a problem he himself had spent years solving: finding new symmetries in the equations governing a black hole’s event horizon. The model, working from training data gathered nine months before Lupsasca’s own paper was published, independently arrived at the same result by a different route. “The world has changed in some profound way,” Lupsasca wrote afterward, then moved his family to San Francisco to join OpenAI’s science team. Around the same time, mathematician Ernest Ryu proved a long-standing conjecture in optimization theory through 12 hours of back-and-forth with the same model.

The automation extends to the infrastructure of research as well. The Department of Energy has commissioned what it describes as the world’s largest autonomous-capable science system for microbial experimentation, running and redirecting experiments around the clock without human intervention. OpenAI’s Deep Research synthesizes hundreds of papers into cited reports in under an hour, automating the literature review that once consumed weeks of a PhD student’s life.

When a system can independently rediscover results that took a human scientist years, the question stops being whether AI can do science. It clearly can. The question becomes: what is the scientist for now? Not as a provocation, but as a genuine design problem. If AI can generate hypotheses, run experiments, synthesize literature, and produce manuscripts — what is the nature of the relationship between human researchers and these systems? Is the scientist a supervisor, a curator, a collaborator, a prompter? Or is the role simply dissolving? This post argues that framing it as replacement versus survival misses the real question — which is about the interface between human and AI cognition, and what that interface needs to do. It turns out the brain, which solved a surprisingly similar problem 200 million years ago, has something precise to say about it.


A 200-Million-Year-Old Precedent

The human brain did not arrive fully formed. It was built in layers, across hundreds of millions of years, with each new layer growing over older structures rather than replacing them. The limbic system — a circuit encompassing the amygdala, hippocampus, hypothalamus, and related structures — emerged prominently in early mammals around 200–250 million years ago. It was the cognitive center of its era: fast emotional evaluation, threat detection, memory of place and experience, and the drives of hunger, fear, and attachment.

Then the neocortex arrived. In primates, and dramatically in humans, a six-layered sheet of neural tissue expanded to dwarf everything beneath it — enabling abstraction, language, long-term planning, and reasoning that could operate across time in ways the limbic system could not.

Here is the critical point: the neocortex did not replace the limbic system. It grew around it, remained structurally coupled to it through dense bidirectional connections, and became deeply dependent on it in ways that took decades of neuroscience to fully appreciate.

Antonio Damasio’s somatic marker hypothesis made this vivid. Patients with damage to the ventromedial prefrontal cortex — severing the interface between neocortex and limbic system — did not become more rational. They became incapable of decision-making. Without emotional signals from the older system, the newer system could not determine what was worth optimizing for. The limbic system was not noise the cortex had to overcome. It was load-bearing data.


A New Layer Is Being Added

The analogy to our current moment is not subtle.

AI systems now perform certain cognitive tasks — pattern recognition, information synthesis, logical inference, literature search, hypothesis generation — at a level that exceeds individual human capacity in meaningful ways. They are, in some functional sense, a new layer of cognitive capability being added on top of existing human cognitive systems.

And the same temptation exists: to imagine this new layer as a replacement, or conversely, to resist it as a threat to everything the older layer does. Both instincts misread the neuroscience.

The limbic system’s role did not disappear when the neocortex expanded. It was recontextualized. It remained the source of motivational ground — what matters, what carries stakes, what connects to actual life. The neocortex extended reach and abstraction, but it was never self-grounding. It required the older system to tell it what was worth thinking about.

If AI systems represent a new cognitive layer, humans retain something analogous to that limbic function: embodied experience, genuine stakes in outcomes, moral intuition, the felt sense of what matters. These are not things that can be derived from first principles by any reasoning system. They are not inferior to abstract reasoning — they are its necessary ground.

But the lesson the brain offers is not just about the two systems. It is primarily about the interface between them.


The Anterior Cingulate Cortex

Sitting at the anatomical midline of the brain, wrapping around the corpus callosum, is a region called the anterior cingulate cortex — the ACC. It is positioned precisely at the boundary between the limbic system below and the prefrontal cortex above. It is one of the most metabolically active regions in the brain, and one of the most evolutionarily significant.

Its function is not to side with either system. Its function is to hold the tension between them honestly.

More specifically, the ACC performs several computational operations that no simple connection between regions can perform:

It detects discrepancies between what one system expects and what another is signaling — flagging when emotional evaluation and reasoned evaluation are pointing in different directions, and holding that conflict open rather than resolving it prematurely in favor of either.

It tracks prediction errors over time, learning which kinds of disagreements are meaningful and adjusting its sensitivity accordingly. It is not just a real-time monitor; it learns from the history of where conflicts resolved well or badly.

It maintains sustained attention on difficult, slow-resolving problems, resisting the cognitive pull toward premature closure that both systems are prone to in different ways.

It projects back to both systems it mediates — not merely relaying signals, but actively reshaping processing on both sides based on what it detects.

The ACC is neither fully limbic nor fully cortical. Its ambiguity is structural, not accidental. A region that belonged entirely to one system could not perform the integration function that belongs to neither. Its job requires that it resist classification.
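
The four operations above can be caricatured in code. What follows is a deliberately toy sketch, not a model of ACC physiology: every class name, threshold, and update rule is an illustrative assumption, chosen only to make the logical structure of the four operations concrete.

```python
# Toy sketch of the four ACC-like operations. All names and numbers
# are illustrative assumptions, not claims about actual neural computation.

class ConflictMonitor:
    def __init__(self, sensitivity=0.5, learning_rate=0.1):
        self.sensitivity = sensitivity      # how large a disagreement must be to flag
        self.learning_rate = learning_rate  # how fast sensitivity adapts to outcomes
        self.open_conflicts = []            # conflicts held open, not yet resolved

    def detect(self, limbic_signal, cortical_signal):
        """Operation 1: flag a discrepancy between the two evaluations,
        and hold it open rather than resolving it immediately."""
        discrepancy = abs(limbic_signal - cortical_signal)
        if discrepancy > self.sensitivity:
            self.open_conflicts.append(discrepancy)
            return True
        return False

    def learn(self, discrepancy, resolved_well):
        """Operation 2: adjust sensitivity from the history of outcomes.
        Conflicts that resolved badly make the monitor more sensitive."""
        direction = 1 if resolved_well else -1
        self.sensitivity = max(0.05, self.sensitivity
                               + self.learning_rate * direction * discrepancy)

    def still_open(self):
        """Operation 3: sustained attention — unresolved conflicts stay
        on the agenda instead of being closed prematurely."""
        return list(self.open_conflicts)

    def feedback(self, limbic_signal, cortical_signal):
        """Operation 4: project back to both sides, nudging each signal
        toward the other instead of letting one dominate."""
        midpoint = (limbic_signal + cortical_signal) / 2
        return (limbic_signal + 0.25 * (midpoint - limbic_signal),
                cortical_signal + 0.25 * (midpoint - cortical_signal))
```

The point of the sketch is structural: none of these four methods belongs to either input signal. The monitor only makes sense as a third thing standing between them.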


What the Interface Actually Needs to Do

A chat interface between a human and an AI is not an ACC. It transmits. The ACC computes.

The difference matters enormously in practice. A passive interface lets the AI’s fluency carry outputs past the human’s critical attention. It lets confident-sounding errors go undetected. It allows one layer to dominate the other without registering that a meaningful conflict exists. It produces the appearance of integration while the actual signals from each system never genuinely meet.

An ACC-like interface between human and AI would need to do things that current tools largely do not:

It would model both signals simultaneously — maintaining a persistent representation of the human’s values, intuitions, and prior positions against which AI outputs are continuously compared, surfacing divergences rather than smoothing them.

It would correct for uncertainty asymmetry — AI systems produce fluent, confident-sounding output regardless of actual reliability. An honest interface would track calibration externally, flagging domains of unreliability that the AI’s own outputs will not flag.

It would slow down at high-stakes junctions, structurally resisting fast closure when a question carries genuine stakes rather than just complexity.

It would maintain memory across time — tracking where the collaboration has succeeded and failed, where human intuition overrode AI reasoning and turned out to be right, and using that history to dynamically weight each signal.
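
As a design sketch only, the four functions above could be caricatured as a small mediator object. Every name, threshold, and policy below is an illustrative assumption; no such tool currently exists, and this is a sketch of the shape of the problem, not an implementation.

```python
# Toy sketch of an ACC-like human–AI interface: persistent modeling of the
# human's positions, external calibration tracking, stakes-sensitive pacing,
# and outcome memory. All names and numbers are illustrative assumptions.

class MediatingInterface:
    def __init__(self):
        self.human_positions = {}   # topic -> the human's stated prior position
        self.ai_track_record = {}   # domain -> list of bools (was the AI right?)
        self.override_history = []  # (domain, human_overrode, human_was_right)

    def record_position(self, topic, position):
        self.human_positions[topic] = position

    def divergence(self, topic, ai_output):
        """Function 1: surface disagreement with the human's prior position
        rather than smoothing it over."""
        prior = self.human_positions.get(topic)
        return prior is not None and prior != ai_output

    def calibration(self, domain):
        """Function 2: an external reliability estimate, independent of the
        AI's own confident-sounding output. Returns None with no history."""
        record = self.ai_track_record.get(domain)
        return sum(record) / len(record) if record else None

    def review_requirement(self, stakes):
        """Function 3: structurally slow down at high-stakes junctions."""
        return "independent human review" if stakes == "high" else "spot check"

    def weight_human_signal(self, domain):
        """Function 4: weight the human signal by how often human overrides
        of the AI turned out to be right in this domain."""
        relevant = [right for d, overrode, right in self.override_history
                    if d == domain and overrode]
        if not relevant:
            return 0.5  # no history yet: weight both signals equally
        return sum(relevant) / len(relevant)
```

The design choice worth noting is that the interface stores its own state. A stateless chat window cannot do any of the four things above, because all four depend on memory that outlives a single exchange.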

None of this exists as a fully realized technology. What exists instead is a practice — and a role.


The Role This Creates

I work as an AI Advisor for scientific centers, which in practice means sitting between two systems that do not naturally speak to each other: the AI capabilities being developed and deployed on one side, and the scientific community with its existing practices, intuitions, and legitimate concerns on the other.

The job, as it has emerged, is not to advocate for AI adoption or to defend against it. It is to notice friction — and treat it as information rather than as a problem to eliminate. It is to carry signals from the human side back to how AI is being deployed, and to carry an honest account of what AI actually does and doesn’t do back to the people working with it. It is to hold conflicts open long enough for something genuinely integrative to emerge, rather than resolving them prematurely in the direction of either enthusiasm or resistance.


What the Analogy Ultimately Suggests

The brain took hundreds of millions of years to get the interface between its cognitive layers approximately right — and it still fails dramatically under stress, trauma, and pathology. We are attempting to build an analogous interface between human and AI cognition in something closer to a decade.

That is either a testament to the power of intentional design over blind evolution, or a reason for humility about how well it will go without serious, sustained attention to the problem.

What the neuroscience makes clear is that the critical variable is neither the capability of the new layer nor the resistance of the old one. It is the architecture of the relationship between them. A neocortex that ignores the limbic system produces a system that can reason fluently about things that don’t matter. A limbic system that cannot interface with the neocortex produces a system that feels everything and can plan nothing.

The question we are actually navigating — in scientific institutions, in companies, in the broader culture — is not whether AI is powerful enough to be useful or dangerous enough to be feared. It is whether we are building the interface carefully enough that the integration produces something genuinely better than either layer alone.

The brain suggests this is possible.


Tomer is AI Advisor at the Edmond and Lily Safra Center for Brain Sciences (ELSC) at the Hebrew University of Jerusalem. This post was developed in conversation with Claude Sonnet 4.6.