The military’s fabled ‘human in the loop’ for AI is dangerously misleading

Recently it was reported that Amazon convened an internal “deep dive” after a string of outages disrupted its retail site, apparently caused by AI-assisted coding tools. The meeting followed several highly visible failures and a growing recognition inside the company that safeguards around generative AI in production systems are inadequate.
It is an early glimpse of a broader problem that many organizations would prefer not to acknowledge: As AI is rushed into critical systems, it is introducing new failure modes faster than those organizations can understand or control them.
For defense organizations increasingly integrating AI into mission-critical systems, the implications are far more consequential.
When organizations pause to consider these risks at all, they often reach for a familiar reassurance: there will be a “human in the loop.” The idea is that even if the system is complex or unreliable, a person will catch mistakes before they matter.
This reassurance is dangerously misleading. A “human in the loop” whose sole function is to approve a machine’s actions is not a safeguard but a design failure. Attention wanes because nobody can concentrate on a job that is mostly doing nothing, and over time the operator’s skills atrophy to the point that they cannot meaningfully supervise the system. What remains is the appearance of oversight rather than the reality.
In military contexts, this kind of degraded human involvement is not just inefficient but operationally dangerous.
This pattern is not new. Engineers have seen it before, most famously in the Therac-25, a radiation therapy machine introduced in 1982. It combined the functions of two predecessor systems in a smaller, more convenient package, and its improved automation made it faster and easier to operate. Safety was “guaranteed” by the presence of a human operator who had to confirm actions – in effect, a “human in the loop.”
The system failed anyway. Patients began developing severe radiation burns. Hospitals dismissed the possibility of machine error, and the manufacturer insisted overdoses were impossible. Only after sustained investigation was it discovered that the machine contained multiple safety-critical software flaws. By then, six overdose accidents had occurred, three of them fatal.
The deeper problem was not just faulty code but faulty design. The machine frequently halted with poorly explained error messages, requiring operators to “press P to proceed” to continue treatment. Because these errors were common and rarely meaningful, operators became habituated to restarting the system dozens or hundreds of times a day. When real malfunctions occurred, the act of “operator confirmation” had already lost its meaning. In one case, an operator restarted the machine multiple times, unknowingly delivering repeated overdoses. The presence of a human operator did not prevent the failure; it normalized it.
Today, we are repeating this mistake. Computer scientists are rushing to incorporate poorly understood AI systems into safety-critical environments, and when concerns are raised they are often waved away with the same phrase: there will be a human in the loop. This assumption is now appearing in discussions of defense systems, from decision support to autonomous operations.
People will argue that AI is fundamentally different, and in one sense they are right. We have never before deployed systems whose behavior is explicitly probabilistic and nondeterministic in high-stakes environments. In defense contexts, where uncertainty compounds quickly and errors can cascade across systems, this is especially concerning. But AI is also not different in the ways that matter most. It is still software, embedded in larger systems composed of people, processes, and machines. It cannot act in the real world without that surrounding system, and those systems fail in ways that are already well understood. Engineers and operators have spent decades studying how complex, tightly coupled systems behave under pressure.
What we are seeing now is not a new class of failure but a familiar one, accelerated. The software industry is once again demonstrating an inability to learn from its own history. That would be unfortunate if we were only talking about Spotify recommendation algorithms. It becomes dangerous when these same patterns are introduced into the systems that organizations — and nations — depend on.
Recent Pentagon leaks suggest that AI systems may already be influencing where bombs land. In such environments, the illusion of human oversight is worse than no oversight at all. It creates confidence without control.
If we spend the next decade hiding unsafe systems behind the fig leaf of the “human in the loop,” the consequences will not be theoretical.
Mikey Dickerson was the founding administrator of the U.S. Digital Service and is a crisis engineer at Layer Aleph. He is a co-author of the forthcoming book Crisis Engineering.

Facts Only

Amazon held an internal "deep dive" meeting after multiple outages disrupted its retail site.
The outages were reportedly caused by AI-assisted coding tools.
The meeting followed several highly visible failures and concerns about inadequate safeguards for generative AI in production systems.
The article discusses the risks of AI integration in critical systems, particularly in defense organizations.
The concept of a "human in the loop" is criticized as a misleading safeguard.
The Therac-25 radiation therapy machine, introduced in 1982, is cited as a historical example of human oversight failure.
The Therac-25 had multiple safety-critical software flaws that led to six overdose accidents, three of which were fatal.
Operators became habituated to bypassing error messages, normalizing malfunctions.
AI systems are being integrated into defense systems, including decision support and autonomous operations.
Recent Pentagon leaks suggest AI may already influence military targeting decisions.
The author is Mikey Dickerson, founding administrator of the U.S. Digital Service, crisis engineer at Layer Aleph, and co-author of the forthcoming book *Crisis Engineering*.

Executive Summary

Amazon recently convened an internal review after a series of outages disrupted its retail operations, reportedly caused by AI-assisted coding tools. The incidents highlight broader concerns about the rapid integration of AI into critical systems, where safeguards are often inadequate. The article critiques the common reassurance that a "human in the loop" will prevent failures, arguing that passive oversight leads to complacency and skill atrophy, as seen in historical cases like the Therac-25 radiation therapy machine. In defense contexts, where AI is increasingly used in mission-critical systems, the risks are amplified. The piece warns that over-reliance on human oversight without meaningful control mechanisms can create dangerous illusions of safety, citing Pentagon leaks suggesting AI may already influence military operations. The core argument is that AI failures are not novel but an acceleration of familiar systemic risks, exacerbated by the industry's failure to learn from past mistakes.
The discussion extends beyond technical failures to systemic design flaws, emphasizing that AI, like all software, operates within larger human-machine systems prone to predictable breakdowns. The author, a former U.S. Digital Service administrator, draws parallels between current AI deployment and historical engineering failures, urging caution in high-stakes environments where errors can cascade catastrophically. The piece concludes that without addressing these structural issues, the consequences of AI integration could be severe and far-reaching.

Full Take

The strongest version of this narrative is a well-documented warning about the dangers of over-reliance on AI in high-stakes systems, grounded in historical precedent and current industry failures. The author effectively dismantles the "human in the loop" myth by illustrating how passive oversight erodes attention and skill, using the Therac-25 case as a compelling analogy. The piece also highlights a critical tension: while AI introduces new probabilistic behaviors, its failures stem from familiar systemic weaknesses—poor design, complacency, and the illusion of control. This is a valuable corrective to techno-optimism, especially in defense contexts where the stakes are existential.
However, the argument could be strengthened by acknowledging countermeasures that *do* work—such as active human-AI collaboration models where humans retain agency rather than serving as rubber stamps. The piece also leans heavily on the Therac-25 example, which, while vivid, may not fully capture the nuances of modern AI systems. The emotional weight of the historical case risks overshadowing the need for constructive solutions.
Root cause: The narrative reflects a broader paradigm of technological hubris, where speed of deployment outpaces understanding of consequences. The unstated assumption it targets is that AI can be treated as a "black box" whose risks are mitigated by procedural fixes rather than fundamental redesign. This echoes historical patterns of automation failures, where human factors are treated as an afterthought.
Implications: For human agency, the piece underscores the danger of systems that *appear* to be under control but are not. The cost is borne by end-users—patients, consumers, or civilians—while the benefits accrue to organizations prioritizing efficiency over safety. Second-order consequences include erosion of trust in AI systems and potential regulatory backlash that stifles innovation.
Bridge questions: What would a truly resilient human-AI collaboration look like in defense systems? How can organizations measure the *actual* effectiveness of human oversight, not just its presence? What historical examples exist where AI integration *succeeded* in high-stakes environments, and what can we learn from them?
Counterstrike scan: A bad actor pushing this narrative might amplify fear of AI to advocate for centralized control or slow down competitors. However, the content here aligns with legitimate expert concerns rather than a coordinated attack. The focus on systemic risks and historical parallels suggests good-faith critique, not manipulation.
Patterns detected: none

Sentinel — Human

Confidence

The text appears to be written by a human, showing signs of sentence length variance, personal voice, and unique argumentative structure. The author discusses the dangers of over-reliance on 'a human in the loop' for AI safety in critical systems, using historical examples to illustrate their points.

Signals Detected

sentence length variance shows human-like erratic pattern

presence of personal voice, idiosyncratic emphasis, and stylistic fingerprint

argumentative structure not matching known template patterns

Human Indicators

The article discusses a specific, current issue with AI in a personal and engaging manner, providing historical context and a warning about potential consequences.