
Annie Vella researched how 158 professional software engineers used AI. Her first question was:
Are AI tools shifting where engineers actually spend their time and effort? Because if they are, they’re implicitly shifting what skills we practice and, ultimately, the definition of the role itself.
She found that participants saw a shift from creation-oriented tasks to verification-oriented tasks, but it was a different form of verification than reviewing and testing.
In my thesis, I propose a name for it: supervisory engineering work - the effort required to direct AI, evaluate its output, and correct it when it’s wrong.
Many software folks think of inner and outer loops. The inner loop is writing code, testing, debugging. The outer loop is commit, review, CI/CD, deploy, observe.
What if supervisory engineering work lives in a new loop between these two loops? AI is increasingly automating the inner loop - the code generation, the build-test cycle, the debugging. But someone still has to direct that work, evaluate the output, and correct what’s wrong. That feels like a new loop, the middle loop, a layer where engineers supervise AI doing what they used to do by hand.
A potential issue with this research is that it finished in April 2025, before the latest batch of models greatly improved their software development capabilities. But my sense is that this improvement in models has only accelerated a shift to supervisory engineering. This shift is a traumatic change to what we do and the skills we need. It doesn’t mean “the end of programming”, rather a change of what it means to be programming.
A lot of software engineers right now are feeling genuine uncertainty about the future of their careers. What they trained to do, what they spent years upskilling in, is shifting - and in many ways, being commoditised. The narratives don’t help: either AI is coming for your job, or you should just “move upstream” into architecture and “higher value” work. Neither tells you what to actually do on Monday morning.
That’s why this matters. There is still plenty of engineering work in software engineering, even if it looks different from what most of us trained for. Supervisory engineering work and the middle loop are one way of describing what that different looks like, grounded in what engineers are actually reporting.
❄ ❄ ❄ ❄ ❄
Bassim Eledath lays out 8 levels of Agentic Engineering.
AI’s coding ability is outpacing our ability to wield it effectively. That’s why all the SWE-bench score maxxing isn’t syncing with the productivity metrics engineering leadership actually cares about. When Anthropic’s team ships a product like Cowork in 10 days and another team can’t move past a broken POC using the same models, the difference is that one team has closed the gap between capability and practice and the other hasn’t.
That gap doesn’t close overnight. It closes in levels. 8 of them.
His levels are:
- Tab Complete
- Agent IDE
- Context Engineering
- Compounding Engineering
- MCP & Skills
- Harness Engineering
- Background Agents
- Autonomous Agent Teams
Eight seems to be the number thou shalt have for levels. Earlier this year Steve Yegge proposed eight levels in Welcome to Gas Town. His levels were:
- Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
- Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
- Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
- In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
- CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
- CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
- 10+ agents, hand-managed. You are starting to push the limits of hand-management.
- Building your own orchestrator. You are on the frontier, automating your workflow.
I’m sure neither of these maturity models is entirely accurate, but both resonate as reasonable frameworks for thinking about LLM usage, and in particular for highlighting how differently people are using them.
❄ ❄ ❄ ❄ ❄
Chad Fowler thinks we have to change our thinking about what our target is when generating code.
…in a world where code can be generated quickly and cheaply, the real constraint has shifted. The problem is no longer producing code. The problem is replacing it safely.
Regenerative software does not work if the unit of generation is an application. Regeneration only works if the unit of generation is a component that compiles into a system architecture
He outlines several architectural constraints that make it easier to replace components:
- a small number of communication patterns
- clear ownership of data (“exclusive mutation authority for each dataset to a single component”)
- clear evaluation surfaces, allowing behavior to be verified independently of implementation
- the right size of components (natural grain). That size is based on data ownership boundaries and evaluation surfaces
Dividing complex systems into networks of replaceable components has long been a goal of software architecture. So far, this is still important in the world of agentic engineering.
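To make two of these constraints concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not something taken from Chad Fowler's article: the names `InventoryStore`, `DictInventory`, `LedgerInventory`, and `check_inventory_contract` are all hypothetical. Each component is the only code that mutates the data it owns, and the contract check acts as an evaluation surface that verifies behavior without looking at the implementation.

```python
from typing import Callable, Protocol


class InventoryStore(Protocol):
    """Hypothetical interface that any replacement component must satisfy."""

    def add(self, sku: str, qty: int) -> None: ...
    def remove(self, sku: str, qty: int) -> None: ...
    def count(self, sku: str) -> int: ...


class DictInventory:
    """First implementation. Only this class mutates the stock data it owns."""

    def __init__(self) -> None:
        self._stock: dict[str, int] = {}

    def add(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + qty

    def remove(self, sku: str, qty: int) -> None:
        if qty <= 0 or qty > self.count(sku):
            raise ValueError("invalid removal")
        self._stock[sku] -= qty

    def count(self, sku: str) -> int:
        return self._stock.get(sku, 0)


class LedgerInventory:
    """A regenerated replacement: append-only ledger behind the same interface."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, int]] = []

    def add(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._entries.append((sku, qty))

    def remove(self, sku: str, qty: int) -> None:
        if qty <= 0 or qty > self.count(sku):
            raise ValueError("invalid removal")
        self._entries.append((sku, -qty))

    def count(self, sku: str) -> int:
        return sum(q for s, q in self._entries if s == sku)


def check_inventory_contract(make_store: Callable[[], InventoryStore]) -> bool:
    """Evaluation surface: checks behavior only, never internal state."""
    store = make_store()
    store.add("widget", 5)
    store.remove("widget", 2)
    if store.count("widget") != 3 or store.count("gadget") != 0:
        return False
    try:
        store.remove("widget", 10)  # more than is in stock
    except ValueError:
        return store.count("widget") == 3  # a failed removal changes nothing
    return False
```

Both `check_inventory_contract(DictInventory)` and `check_inventory_contract(LedgerInventory)` return `True`. That is the point of the sketch: one implementation can be swapped for the other, and the same check decides whether the swap is safe.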
❄ ❄ ❄ ❄ ❄
Mike Masnick summarized troubling experiences of using AI detection systems on student writing. (He’s summarizing an article by Dadland Maye, which is behind a registration wall that I’m too lazy to form-fill.) Maye’s institution used tools to detect and flag AI writing.
We are teaching an entire generation of students that the goal of writing is to sound sufficiently unremarkable! Not to express an original thought, develop an argument, find your voice, or communicate with clarity and power—but to produce text bland enough that a statistical model doesn’t flag it.
The hopeful outcome was that Maye stopped requiring students to disclose their AI usage, which changed the conversation to a discussion about how to use the tools effectively.
Students approached me after class to ask how to use these tools well. One wanted to know how to prompt for research without copying output. Another asked how to tell when a summary drifted too far from its source. These conversations were pedagogical in nature. They became possible only after AI use stopped functioning as a disclosure problem and began functioning as a subject of instruction.
We need to teach people how to use AI tools to improve their work. The tricky thing with that aim is that they are so new, there aren’t yet any people experienced in how to use them properly. For one of the gray-haired brigade, it’s a fascinating time to watch our society react to the technology, but that’s little comfort for those trying to plot out their future.
❄ ❄ ❄ ❄ ❄
Ankit Jain thinks that humans should not only stop writing code, they should also stop reviewing it.
Humans already couldn’t keep up with code review when humans wrote code at human speed. Every engineering org I’ve talked to has the same dirty secret: PRs sitting for days, rubber-stamp approvals, and reviewers skimming 500-line diffs because they have their own work to do.
He posits a shift to layers of evaluation filters:
- Compare Multiple Options
- Deterministic Guardrails
- Humans define acceptance criteria
- Permission Systems as Architecture
- Adversarial Verification
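One way to picture layered filters is as an ordered list of cheap, deterministic checks that every candidate change must pass before the survivors are compared. The sketch below is my own reading of the list, not Jain's design. The `Candidate` fields, the 200-line limit, the `src/billing/` prefix, and all function names are hypothetical, and adversarial verification is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical agent-generated change, reduced to what the filters need."""
    name: str
    touched_paths: tuple[str, ...]
    lines_changed: int
    tests_passed: bool  # result of running human-defined acceptance tests


Check = Callable[[Candidate], bool]

# Assumed permission boundary: the agent may only touch billing code.
ALLOWED_PREFIXES = ("src/billing/",)


def within_permissions(c: Candidate) -> bool:
    return all(p.startswith(ALLOWED_PREFIXES) for p in c.touched_paths)


def small_diff(c: Candidate) -> bool:
    return c.lines_changed <= 200  # assumed deterministic guardrail


def acceptance_tests_pass(c: Candidate) -> bool:
    return c.tests_passed


# Layers run in order, cheapest first.
LAYERS: list[tuple[str, Check]] = [
    ("permissions", within_permissions),
    ("guardrail: diff size", small_diff),
    ("acceptance criteria", acceptance_tests_pass),
]


def first_failure(c: Candidate) -> Optional[str]:
    """Return the name of the first layer that rejects c, or None if all pass."""
    for name, check in LAYERS:
        if not check(c):
            return name
    return None


def select(candidates: list[Candidate]) -> Optional[Candidate]:
    """Compare multiple options: keep survivors, prefer the smallest diff."""
    survivors = [c for c in candidates if first_failure(c) is None]
    return min(survivors, key=lambda c: c.lines_changed, default=None)
```

In this sketch a human never reads a diff. Human judgment enters through the acceptance tests and the permission boundary, both of which are fixed before any code is generated.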
Like Birgitta, I’m uneasy about the notion that “the code doesn’t matter”. I find that when I’m working at my best, the code clearly and precisely captures my intent. It’s easier for me to just change the code than to figure out how to explain to a chatbot what to change. Now, I’m not always at my best, and many changes are much more awkward than that. But I do think that a precise, understandable representation is a useful direction to aim for, and that agentic AI may be best used to help us get there.
In particular, I’m not convinced by his suggestion for #3 that natural language BDD specs are the way to go here. They are wordy and ambiguous. Tests are a valuable way to understand what a system does, and it may be that our agentic future has us thinking more about tests than implementation. But such tests need a different representation.
❄ ❄ ❄ ❄ ❄
The new servant leadership: we serve the agents by telling them what to do 9/9/6

Facts Only

Annie Vella conducted research on how 158 professional software engineers use AI, concluding that their work is shifting from creation to verification tasks.
Vella proposes the term "supervisory engineering work" to describe the effort of directing, evaluating, and correcting AI output.
The research was completed in April 2025, before recent AI model improvements.
Bassim Eledath outlines eight levels of agentic engineering, ranging from basic tab completion to autonomous agent teams.
Steve Yegge proposes a separate eight-level framework for AI integration in software development, starting with minimal AI use and ending with custom orchestrator tools.
Chad Fowler argues that the primary challenge in AI-assisted coding is safely replacing components, not generating code.
Fowler emphasizes architectural constraints like clear data ownership and evaluation surfaces for replaceable components.
Mike Masnick summarizes an article by Dadland Maye about AI detection tools in education, noting they discourage originality.
Maye stopped requiring students to disclose their AI usage, which led to pedagogical discussions on effective AI use.
Ankit Jain suggests AI could replace human code review, proposing layers of evaluation filters.
Jain's proposal includes deterministic guardrails, human-defined acceptance criteria, and adversarial verification.
The article references concerns about code quality and the role of human intent in coding.

Executive Summary

The article explores the evolving role of software engineers in the age of AI, highlighting a shift from creation-oriented tasks to supervisory engineering—where engineers direct, evaluate, and correct AI-generated code. Research by Annie Vella suggests this "middle loop" of supervision is emerging between traditional inner (coding) and outer (deployment) loops, though her study predates recent AI advancements. Bassim Eledath and Steve Yegge propose frameworks for AI integration, outlining progressive levels of agentic engineering, from basic code completion to autonomous agent teams. Chad Fowler argues that the challenge in AI-assisted development is no longer writing code but safely replacing components within well-architected systems. Meanwhile, Mike Masnick critiques AI detection tools in education, noting they incentivize bland writing over originality, while Ankit Jain suggests AI could automate code review, though this raises concerns about code quality and human oversight. The discussion reflects broader uncertainty about the future of software engineering, balancing AI's potential with the need for new skills and ethical considerations.

Full Take

The narrative presents a compelling case for the transformation of software engineering, framing AI as a tool that necessitates new skills rather than rendering engineers obsolete. The strongest version of this argument acknowledges the trauma of career uncertainty while offering a constructive path forward—supervisory engineering as a middle loop between creation and deployment. This steelman gives credit to the adaptability of the field and the potential for AI to elevate human work.
However, the discussion also reveals subtle patterns of ambiguity (ARC-0024) and forced binary choices (ARC-0043). The framing of "either AI is coming for your job or you must move upstream" oversimplifies the spectrum of adaptation, while the emphasis on "supervisory engineering" risks downplaying the creative and intentional aspects of coding that many engineers value. The critique of AI detection tools in education highlights a broader tension: the risk of prioritizing compliance over creativity, a pattern echoing historical shifts in labor where efficiency metrics eclipse human judgment.
Root causes include the assumption that AI's role is inevitably expansive and that human labor must adapt to its capabilities. This mirrors past industrial revolutions, where technology redefined roles but also created new hierarchies of skill. The implications for human agency are profound—while AI may handle repetitive tasks, the burden of oversight and ethical decision-making falls on engineers, potentially increasing cognitive load rather than reducing it.
Bridge questions: How might we preserve the craft of coding while leveraging AI's strengths? What metrics should define "effective" AI collaboration in engineering? And crucially, who decides the boundaries of human vs. AI responsibility in software development?
Counterstrike scan: A coordinated influence campaign might exploit the uncertainty around AI's role to push narratives of inevitability ("adapt or perish") or false equivalence ("AI is just another tool"). The actual content, however, presents a nuanced discussion of challenges and opportunities, avoiding overt manipulation. It aligns more with constructive debate than a playbook for control.
Patterns detected: ARC-0024 Ambiguity, ARC-0043 Motte-and-Bailey

Sentinel — Human

Confidence

The text exhibits strong human characteristics, including personal voice, stylistic idiosyncrasies, and varied sentence structure, with no significant indicators of synthetic generation.

Signals Detected
low severity: Varied sentence length and structure, with idiosyncratic phrasing (e.g., 'Eight seems to be the number thou shalt have for levels') and personal voice (e.g., 'I’m sure neither of these Maturity Models is entirely accurate').
low severity: Strong personal voice and stylistic fingerprint (e.g., 'That’s why this matters. There is still plenty of engineering work...'), with digressions and passionate emphasis.
low severity: No evidence of template patterns or verbatim talking points; references to specific individuals (Annie Vella, Bassim Eledath) with unique frameworks.
low severity: No suspicious claims or overly convenient attributions; references to verifiable sources (e.g., Steve Yegge’s 'Welcome to Gas Town').
Human Indicators
Idiosyncratic humor and phrasing (e.g., 'Eight seems to be the number thou shalt have for levels').
Personal reflections and uncertainty (e.g., 'I’m sure neither of these Maturity Models is entirely accurate').
Diverse stylistic choices, including informal asides and direct engagement with the reader.