The Disappearing Middle: How AI Coding Tools Are Breaking Software Apprenticeship


Today marks roughly one year since I started taking agentic programming seriously in my day-to-day work and personal projects. By agentic programming I mean the general act of delegating end-to-end coding tasks to an AI.

A year ago, I was deeply skeptical. Today, I’ve found specific contexts where agentic programming genuinely accelerates my work… with caveats.

Me
#

Since this blog post contains a lot of my opinions, it’s probably worth giving a bit of context about me. I’m a Senior Staff Software Engineer at The Trade Desk, working on the Ventura TV OS project, where I’m the Technical Lead for the Clients team. My day-to-day is filled with the usual things for a Staff+ engineer: coding, reviewing, writing docs, meetings, and mentoring. Outside of work, I also build and maintain a bunch of open source projects.

A balanced view on AI Programming
#

CodeRabbit’s analysis of 470 open-source PRs found that AI-generated changes introduced ~1.7× more issues on average than comparable human-written PRs. More troubling, those issues skew towards major and critical severity. While the sample size is small and the selection criteria are narrow (see the study for their methodology), the direction of the effect aligns with other studies.

The category with the largest increase was logic issues, with a 125% increase (2.25×) in probability. The way you would typically guard against these kinds of errors is by writing lots of tests, but if no one is reviewing the code, who knows what is being tested?

Security-focused studies paint a similar picture. Academic user studies from Stanford show that developers using AI assistants were “more likely to write insecure code and exhibit overconfidence.” Industry security evaluations, such as Veracode’s large-scale testing of AI-generated code samples, found that a substantial percentage of outputs contain at least one security vulnerability.

The takeaway for me is not that AI is “bad at coding,” but that it is systematically bad at knowing when it is wrong. That weakness becomes most dangerous in workflows where humans are removed from the implementation loop and positioned only as passive consumers of the output.

None of these studies prove that agentic programming is inherently unsafe. What they do consistently show is that too much trust combined with reduced scrutiny leads to worse outcomes. Agentic workflows amplify both factors unless guardrails are deliberately factored in.

AI Programming Personas
#

While writing this blog post, I started to think about the different ways to use agentic programming, and ended up defining some personas. A person is not strictly tied to a single persona; they might be Persona A while doing one task and Persona B while doing something else.

Claims about AI-assisted programming tend to cluster around two extremes: huge productivity gains or cliff-dive quality regressions. The reality, unsurprisingly, is more nuanced. Different studies measure different outcomes (speed, correctness, security, maintainability), and many are conducted under constrained conditions that don’t fully resemble ‘real’ software development. This section references a number of studies, which are best read together rather than in isolation.

Vibe coder
#

A term popularized by Andrej Karpathy, vibe coding is the practice of managing intent and behavior rather than syntax, often deferring implementation entirely to the AI. Your primary input is prompts, and the code is simply a means to an end.

In its pure form, this persona only cares about getting from point A to B, and the LLM is solely responsible for maintaining the code. The output is disposable.

Importantly, this persona isn’t limited to inexperienced users. I’ve seen highly capable engineers intentionally choose this mode to explore an idea quickly, validate a product hypothesis, or prototype a workflow without investing in long-term maintainability. In those cases, the key distinction is intentional disposability.

Problems arise when code produced in this mode quietly graduates into production systems without anyone re-establishing ownership or understanding.

This approach can be entirely valid for hack days, demos, prototypes, or throwaway internal tools, so long as the output is treated accordingly.

There’s been talk that newer models, like Claude Opus 4.5, have reached quality levels where “vibe coding” becomes viable for production work. I’ve also read a lot of extrapolation asserting that this is the future of software engineering. Maybe, but the studies linked earlier suggest the fundamental problem (AI doesn’t know when it’s wrong, humans over-trust) won’t be solved by model improvements alone.

That said, every engineer should try this mode for prototyping. It’s valuable practice in problem decomposition, even when the code remains disposable.

Recommended guardrails#

Agents are well suited to exploratory work: scaffolding services, prototyping APIs, or generating throwaway implementations. Outputs should be treated as disposable and should not be promoted to production without rewrite, thorough review, or both.
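One lightweight way to stop prototypes from quietly graduating is a build-time check. Below is a minimal sketch, assuming a team convention of tagging throwaway files with a “PROTOTYPE-ONLY” marker and a “src/main” production source tree; both conventions are illustrative, not an established tool.

```kotlin
import java.io.File
import kotlin.system.exitProcess

// Hypothetical guardrail: fail the build if files tagged as prototypes land in
// the production source tree. The marker string and the src/main layout are
// team conventions assumed for this sketch, not a standard tool.
fun main() {
    val offenders = File("src/main").walkTopDown()
        .filter { it.isFile && it.extension == "kt" }
        .filter { it.readText().contains("PROTOTYPE-ONLY") }
        .toList()

    if (offenders.isNotEmpty()) {
        System.err.println("Prototype code found in production sources:")
        offenders.forEach { System.err.println(" - ${it.path}") }
        exitProcess(1)
    }
    println("No prototype code in production sources.")
}
```

Wired into CI, this turns “the output is disposable” from a promise into something enforced.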

Less experienced builders
#

I use the term builder rather than engineer here intentionally. This persona includes junior developers, but also UX designers, PMs, and others with some programming background.

This relationship resembles pair programming, where the AI acts as a senior mentor and the human as the mentee. The AI becomes a force multiplier, enabling output that would otherwise be out of reach or significantly slower to achieve.

After chatting to Chris Sinco (UX lead on AI Tooling in Android Studio), he noted that the benefits go both ways: broader participation helps non-engineers “gain a better understanding of an engineer’s role, system complexity, and ultimately collaborate more effectively.”

Research shows this persona sees the biggest productivity gains when using AI. A study on GitHub Copilot usage found that junior developers not only reported higher productivity, but also accepted more suggestions. That productivity, however, comes with increased risk: other studies show higher bug and security rates without strong review.

Referenced study: Measuring GitHub Copilot’s Impact on Productivity – Communications of the ACM

I’m a big fan of this mode, as it enables people to contribute who normally wouldn’t be able to. AI works as a secret power-up for the builder’s skill set. However, we need to be careful here: AI is not a replacement for real human mentorship. There’s more to software engineering than simply coding, and that is still where humans matter most.

Recommended guardrails#

For less-experienced developers, AI tools can accelerate learning and unblock progress, but they also risk masking knowledge gaps. Guardrails matter here: mandatory human-authored test specs (in partnership with an experienced engineer), strict code review, and explicit reasoning requirements (“explain why this change is correct”) help prevent silent failure modes. Most importantly, ownership must remain with a human who can maintain the code long-term.
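To make “human-authored test specs” concrete, here is a sketch of what that hand-off can look like: the builder and a more experienced engineer write the acceptance criteria as failing tests first, and only then delegate the implementation. The DiscountCalculator name, thresholds, and amounts are hypothetical, purely for illustration.

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals
import kotlin.test.assertFailsWith

// Stub owned by the humans; the agent is asked to replace TODO with real logic.
object DiscountCalculator {
    fun discountFor(totalCents: Int): Int = TODO("implementation delegated to the agent")
}

// Hypothetical spec, written before any implementation exists. The test names
// encode the acceptance criteria; the agent's only job is to make these pass.
class DiscountCalculatorSpec {

    @Test
    fun `orders under the threshold get no discount`() {
        assertEquals(0, DiscountCalculator.discountFor(totalCents = 4_999))
    }

    @Test
    fun `orders at or above the threshold get ten percent off`() {
        assertEquals(500, DiscountCalculator.discountFor(totalCents = 5_000))
    }

    @Test
    fun `negative totals are rejected rather than silently discounted`() {
        assertFailsWith<IllegalArgumentException> {
            DiscountCalculator.discountFor(totalCents = -1)
        }
    }
}
```

The spec stays human-owned even after the agent fills in the implementation; that is where the learning happens.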

Experienced engineers
#

This persona is also a form of pair programming, but the dynamic is different. Here the AI is not a mentor or a student; it’s an implementer. The engineer remains responsible for the architecture, correctness, and long-term maintainability.

The core benefit for me isn’t speed. It’s parallelism.

AI lets me parallelise myself

I can hand an agent a bounded task and then step away to review PRs, write a design doc, or sit in a meeting, until the agent asks a question or pings completion. That parallelism is freeing as it lets me spend more time on the work where humans are still uniquely valuable: reviews, mentorship, system design.

This does require practice as context switching is expensive. It’s easy to feel like you’re “babysitting” a junior engineer in another tab. The trick is treating the agent as an async process: give it a task, disconnect, come back only on interruption.
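As a toy illustration of that mental model (an analogy only, not a real integration), the agent behaves like a background coroutine: launch it, keep doing other work, and rejoin only when it finishes or needs input. The sketch below uses kotlinx.coroutines, and the task names are made up.

```kotlin
import kotlinx.coroutines.*

// Stand-in for the agent working on a bounded, well-specified task.
suspend fun runAgentTask(task: String): String {
    delay(2_000)
    return "PR ready for review: $task"
}

fun main() = runBlocking {
    // Hand the agent a task and immediately move on.
    val agent = async { runAgentTask("extract the settings screen into its own module") }

    // Meanwhile, the human keeps doing human work.
    println("Reviewing a teammate's PR...")
    println("Drafting a design doc...")

    // Rejoin only when the agent is done (or would need input).
    println(agent.await())
}
```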

Counterintuitively, I now code more. Instead of having to choose between meetings and coding, I often do both. The agent handles the mechanical parts: moving code, writing boilerplate tests, and so on, while I focus on the interesting bits.

```mermaid
gantt
    title The "Staff+" Parallel Workflow
    dateFormat  HH:mm
    axisFormat %H:%M

    section Sequential (Old Way)
    Plan Task A          :seq1, 09:00, 30m
    Code Task A          :seq2, after seq1, 60m
    Write Tests A        :seq3, after seq2, 30m
    Review PR (Teammate) :seq4, after seq3, 30m

    section Parallel (Human)
    Plan Task A (Prompting) :par1, 09:00, 15m
    Review PR (Teammate)    :par2, after par1, 30m
    Draft Design Doc C      :par3, after par2, 45m
    Review Agent's Work     :par4, after par3, 30m

    section Parallel (Agent)
    Agent Implements Task A :active, agent1, after par1, 60m
    Agent Writes Tests A    :active, agent2, after agent1, 15m
```

I’ve found it useful to think of agents as asynchronous contractors. They are fast, cheap to spin up, and good at executing clearly specified work, but occasionally wrong in subtle ways. Like any contractor, they need clear acceptance criteria, bounded scope, and skeptical review. The mistake is treating them as either junior engineers (who you mentor) or senior engineers (who you trust). They are neither.

Recommended guardrails#

For experienced engineers, agents work best when given bounded tasks with clear acceptance criteria and when their output is reviewed with the same skepticism applied to any unfamiliar contributor.
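One way that plays out in practice: the engineer keeps ownership of the contract and the acceptance checks, and delegates only the implementation behind them. A minimal sketch with hypothetical names:

```kotlin
// The human owns the contract and the acceptance checks; the agent is only
// asked to provide an implementation of this interface. Names are illustrative.
interface FeatureFlagStore {
    fun isEnabled(flag: String): Boolean
    fun setEnabled(flag: String, enabled: Boolean)
}

// Acceptance criteria, written down before delegating:
//  1. Unknown flags default to disabled.
//  2. setEnabled is reflected by subsequent isEnabled calls.
//  3. Flag names are case-sensitive.
fun verify(store: FeatureFlagStore) {
    check(!store.isEnabled("new-player-ui")) { "unknown flags must default to off" }
    store.setEnabled("new-player-ui", true)
    check(store.isEnabled("new-player-ui")) { "setEnabled must be observable" }
    check(!store.isEnabled("New-Player-UI")) { "flag names are case-sensitive" }
}
```

The diff the agent produces is then reviewed against a contract the engineer already understands, rather than on its own terms.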

The changing role
#

The personas above describe behaviors, but the broader reality is that the role of a software engineer is changing.

Increasingly, the job emphasizes:

  • Making high-level architectural decisions
  • Understanding unfamiliar codebases quickly
  • Debugging systems you didn’t write
  • Reasoning about scale and failure modes
  • Keeping systems secure

AI, meanwhile, covers more of the mechanical implementation.

When you look at that list, it reads a lot like a Staff+ job description. The role isn’t disappearing, it’s shifting further toward the parts that already mattered most.

The disappearing middle
#

We’re already seeing teams and hiring change. This isn’t just because of AI, but because companies are expecting more from fewer people. Those expectations may be economically rational, but they carry real risks.

Burnout is one. The more worrying risk is a breakdown in how we grow engineers.

The traditional path from junior to senior involved many hours debugging your own code, learning why certain patterns emerge, and developing intuition about what breaks.

But if junior engineers are “vibe coding” (treating implementation as a black box) while seniors “parallelize” (by delegating that same implementation work to agents), we’ve essentially automated away the apprenticeship ladder itself.

Consider: a junior who has only ever prompted their way through implementations will struggle to debug production issues in that same code. They lack the muscle memory of having tackled similar problems manually. Meanwhile, the senior engineers who could mentor them are increasingly detached from implementation details, focused on architecture and review. We risk creating teams made up only of the extreme edges: juniors who generate code they can’t maintain, and seniors who understand systems but have lost touch with implementation realities.

The industry hasn’t figured out how to train the next generation of senior engineers in this new paradigm. The challenge for the future isn’t just how do we use AI? It’s how do we grow engineers when an agent does all of the low-level work?

When this actually works (and when it doesn’t)
#

Agentic programming is most effective when the problem space is moderately complex, feedback is cheap, and planning can be separated from execution. It breaks down when requirements are fuzzy and unverifiable, when side effects are expensive, or when it’s used to avoid understanding the system being modified.

A quick checklist I use:

  • The task is bounded and mechanical (refactor, move, wire-up)
  • There’s an objective verifier (tests, lint, build, snapshots)
  • The blast radius is small or reversible
  • I can write clear acceptance criteria

Agents become less useful when:

  • The change is security-sensitive: authentication flows, cryptographic implementations, input sanitisation, or SQL query construction (see the sketch below)
  • Deep, cross-module context is required
  • Concurrency or ordering correctness is subtle
  • “Correctness” is primarily product judgment

For these situations, either avoid agents and write it yourself, or review the agent’s code line by line in minute detail.
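To make the security-sensitive bullet concrete, the classic failure mode in generated data-access code is string-built SQL. A generic JDBC sketch (not drawn from any of the studies above) of what to flag and what to insist on:

```kotlin
import java.sql.Connection
import java.sql.ResultSet

// Illustrative only: the kind of subtle issue to hunt for in generated code.
// Concatenating user input into SQL is injectable; a parameterized query is not.
fun findUserUnsafe(conn: Connection, name: String): ResultSet =
    conn.createStatement().executeQuery(
        "SELECT id FROM users WHERE name = '$name'" // name can smuggle in arbitrary SQL
    )

fun findUserSafe(conn: Connection, name: String): ResultSet =
    conn.prepareStatement("SELECT id FROM users WHERE name = ?").apply {
        setString(1, name) // the driver escapes the value; the query shape is fixed
    }.executeQuery()
```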

Non-negotiable guardrails:

  • Always run tests and linters. If none exist, write tests first.
  • Keep tasks small and iterative.
  • Review the diff like it came from an external contributor.

If you cannot review the output with confidence, you should not delegate the work.

A useful test: Can you write the test cases before handing the task to the agent? If you can’t specify the acceptance criteria precisely enough to test against, you’re asking the agent to make product decisions for you. That’s when quality is a coin flip.
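For mechanical refactors, that test often takes the form of a characterization test: pin down current behaviour before the agent touches anything, so “no behaviour change” is verifiable rather than assumed. A small hypothetical example (formatDuration stands in for whatever code is being moved):

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals

// Stand-in for existing code about to be refactored or moved by an agent.
fun formatDuration(totalSeconds: Int): String {
    val minutes = totalSeconds / 60
    val seconds = totalSeconds % 60
    return "%d:%02d".format(minutes, seconds)
}

// Characterization test written before delegating: it pins today's behaviour,
// so the agent's refactor can be verified mechanically instead of by vibes.
class FormatDurationCharacterizationTest {
    @Test
    fun `existing behaviour is pinned before the refactor`() {
        assertEquals("0:00", formatDuration(0))
        assertEquals("1:05", formatDuration(65))
        assertEquals("90:00", formatDuration(5_400))
    }
}
```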

What comes next
#

The harder question isn’t “how do I use AI effectively?” It’s “what parts of software engineering should not be delegated?”

Teams will need to be intentional about where humans stay in the loop. Not for efficiency, but for learning. That may mean reserving certain types of work for juniors, slowing down where speed would otherwise be tempting, or deliberately designing team processes where understanding is the goal rather than output.

If we use agents to automate the learning process, we won’t just lose the ‘middle’; we’ll break the software engineering pipeline entirely. Use them to ship faster, sure, but don’t let them do the thinking for you.

Trust, but verify. And if you can’t verify, don’t trust.


🙌 Thanks to Chris Sinco, Nacho Lopez & Manuel Vivo for reviewing this blog post before publishing