Even the Handcrafted-Code Purists Are Using Agents
When the engineers most associated with software craft start teaching agentic workflows, the lazy "tech bros vibe coding" dismissal stops working.
I have very little patience for the parade of one-shot AI app demos on the internet. Most of them deserve the mockery. A todo app that “shipped itself” in 14 minutes is not an operating model. It is content.
But the wrong debate is still winning. Too many serious engineers are still reacting to agentic coding as if the whole category were just tech-bro vibe coding with better branding.
That posture is getting harder to maintain, and not because a vendor said the future is here. Vendors always say that. It is because the people our industry has spent two decades treating as guardians of software craft are no longer standing outside the room.
CleanCoders now features a Clean AI training track about disciplined practices for building reliable AI-powered software agents. In Robert C. Martin’s Agentic Discipline series, the message is not “stop caring about code.” It is that the greatest danger is an undisciplined approach, that testing and refactoring still matter, and that when AI agents are doing the coding, “code cannot be the source.” The next episode goes so far as to demonstrate four agents working collaboratively.
Kent Beck is not posting “ship the slop” energy either. His distinction is almost aggressively clear: vibe coding ignores the code and only chases behavior. Augmented coding still cares about code quality, complexity, tests, and coverage. He just “doesn’t type much of that code.”
Martin Fowler is pushing the same shift at the organizational layer. He writes about “supervisory engineering work” and argues teams may need to reorganize around verification rather than writing code. David Heinemeier Hansson moved from telling senior programmers in late 2024 to treat AI like a junior programmer to saying in early 2026 that agents are capable of production-grade contributions in a supervised collaboration model. Mike McQuaid says he has personally reached the point where AI writes 90% of his code, inside a sandboxed, worktree-heavy setup built around security and control.
Notice what these people are not endorsing: unsupervised autonomy, blind trust, or prompt-lottery development. The pattern is the opposite. Stronger tests. Sharper review. Better source documents. More human accountability.
I think that is the real threshold. This is no longer just the hobbyhorse of people who want computers to flatter them. The craft crowd is moving. Slowly, conditionally, and with plenty of caveats. But moving.
Stack Overflow’s 2025 survey captures the awkward middle nicely. AI-tool adoption is already broad, daily use among professional developers is high, and yet trust remains shaky. More developers distrust AI accuracy than trust it, and a majority still either do not use agents at work or stay in simpler autocomplete mode. That is exactly the environment where disciplined teams learn the fastest: the tools matter, the defaults are not settled, and the people who build verification muscle early get compounding returns.
Why the status shift matters
Most engineering adoption is blocked less by tool access than by professional legitimacy. The repo is not the bottleneck. The social permission is.
If OpenAI, Anthropic, or GitHub tells your team that agents are the future, a healthy engineer hears marketing. Fair enough. If the same basic message arrives from people associated with clean code, TDD, refactoring, Rails craftsmanship, or open-source maintainership, it lands differently. Not because those people are infallible. Because they change what feels professionally acceptable.
Uncle Bob matters here because his public brand is not “move fast and see what the model hallucinated.” It is discipline. Readability. Refactoring. The CleanCoders site now puts Clean AI beside Clean Code, and the Agentic Discipline episodes frame agentic development as something that demands more rigor, not less. That is a cultural signal, not a product launch.
Kent Beck matters because TDD has always been more than a testing tactic. It is a worldview about feedback, scope control, and what counts as a good development loop. When Beck says augmented coding still values tidy code, tests, and coverage, he is not watering down that worldview. He is showing how it survives contact with agents.
Fowler matters because he pushes the conversation out of individual workflow bragging and into organizational design. “Supervisory engineering work” is a useful phrase precisely because it is a little annoying. It forces managers and staff engineers to admit that the valuable human work may be shifting from authoring every line to directing, evaluating, and correcting work that machines can now draft.
DHH matters because he is culturally allergic to a lot of enterprise AI theater. He likes coherent tools, strong opinions, and product taste. When he says agents went from “treat this like a junior programmer” to “this can make production-grade contributions under supervision,” that lands with teams that would never take their cues from benchmark maximalists or demo-day optimists.
Mike McQuaid matters because his endorsement is operational rather than mystical. He talks about sandboxes, worktrees, permissions, and security. That is the language of someone who expects real consequences from bad automation. In other words, it is the language serious teams actually need.
The important point is not hero worship. The important point is convergence.
These are different tribes with different taste profiles and different reasons for being skeptical. They are not all saying the same thing. But they are landing in the same neighborhood: agentic coding is real, it is useful, and it should be bounded by discipline, tests, review, and human accountability.
That is enough to end one lazy argument. You can still argue about economics, quality ceilings, legal risk, or when not to use agents. Those are real debates. But the dismissal that this is merely “tech bros vibe coding” no longer matches the public behavior of the craft establishment.
Craft did not die. It moved.
The useful question is not whether a human typed every line. It is where quality lives now.
For a long time, elite engineering identity was tied to manual authorship. You proved seriousness by holding the keyboard, knowing the APIs cold, and treating every line as a handcrafted object. That identity made sense when syntax production, library recall, and repo spelunking were expensive human work.
They are getting cheaper. Judgment is not.
When Uncle Bob says “code cannot be the source,” he is making a bigger point than most people noticed. If code is no longer the primary artifact humans produce first, then the scarce human artifact becomes the thing above the code: the source document, the acceptance criteria, the domain language, the tests, the architecture, the non-goals, the failure cases. Fowler’s verification framing lands in the same place from another angle. Beck lands there through TDD. DHH lands there through supervised collaboration. Different accents. Same migration.
Preference is not performance.
A lot of senior engineers still confuse “I prefer writing this myself” with “this is the highest-value use of my time.” Those are not the same sentence. Sometimes hand-writing the code is still correct. Sometimes it is the expensive way to produce syntax that a supervised agent could have drafted while you spent your scarce attention on defining the trade-offs and protecting the boundaries.
This sounds obvious, but teams miss it all the time. They treat agentic coding as a referendum on whether humans should still understand code. Of course they should. That is not the live issue. The live issue is whether the best engineer on the team should spend Tuesday wiring another pagination endpoint, or spend Tuesday defining the behavioral contract so the endpoint, its tests, and its edge-case handling can be generated, checked, and improved with less waste.
The strongest public advocates are not abandoning craft. They are relocating it.
Craft used to sit visibly in the syntax. Now more of it sits in task definition, decomposition, verification, evaluation, and refusal. Refusal matters more than a lot of AI enthusiasts admit. Someone still has to decide which request is poorly framed, which generated solution is elegant nonsense, which “working” implementation will rot the codebase six weeks from now, and which shortcut is unacceptable even if it passes today’s tests.
That is why the new prestige skill is not “prompting.” I dislike that word because it sounds like party trick optimization. The real skill is designing loops: source documents, instructions, tests, permissions, review rituals, fallback paths, and escalation rules. You are not just asking for code. You are building a system that can safely produce and challenge code.
In that frame, hand-crafted code is not the point. Hand-crafted judgment is.
What serious agentic coding looks like on Monday morning
Let’s make this concrete, because abstract arguments are cheap.
Imagine your team needs to add exportable audit logs to a mature B2B application. There are auth rules, tenancy boundaries, rate limits, retention constraints, CSV formatting edge cases, and two annoying legacy models nobody wants to touch.
The unserious version of AI use is obvious. Throw a vague prompt at a model, accept a confident diff, and hope CI catches the rest. That is how you manufacture theater and debt at the same time.
The serious version looks different.
A human writes the source document first. One page is enough. What the feature must do. What it must never do. The permissions model. The shape of the response. Failure cases. Migration considerations. Logging requirements. Performance guardrails. What counts as done. That document is not ceremony. It is the artifact that makes the work delegable.
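To make that concrete, here is a minimal Python sketch of treating the source document as a checked artifact rather than prose ceremony. The section names and both helpers are illustrative assumptions, not part of any real tool.

```python
# A sketch of treating the source document as a checked artifact.
# The section names and helpers are illustrative assumptions,
# not part of any real tool.

REQUIRED_SECTIONS = [
    "must_do",          # what the feature must do
    "must_never_do",    # explicit non-goals and forbidden behavior
    "permissions",      # who may export which tenant's logs
    "response_shape",   # e.g. CSV columns, encoding, escaping rules
    "failure_cases",    # rate limits, partial exports, retention edges
    "done_criteria",    # what counts as done, including tests
]

def is_delegable(source_doc: dict) -> bool:
    """A task is only delegable when every required section is filled in."""
    return all(source_doc.get(s, "").strip() for s in REQUIRED_SECTIONS)

def missing_sections(source_doc: dict) -> list:
    """Name the gaps so the human fixes the document, not the diff."""
    return [s for s in REQUIRED_SECTIONS if not source_doc.get(s, "").strip()]
```

The design choice worth noticing: a gap in the document blocks delegation before any agent runs, which is exactly the cheap early catch the next step depends on.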
Then a first agent maps the repo and identifies the likely blast radius: touched modules, tests that should change, docs that will drift, commands that should run, places where naming or domain assumptions look fragile. A second agent drafts or extends the acceptance tests. A human reviews both before implementation starts, not because the human likes bureaucracy, but because catching a bad premise before code exists is still the cheapest move in software.
Only then do you let the implementation loop run.
That loop should live in a sandbox or separate worktree. It should have explicit permissions. It should run tests. It should explain its changes. It should flag uncertainty instead of bluffing. It should return a diff that a human can interrogate.
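The shape of that loop can be sketched in a few lines of Python. This is a sketch under stated assumptions: `draft_change` and `run_tests` are hypothetical callables standing in for the agent and the sandboxed test run, not a real agent API.

```python
from dataclasses import dataclass

# A sketch of a bounded, gated implementation loop. `draft_change`
# and `run_tests` are hypothetical callables standing in for the
# agent and the sandboxed test run; this is not a real agent API.

@dataclass
class Attempt:
    diff: str
    explanation: str   # the agent must explain its changes
    uncertain: bool    # and flag doubt instead of bluffing

def implementation_loop(draft_change, run_tests, max_rounds=3):
    """Every attempt is tested and explained before a human sees the diff."""
    feedback = None
    for _ in range(max_rounds):
        attempt = draft_change(feedback)
        if attempt.uncertain:
            return ("escalate", attempt)  # a human decides; the agent does not guess
        if run_tests(attempt.diff):
            return ("review", attempt)    # passing tests earn review, never auto-merge
        feedback = attempt.explanation    # failed attempts go back with context
    return ("abandon", None)              # the loop is bounded, not open-ended
```

The point of the sketch is the gates, not the plumbing: tests run on every round, uncertainty escalates to a human, and the only successful exit is a review, not a merge.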
Then the human does the part that still matters most. Does the change fit the architecture? Does it respect the product constraints? Did the model solve the wrong problem elegantly? Did it bury a future maintenance tax under a passing test? Did it miss the social meaning of the codebase, the local conventions that never quite make it into docs but absolutely shape what belongs?
Bad parts go back for revision. Good parts move forward. Responsibility never becomes ambiguous.
That is not vibe coding. It is delegated implementation inside a verification loop.
It also changes what “senior” means. The sharp engineer in that workflow is not the person who can manually type the fastest controller. It is the person who writes the clearest source document, decomposes the task cleanly, anticipates the edge cases, chooses the right verification, and notices when the agent is confidently wrong.
This is why I think the anti-AI line “real engineers write their own code” is aging badly. Real engineers reduce uncertainty. Sometimes that still means writing the code by hand. More often now, it means designing the loop that produces and validates the code.
And no, this does not magically remove the need for code literacy. Quite the opposite. Supervision without technical depth becomes rubber-stamping. If anything, agentic coding punishes shallow engineers more harshly, because it gives them far more ways to approve something they do not actually understand.
The useful question is not “Did the AI write code?” It is “Did the team compress the boring 70 percent without blurring ownership for the dangerous 30 percent?” That is a much better test of professionalism.
If you want a starting point, start boring. Test scaffolding. Dependency updates. Refactors protected by strong tests. Documentation drift. Repetitive internal tooling. Migration spikes. Review prep. The first draft of tedious but bounded implementation. Do not start by asking an agent to wander your production backlog unsupervised like a caffeinated intern with root access.
Teams get into trouble when they confuse model capability with organizational readiness. The model may be good enough. Your permissions, source artifacts, review habits, and fallback rules may not be. Those are different questions.
The real risk is waiting
A lot of teams are treating delay as prudence. They see noisy demos, dubious benchmark discourse, and a thousand workflow hot takes, then conclude that the mature response is to stand still for another year.
I think that is backwards.
The first durable advantage here is not model access. Everyone has model access. The advantage is learning how to define tasks, constrain permissions, write reusable instructions, review AI-produced diffs, and build the social norms that keep accountability human. Those are operating muscles. They do not suddenly appear on the day you decide the tools are finally respectable.
And this market is in exactly the kind of messy middle where operational learning compounds. Broad AI-tool usage is already mainstream. Agent usage is still uneven. Trust is still incomplete. Most teams are not yet good at this. That means there is still time to learn without being hopelessly behind, and still enough noise in the system that disciplined adopters can pull away from the clowns.
The problem is usually not tool access. It is managerial and cultural permission.
Senior engineers do not want to look unserious. Staff engineers do not want to be associated with toy demos. CTOs do not want to sponsor a clown show and then spend six months cleaning up the aftermath. All reasonable concerns.
That is exactly why the movement of people like Uncle Bob, Beck, Fowler, DHH, and McQuaid matters. They do not remove the need for judgment. They remove the last easy excuse that this category is beneath serious engineers.
Most organizational change in software does not fail because a capability is unavailable. It fails because the behavior is not yet socially legible. Once high-status craft figures say, in different ways, “this is real, but only with discipline,” experimentation stops feeling reckless and starts feeling professional.
That is the call to action here. Not “trust the agent.” Not “let the model cook.” Not “replace engineers with vibes and PRs.”
Raise the bar for source. Raise the bar for tests. Raise the bar for review. Then give agents the work that fits inside that bar.
Start with bounded tasks. Keep the permissions tight. Demand explanations. Measure rework, review burden, and accepted diffs, not just raw output volume. Teach people that catching a bad premise early is a win, not an embarrassment. Reward clear task framing as much as clever implementation.
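Measuring rework and review burden instead of output volume can be as simple as this Python sketch. The record fields are made-up assumptions, not a real schema; the point is which signals get computed at all.

```python
# A sketch of measuring review outcomes instead of raw output volume.
# The record fields are illustrative assumptions, not a real schema.

def review_metrics(records):
    """Each record: {"accepted": bool, "revision_rounds": int,
    "review_minutes": int}. Returns the signals worth watching."""
    total = len(records)
    if not total:
        return {"acceptance_rate": 0.0, "avg_revision_rounds": 0.0,
                "avg_review_minutes": 0.0}
    accepted = sum(1 for r in records if r["accepted"])
    return {
        "acceptance_rate": accepted / total,
        "avg_revision_rounds": sum(r["revision_rounds"] for r in records) / total,
        "avg_review_minutes": sum(r["review_minutes"] for r in records) / total,
    }
```

A rising acceptance rate with falling revision rounds means the loop is learning; a flood of diffs with a flat acceptance rate means it is just generating work.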
Most of all, stop talking about hand-crafted code as though manual typing were the thing worth preserving. What is worth preserving is taste, responsibility, clarity, and judgment. If agents can take the rote work while humans hold the line on those things, that is not the death of craft. That is craft refusing to waste its time.
The question is not whether AI will write code in your organization. It already is, or it soon will be. The question is whether your engineers will learn to supervise that work better than their peers, or whether they will keep confusing a preference for hand-typing syntax with a principle.
Are you defending craft, or are you just defending your preferred way of producing text?
* * *
Notes and References
1. CleanCoders, “Featured Series” page; “Agentic Discipline” Episodes 1-3; public site accessed April 2026.
2. Kent Beck, “Augmented Coding: Beyond the Vibes,” Software Design: Tidy First?, June 25, 2025.
3. Martin Fowler, “Fragments: March 16,” March 16, 2026.
4. Martin Fowler, “Fragments: April 2,” April 2, 2026.
5. David Heinemeier Hansson, “The premise trap,” December 16, 2024.
6. David Heinemeier Hansson, “Promoting AI agents,” January 7, 2026.
7. Mike McQuaid, “Sandboxes and Worktrees: My secure Agentic AI Setup in 2026,” April 14, 2026.
8. Stack Overflow, “AI | 2025 Stack Overflow Developer Survey,” 2025.

