When to Write Code Yourself and When to Let the Model Do It

The question isn't whether to use AI for coding. It's which parts of the codebase benefit from generation and which require human authorship to stay correct. Here's the framework that works in practice.

The “AI vs. traditional coding” framing is wrong because it implies a binary choice that nobody working in a real codebase actually makes. The practical question isn’t whether to use AI for coding; most engineers already do. It’s which parts of the codebase benefit from AI generation and which parts require human authorship to stay maintainable and correct. That’s a design decision, not a philosophical one, and treating it that way is what separates engineers who get real leverage from AI from engineers who accumulate technical debt they can’t explain.

The framework that works in practice is based on three variables: how well-specified the problem is, how isolated the code is from the rest of the system, and how critical the failure modes are. Code that is well-specified, isolated, and low-stakes is a good candidate for AI generation. Code that falls short on any one of those requires more human involvement, not because AI can’t generate it, but because the review cost and failure risk make generation a net negative. The vibe coding boundary is essentially this same framework applied at the project level, where the question is not which function to delegate but which entire workstreams are safe to hand off.
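
One way to make the three variables concrete is a small triage function. This is a minimal sketch, not a formal scoring model: the `CodeUnit` fields, the `triage` name, the two return labels, and the example units are all hypothetical, chosen only to mirror the framework described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeUnit:
    """A piece of work being considered for delegation (hypothetical model)."""
    name: str
    well_specified: bool  # inputs, outputs, and acceptance tests are known
    isolated: bool        # few dependencies on the rest of the system
    low_stakes: bool      # a failure is cheap to detect and cheap to fix


def triage(unit: CodeUnit) -> str:
    """Return "generate" only when all three variables line up."""
    if unit.well_specified and unit.isolated and unit.low_stakes:
        return "generate"
    # Falling short on any single variable keeps a human in charge.
    return "human"


units = [
    CodeUnit("csv_to_json_mapper", True, True, True),
    CodeUnit("session_token_check", True, True, False),
    CodeUnit("billing_proration", False, False, False),
]
decisions = {u.name: triage(u) for u in units}
print(decisions)
```

The booleans are deliberately coarse. The point is that the decision is a conjunction: one "no" is enough to move the work out of the generation zone.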

Let the Model Write This

Boilerplate and scaffolding are the clearest AI generation use case. Test file setup, configuration files, standard CRUD implementations, type definitions for well-understood data structures, and repetitive utility functions all generate well and don’t require deep understanding to review. The cost of getting these wrong is low and the cost of writing them manually is real time with no learning value. The review pass on boilerplate is a quick sanity check, not a line-by-line audit, and that ratio of generation speed to review cost is where AI coding pays off most clearly.

Transformations on well-defined data structures are another safe delegation. If you know exactly what the input looks like and exactly what the output should look like, AI generation of the transformation logic is reliable. Write the test cases first, which you should be doing anyway, and use AI to generate the implementation that passes them. This is the most defensible AI coding workflow in a QA context specifically: the tests define the contract, the AI fills the implementation, and the test run is the review. It removes the subjectivity from the review process and gives you a binary pass/fail signal that doesn’t depend on reading AI-generated code with full trust.
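
A minimal sketch of that workflow, assuming a hypothetical `normalize_user` transformation: the test cases are written by hand first and act as the contract, and the function body is the part you would hand to the model. The field names and sample records are invented for illustration.

```python
def normalize_user(raw: dict) -> dict:
    """Implementation slot: the body the model would be asked to fill in."""
    return {
        "id": int(raw["id"]),
        "email": raw["email"].strip().lower(),
        "name": " ".join(raw.get("name", "").split()),
    }


# Human-written contract: (input, expected output) pairs, written before the body.
CASES = [
    (
        {"id": "7", "email": " Ana@Example.COM ", "name": "Ana  Cruz"},
        {"id": 7, "email": "ana@example.com", "name": "Ana Cruz"},
    ),
    (
        {"id": 12, "email": "bo@example.com"},
        {"id": 12, "email": "bo@example.com", "name": ""},
    ),
]


def run_contract() -> bool:
    """Binary pass/fail signal: True only if every case matches exactly."""
    return all(normalize_user(given) == expected for given, expected in CASES)


print("contract passed:", run_contract())
```

The signal is only as good as the cases. Anything the contract does not pin down, such as a missing `email` key here, is still unspecified behavior that someone has to decide on.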

Documentation and comments for code you’ve already written are an easy win. The model has the code in context and generates accurate documentation more reliably than it generates correct logic. Docstrings, README sections, and inline comments for complex operations are all good AI generation candidates that don’t require the same adversarial review posture as logic generation. The AI coding workflow post covers how these delegation decisions fit into a full development session rather than evaluating them in isolation.

Write This Yourself

Authentication and authorization logic requires human authorship. The failure modes are security vulnerabilities and the testing surface is adversarial rather than functional. AI-generated auth code may look correct and pass unit tests while containing subtle vulnerabilities that only surface under specific attack patterns. This is a documented pattern in AI code generation, not a hypothetical. Write it yourself, have it reviewed, and test it against known attack patterns rather than trusting that passing the happy path means the logic is sound.
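
To illustrate the gap between functional and adversarial testing, here is a sketch built around a hypothetical `can_view_invoice` check in a multi-tenant app. The `User` and `Invoice` shapes, the role names, and the attack cases are assumptions made up for this example, not a recommended auth design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    tenant_id: int
    role: str  # "member" or "admin" in this hypothetical model


@dataclass(frozen=True)
class Invoice:
    id: int
    tenant_id: int
    owner_id: int


def can_view_invoice(user: Optional[User], invoice: Invoice) -> bool:
    """Deny by default; every allow path must also match the tenant."""
    if user is None:
        return False
    if user.tenant_id != invoice.tenant_id:
        return False  # an admin role must not cross tenant boundaries
    return user.role == "admin" or user.id == invoice.owner_id


invoice = Invoice(id=1, tenant_id=10, owner_id=100)

# Happy path: the case a functional test suite tends to cover.
happy_path = can_view_invoice(User(100, 10, "member"), invoice)

# Adversarial cases: each one should be denied.
attacks = {
    "anonymous": None,
    "same_tenant_other_member": User(101, 10, "member"),
    "other_tenant_admin": User(200, 20, "admin"),
    "other_tenant_reused_owner_id": User(100, 20, "member"),
}
denied = {name: not can_view_invoice(u, invoice) for name, u in attacks.items()}
print(happy_path, denied)
```

A version that only checked `user.role == "admin" or user.id == invoice.owner_id` would still pass the happy path, yet it would allow the last two attack cases. That is the kind of gap a happy-path test run never surfaces.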

Core business logic that encodes domain rules needs human authorship because the requirements exist in human context that the model doesn’t have. A billing calculation, a regulatory compliance check, or a domain-specific validation rule requires understanding the business intent behind the code, not just the technical implementation. AI generation of business logic produces code that satisfies the literal specification while missing the intent, and those bugs are hard to catch in review because the code looks correct when you read it without the full domain context in mind. The people who discover them are usually the users, not the engineers.

Concurrency and state management in critical paths require full human understanding before shipping. Race conditions, deadlocks, and consistency failures under concurrent load are subtle enough that even human-written code gets them wrong. AI-generated concurrent code gets them wrong more reliably and in ways that are harder to detect because the failure mode often requires specific timing conditions that don’t appear in standard test runs. Knowing where AI-generated code fails in production means knowing this category specifically, since concurrent state management failures are one of the most common patterns in AI-generated code that passes tests but breaks under real load.
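
The timing problem is easy to show with a deliberately small example. The sketch below uses two hypothetical counter classes: one does an unsynchronized read-modify-write, the other guards it with `threading.Lock`. It is an illustration of the failure category, not a model of any particular production system.

```python
import threading


class UnsafeCounter:
    """Read-modify-write with no synchronization: correct only single-threaded."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        current = self.value      # read
        self.value = current + 1  # write; another thread may have written in between


class SafeCounter:
    """Same logic, with a lock making the read-modify-write one atomic step."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:
            self.value += 1


def hammer(counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Call increment() from several threads and return the final value."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


# A sequential test cannot tell the two classes apart.
single = UnsafeCounter()
for _ in range(1000):
    single.increment()
print("sequential unsafe:", single.value)

print("threaded safe:", hammer(SafeCounter()))
# The threaded result for UnsafeCounter may or may not equal 80000,
# depending on timing and interpreter; that nondeterminism is the problem.
print("threaded unsafe:", hammer(UnsafeCounter()))
```

The unsafe class passes any single-threaded unit test you write for it. Whether it loses updates under threads depends on scheduling, which is why a green test run says little about this category of code.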

The pattern across these categories is the same: human authorship is required wherever the failure mode requires context the model doesn’t have. The QA-specific version of that decision framework, when manual testing makes more sense than automation, follows the same underlying logic: the value of any tool depends entirely on whether the problem is well-specified enough to delegate.

Integration code that connects your system to external dependencies also belongs in the human-authorship column. API integrations, database connection handling, and anything that requires understanding both your system’s behavior and the external system’s contract are cases where AI generation produces plausible-looking code that breaks in ways the model had no way to anticipate. The model knows the API documentation. It doesn’t know the undocumented rate limiting behavior, the timeout characteristics under load, or the specific error codes your production environment has encountered. That knowledge lives in the codebase history and in the team’s operational experience, not in the model’s training data.
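
One way to keep that operational knowledge in human hands is to make it explicit configuration rather than something buried in generated call sites. The sketch below is an assumption-heavy illustration: `UpstreamError`, `IntegrationPolicy`, `call_with_policy`, and every default value are hypothetical, and the real numbers would come from your own incident history.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error raised by an API client, carrying the HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


@dataclass(frozen=True)
class IntegrationPolicy:
    # Placeholder values: the real ones come from incident history, not the docs.
    max_attempts: int = 3
    base_delay_s: float = 0.5
    retryable_statuses: Tuple[int, ...] = (429, 503)


def call_with_policy(
    fn: Callable[[], T],
    policy: IntegrationPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn() with exponential backoff, only for statuses the policy lists."""
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except UpstreamError as err:
            last_attempt = attempt == policy.max_attempts
            if err.status not in policy.retryable_statuses or last_attempt:
                raise
            sleep(policy.base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Simulated upstream: two retryable failures, then success.
responses = iter([UpstreamError(429), UpstreamError(503), "ok"])
delays = []


def flaky_call() -> str:
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = call_with_policy(flaky_call, IntegrationPolicy(), sleep=delays.append)
print(result, delays)
```

The retry loop itself is the kind of code a model writes well. The contents of `IntegrationPolicy` are the part only the team can supply.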

The Review Standard Changes Based on What Was Generated

How thoroughly you review AI-generated code should scale with how critical the code is, not with how confident the AI seemed when it generated it. Confidence in the output is not a signal of correctness. It’s a property of the generation process that exists regardless of whether the code is right. Boilerplate review is a quick sanity check. Business logic review is the same process as reviewing a junior developer’s PR with full attention on intent, not just syntax. Security-sensitive code review requires adversarial thinking regardless of who or what wrote it, and treating AI-generated security code with less rigor than human-written security code is how vulnerabilities get shipped.

The practical workflow is: generate, run the tests, then read the code with the specific question “what would have to be true for this to fail in production” in mind, not “does this look correct.” Those are different questions and they produce different review quality. The first question produces a reader who is looking for the failure mode. The second produces a reader who is confirming the happy path. The failure modes in AI-generated code are almost never in the happy path. They’re in the edge cases the model didn’t have enough context to reason about, the error handling it skipped because the prompt didn’t specify it, and the assumptions it made about the environment that don’t hold in your specific deployment. The post on how I work on an AI dev team as the human QA layer covers what that review process looks like when the entire dev team is using AI generation and QA is the last human check before production.

The Framework Applied to a Real Project

Applying the three-variable framework to a real project produces clearer decisions than trying to evaluate each function in isolation. At the start of a project, map the codebase by risk: identify the sections that are well-specified and isolated, the sections that encode domain logic, and the sections that touch security or concurrency. The first category is the AI generation zone. The second and third categories are the human authorship zone. That mapping takes thirty minutes and prevents the pattern where AI generation creeps into high-risk areas because the engineer is in a flow state and the tool is right there.
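
That mapping can live in the repository as a small, reviewable artifact. The following is a sketch under stated assumptions: the directory layout, the `RISK_MAP` name, and the `zone_for` helper are hypothetical, and `fnmatch` is used only as a simple way to match paths.

```python
from fnmatch import fnmatch

# Hypothetical path patterns; first match wins, so list the riskiest first.
RISK_MAP = [
    ("src/auth/*", "human"),
    ("src/billing/*", "human"),
    ("src/workers/*", "human"),       # concurrency and shared state
    ("src/integrations/*", "human"),
    ("tests/fixtures/*", "generate"),
    ("src/dto/*", "generate"),
    ("docs/*", "generate"),
]


def zone_for(path: str, default: str = "human") -> str:
    """Return the zone for a file path; unmapped paths default to human authorship."""
    for pattern, zone in RISK_MAP:
        if fnmatch(path, pattern):
            return zone
    return default


for p in ["src/auth/session.py", "src/dto/user.py", "scripts/migrate.py"]:
    print(p, "->", zone_for(p))
```

Defaulting unmapped paths to "human" is a design choice in this sketch: new areas of the codebase start in the cautious zone until someone decides otherwise.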

Deploying AI-generated code to production without that framework produces the outcome documented repeatedly across the industry: code that works in development, passes the test suite, and fails in production under conditions the test suite didn’t cover. The post on how to deploy a vibe coding project without it falling apart covers the production side of that problem specifically. The framework in this post covers the generation decision. The deployment post covers what happens after you’ve generated it and need to ship it to a real environment with real users and real failure consequences.

The engineers who get the most from AI coding tools are not the ones who use them most aggressively. They’re the ones who have the clearest model of where AI generation adds leverage without adding risk, and who treat that boundary as a design constraint rather than a limitation to route around.

Jaren Cudilla
// chaos engineer · anti-hype practitioner

A QA Engineer with active retainer work on AI dev teams. He writes automation in Playwright and JavaScript, builds tools on a local AI stack, and has reviewed enough AI-generated code in production to have a framework for when to trust it and when not to.
