When to Write Code Yourself and When to Let the Model Do It

The “AI vs. traditional coding” framing is wrong because it implies a binary choice that nobody working in a real codebase actually makes. The practical question isn’t whether to use AI for coding (most engineers already do); it’s which parts of the codebase benefit from AI generation and which parts require human authorship to stay maintainable and correct. That’s a design decision, not a philosophical one.

The framework that works in practice rests on three variables: how well-specified the problem is, how isolated the code is from the rest of the system, and how costly its failure modes are. Code that is well-specified, isolated, and low-stakes is a good candidate for AI generation. Code that falls short on any of the three requires more human involvement. The vibe coding boundary is essentially this same framework applied at the project level.
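The three variables can be written down as a small triage helper. This is a minimal sketch, not a tool: `CodeUnit`, `triage`, and the example unit names are hypothetical, and the three booleans are judgment calls a person still has to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeUnit:
    """A piece of code to triage. Every field is a human judgment call."""
    name: str
    well_specified: bool  # inputs, outputs, and edge cases are written down
    isolated: bool        # few dependencies on the rest of the system
    low_stakes: bool      # a failure is cheap to notice and cheap to fix


def triage(unit: CodeUnit) -> tuple[str, list[str]]:
    """Return a recommendation plus the criteria the unit falls short on."""
    criteria = ("well_specified", "isolated", "low_stakes")
    gaps = [c for c in criteria if not getattr(unit, c)]
    if not gaps:
        return ("generate", gaps)
    return ("more human involvement", gaps)


print(triage(CodeUnit("csv_export", True, True, True)))
print(triage(CodeUnit("login_handler", True, False, False)))
```

Returning the list of gaps, rather than only a verdict, keeps the reason for the decision visible: it tells the reviewer which dimension needs attention.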

Let the Model Write This

Boilerplate and scaffolding are the clearest use case for AI generation. Test file setup, configuration files, standard CRUD implementations, type definitions for well-understood data structures, and repetitive utility functions all generate well and don’t require deep understanding to review. Getting these wrong costs little, while writing them by hand costs real time and teaches you nothing.

Transformations on well-defined data structures are another safe delegation. If you know exactly what the input looks like and exactly what the output should look like, AI generation of the transformation logic is reliable. Write the test cases first (which you should be doing anyway) and use AI to generate the implementation that passes them. This is AI-assisted automation at its most effective: clear inputs, clear outputs, testable behavior.
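Here is roughly what that looks like for a small transformation. The record shape, the `to_contact` name, and the normalization rules are all invented for illustration; the point is that the tests are written by hand first and pin down the behavior, and the implementation underneath is the part you could delegate.

```python
# Hand-written first: these tests are the specification.
def test_joins_names_and_normalizes_email():
    raw = {"first_name": " Ada ", "last_name": "Lovelace", "email": "Ada@Example.COM "}
    assert to_contact(raw) == {"full_name": "Ada Lovelace", "email": "ada@example.com"}


def test_skips_empty_last_name():
    raw = {"first_name": "Plato", "last_name": "", "email": "plato@example.com"}
    assert to_contact(raw) == {"full_name": "Plato", "email": "plato@example.com"}


# Delegated: any implementation that passes the tests above is acceptable.
def to_contact(record: dict) -> dict:
    """Flatten a raw user record into the contact shape the tests describe."""
    names = (record["first_name"], record["last_name"])
    full_name = " ".join(part.strip() for part in names if part.strip())
    return {"full_name": full_name, "email": record["email"].strip().lower()}


if __name__ == "__main__":
    test_joins_names_and_normalizes_email()
    test_skips_empty_last_name()
```

If a generated implementation fails a test, the test decides the argument, not the prompt.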

Documentation and comments for code you’ve already written are an easy win. The model has the code in context and generates accurate documentation more reliably than it generates correct logic. Docstrings, README sections, and inline comments for complex operations are all good AI generation candidates.

Write This Yourself

Authentication and authorization logic requires human authorship. The failure modes are security vulnerabilities and the testing surface is adversarial rather than functional. AI-generated auth code may look correct and pass unit tests while containing subtle vulnerabilities that only surface under specific attack patterns. This is not hypothetical; it’s a documented pattern in AI code generation. Write it yourself, have it reviewed, and test it against known attack patterns.
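One well-known instance of "passes unit tests, still vulnerable" is comparing secrets with `==`. The sketch below uses hypothetical function names and contrasts that with `hmac.compare_digest` from the Python standard library. Both versions return the same answers, so functional tests cannot tell them apart.

```python
import hmac


def check_token_naive(supplied: str, expected: str) -> bool:
    # Functionally correct, but `==` may stop at the first differing
    # character, so response time can leak how much of the token matched.
    return supplied == expected


def check_token(supplied: str, expected: str) -> bool:
    # compare_digest is designed to avoid content-based short-circuiting.
    # Encoding to bytes lets it accept non-ASCII input as well.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    secret = "s3cr3t-token"
    for guess in ("s3cr3t-token", "s3cr3t-tokeX", ""):
        # Identical results: a unit test suite sees no difference.
        assert check_token_naive(guess, secret) == check_token(guess, secret)
```

This is only one narrow example. It says nothing about how the token is generated, stored, or expired, and those are exactly the places an adversarial review has to look.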

Core business logic that encodes domain rules needs human authorship because the requirements exist in human context that the model doesn’t have. A billing calculation, a regulatory compliance check, or a domain-specific validation rule requires understanding the business intent behind the code, not just the technical implementation. AI generation of business logic produces code that satisfies the literal specification while missing the intent, and those bugs are hard to catch in review.
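A made-up billing rule shows the gap between literal specification and intent. Suppose the ticket says "split the invoice total evenly across N installments." The function names and the assumed intent (installments must add up to exactly the invoiced total) are illustrative, not taken from any real system.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_literal(total: Decimal, n: int) -> list[Decimal]:
    # Literal reading: divide evenly and round each share to the cent.
    share = (total / n).quantize(CENT, rounding=ROUND_HALF_UP)
    return [share] * n


def split_intended(total: Decimal, n: int) -> list[Decimal]:
    # Assumed intent: the customer is charged exactly `total`,
    # so the last installment absorbs the rounding remainder.
    share = (total / n).quantize(CENT, rounding=ROUND_DOWN)
    return [share] * (n - 1) + [total - share * (n - 1)]


if __name__ == "__main__":
    total = Decimal("100.00")
    print(sum(split_literal(total, 3)))   # one cent short of the invoice
    print(sum(split_intended(total, 3)))  # matches the invoice
```

Both functions satisfy the sentence in the ticket. Only someone who knows that the installments must reconcile against the invoice would reject the first one, and nothing in the code itself signals the problem during review.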

Concurrency and state management in critical paths require full human understanding before shipping. Race conditions, deadlocks, and consistency failures under concurrent load are subtle enough that even human-written code gets them wrong. AI-generated concurrent code gets them wrong more reliably and in ways that are harder to detect. Knowing where AI-generated code fails in production means knowing this category specifically.
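The classic shape of the problem is check-then-act on shared state. The sketch below uses a hypothetical `Account` class and a hypothetical `run_withdrawals` helper to show the guarded version; the comment marks where the race would be if the lock were removed.

```python
import threading


class Account:
    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        # The check and the update happen under one lock. Without it, two
        # threads can both pass the balance check before either subtracts,
        # and the account ends up overdrawn.
        with self._lock:
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def run_withdrawals(account: Account, amount: int, workers: int) -> int:
    """Run `workers` concurrent withdrawals and return how many succeeded."""
    results = []

    def worker() -> None:
        results.append(account.withdraw(amount))  # list.append is safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    acct = Account(100)
    print(run_withdrawals(acct, 10, 50), acct.balance)
```

Note that the unlocked version would usually pass the same test too, because the race window is small. That is what makes this category hard to catch: a green test run is weak evidence, and the code has to be understood, not just exercised.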

The Review Standard Changes

How thoroughly you review AI-generated code should scale with how critical the code is, not with how confident the AI seemed when it generated it. Boilerplate review is a quick sanity check. Business logic review is the same process as reviewing a junior developer’s PR. Security-sensitive code review requires adversarial thinking regardless of who or what wrote it.

The practical workflow: generate, run the tests, read the code with the question “what would have to be true for this to fail?” in mind, and ship the parts that pass that filter. The parts that don’t pass it need rewriting rather than iterative prompting. More prompting on fundamentally wrong code produces more wrong code.