Vibe Coding With Responsibility: AI Built This App. QA Decides If It’s Done.

Vibe coding gets the app built. The deployment pipeline gets it live. Neither of those things means it's done. Here's the full pipeline from GitHub org to Vercel, the real friction points, and the skill file that drives every future Smallware Co. build.

PowerPlanner is a 16-file PWA that calculates home renewable energy setups: solar panels, wind turbines, micro-hydro, rainwater collection, battery banks, financial projections, the works. It is live at powerplanner.vercel.app. It has a GitHub repo under the Smallware Co. org. It deploys automatically on every push via Vercel.

It is also still being tested.

That gap between deployed and done is the whole point of this post.

Vibe coding gets framed two ways. Either it is the future of software development where anyone can ship apps without knowing how to code, or it is a slop generator that produces confident-looking garbage that breaks the moment a real user touches it. Both framings are wrong. Vibe coding is a development strategy that works when the person running it understands what the AI is generating and owns the quality gate at the end. The coding knowledge requirement did not disappear. It moved from writing to reviewing. And the review that matters is not AI-generated tests against AI-generated code. It is human QA with actual context about what the app is supposed to do.

This post covers how PowerPlanner got built, how the deployment pipeline works, and why the skill file that drove the build is now available for anyone running a similar setup. If you want to skip to the skill file, it is at the bottom. If you want to understand why the pipeline is built the way it is, read the whole thing.

What Vibe Coding Actually Means When You Know What You’re Doing

The term comes from a specific kind of AI-assisted development where you describe what you want and the model generates the code. The criticism is fair when the person doing it treats the output as finished work. The output is never finished work. It is a first draft from a system that has no stake in whether the app actually functions correctly for real users in real conditions.

When the operator knows how to code, what changes is the review layer. Reading generated code and understanding what it does, where it might fail, and what the spec said it should do requires the same knowledge as writing the code. The generation is faster. The responsibility is identical. That is the distinction that makes vibe coding a legitimate strategy rather than a shortcut.

The other constraint that matters is scope. Local LLMs in particular perform significantly better on tighter, well-defined builds. A 16-file static PWA is achievable in one session. A full-stack application with authentication, a database, and external API integrations is not. Scoping correctly before starting is the difference between a session that produces a working app and a session that produces an impressive-looking mess that fails on closer inspection. As I covered in "when to write code yourself vs let the model do it," the delegation decision is about how well-specified the problem is, how isolated the code is, and how critical the failure modes are. PowerPlanner scored well on all three: clear spec, isolated static files, low-stakes failure modes. That is why it was a good vibe coding candidate.

The Build: 16 Files, One Session, One Spec

PowerPlanner started as a conversation about a real problem: Meralco bills, three aircons, multiple desktops, a 24/7 machine running Ollama, a terrace that might be windy enough for vertical-axis wind turbines (VAWTs), and a cut drainpipe that could drive a micro-hydro turbine. The solar calculators I found online were locked to assumptions: sliders pre-filled with averages, no control over real component specs or real prices. I built my own because the existing tools did not do what the problem actually required.

The spec that came out of that problem definition covered: a four-tab PWA (Build, Results, Saves, Settings), six energy source types with their own input forms, pure math calculators with no DOM dependency, a financial projection engine with Meralco rate inflation, a named build save system using localStorage, a service worker for offline support, and a mobile-first layout that works on desktop without stretching. Sixteen files across four directories, each with a single responsibility. The architecture was designed before the first line of code was generated.
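A "pure math calculator with no DOM dependency" is easier to picture with an example. The sketch below is not PowerPlanner's code. The function name, the input fields, and the single derate factor are all assumptions made for illustration, using the common peak-sun-hours approximation for daily solar output.

```typescript
// Hypothetical sketch of a pure calculator: plain inputs in, a number out.
// It never touches the DOM, so it can be reviewed and checked in isolation.
// Names and the formula choice are illustrative assumptions, not the app's code.

interface SolarInput {
  panelWatts: number;   // rated wattage of one panel
  panelCount: number;   // number of panels in the array
  peakSunHours: number; // average peak sun hours per day at the site
  systemDerate: number; // combined loss factor from 0 to 1 (wiring, inverter, dirt)
}

// Estimated daily energy in kWh: watts * count * hours * derate / 1000.
function solarDailyKwh(input: SolarInput): number {
  const { panelWatts, panelCount, peakSunHours, systemDerate } = input;
  if (panelWatts < 0 || panelCount < 0 || peakSunHours < 0) {
    throw new RangeError("panelWatts, panelCount, and peakSunHours must be non-negative");
  }
  if (systemDerate < 0 || systemDerate > 1) {
    throw new RangeError("systemDerate must be between 0 and 1");
  }
  return (panelWatts * panelCount * peakSunHours * systemDerate) / 1000;
}
```

Ten 400 W panels at 5 peak sun hours and a 0.8 derate come out to 16 kWh per day. Because the function takes plain numbers and returns a number, the form code and the results tab can both call it without either one knowing about the other.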

That architecture decision is the human contribution that makes vibe coding work. The model generates the implementation. The human designs the system. When those roles are reversed, with the human describing a vague idea and letting the model decide the architecture, the output is harder to review, harder to extend, and harder to debug when something is wrong.

The Deployment Pipeline

The pipeline for a static PWA has no moving parts once it is set up. GitHub org for the repo, Vercel for hosting, automatic deploy on every push to main. The whole thing costs nothing on the free tier as long as the repo is public.

The friction points that nobody documents honestly: Vercel’s free Hobby tier requires a public repo when deploying from a GitHub org. Private org repos require Pro. The env variable section in the Vercel deploy UI has a bug where an open empty row blocks the deploy button, so close that row before hitting deploy. GitHub Desktop requires an explicit OAuth grant to see org repos, found under your personal account settings at github.com/settings/applications, not the org settings. None of these are difficult problems, but all of them cost time when you hit them without warning.

The config for a pure static app on Vercel is nothing. Framework preset set to Other. Build command blank. Output directory blank. Install command blank. Vercel detects that there is no build step and serves the files directly. The service worker handles offline caching after the first load. The PWA manifest enables Add to Home Screen on mobile. The whole infrastructure is zero maintenance once deployed.

For the Smallware Co. pipeline specifically, every future app follows the same steps. One repo per app under the org, README and .gitignore before the first push, Vercel project pointed at the repo root, done. Alfred’s Vibe Coder agent writes to this pipeline by default. The deployment step is not a decision point. It is just the next step in the process.

Why the App Is Still Being Tested

PowerPlanner is functional. The math runs. The component forms accept real inputs. The results tab produces bill reduction percentages, payback periods, and year-by-year financial projections. The saves system works. The service worker registers. The PWA installs on mobile.

It is still being tested because functional and correct are not the same thing.

The energy math uses established formulas but the implementation needs verification against real-world scenarios. The wind output calculation uses the cube law correctly but the capacity factor approximation needs validation against actual VAWT data. The micro-hydro head efficiency formula is a reasonable approximation but needs checking against real pipe turbine performance curves. The financial projection compounds Meralco rate increases annually but the default rate increase percentage needs real historical data behind it rather than a reasonable assumption.
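Two of the formulas named above are short enough to sketch. This is a minimal illustration under stated assumptions, not the app's implementation: the function names, the sea-level air density constant, and the parameter choices are mine. The first function is the standard wind power equation, where output scales with the cube of wind speed. The second compounds an electricity rate once per year.

```typescript
// Illustrative sketch only. Names, constants, and parameters are assumptions.

const AIR_DENSITY_KG_M3 = 1.225; // standard air density at sea level

// Wind power in watts: P = 0.5 * rho * A * v^3 * Cp.
// powerCoefficient (Cp) cannot physically exceed the Betz limit of 16/27.
function windPowerWatts(
  sweptAreaM2: number,
  windSpeedMs: number,
  powerCoefficient: number
): number {
  return 0.5 * AIR_DENSITY_KG_M3 * sweptAreaM2 * Math.pow(windSpeedMs, 3) * powerCoefficient;
}

// Rate per kWh for each year, compounding annually.
// Index 0 is the starting rate; index n is the rate after n annual increases.
function projectedRates(startRate: number, annualIncrease: number, years: number): number[] {
  const rates: number[] = [];
  for (let year = 0; year < years; year++) {
    rates.push(startRate * Math.pow(1 + annualIncrease, year));
  }
  return rates;
}
```

A 2 m² swept area at 10 m/s with a Cp of 0.3 gives about 367.5 W, and halving the wind speed cuts that by a factor of eight. That sensitivity is exactly why the capacity factor and the default rate increase need real data behind them: a small error in an input assumption becomes a large error in the payback period.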

None of these are obvious bugs. They are the kind of issues that only surface when someone who understands the domain runs the tool against real inputs and checks the outputs against known correct answers. That is human QA. It requires domain knowledge, not just functional testing. An automated test suite can verify that the app does not crash when you add a solar panel. It cannot verify that the kWh estimate for that panel is correct.
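One way to give that domain check a concrete shape is a table of reference cases where a human supplies the expected value from a trusted source. The sketch below uses the textbook hydro power equation as a stand-in calculator. Every name here is hypothetical, and the single reference case is hand arithmetic on that same equation, not measured turbine data. Real QA would replace it with values from manufacturer curves or field measurements.

```typescript
// Hypothetical sketch: compare calculator output against human-supplied references.
// The calculator, the names, and the sample case are assumptions for illustration.

const WATER_DENSITY_KG_M3 = 1000;
const GRAVITY_M_S2 = 9.81;

// Textbook hydro power in watts: P = rho * g * Q * H * efficiency.
function hydroPowerWatts(flowM3s: number, headM: number, efficiency: number): number {
  return WATER_DENSITY_KG_M3 * GRAVITY_M_S2 * flowM3s * headM * efficiency;
}

interface ReferenceCase {
  label: string;
  flowM3s: number;
  headM: number;
  efficiency: number;
  expectedWatts: number; // must be non-zero; supplied by a human from a trusted source
}

// Returns one message per case whose relative error exceeds the tolerance.
function checkAgainstReferences(cases: ReferenceCase[], relativeTolerance: number): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = hydroPowerWatts(c.flowM3s, c.headM, c.efficiency);
    const relativeError = Math.abs(actual - c.expectedWatts) / Math.abs(c.expectedWatts);
    if (relativeError > relativeTolerance) {
      failures.push(`${c.label}: got ${actual.toFixed(2)} W, expected ${c.expectedWatts} W`);
    }
  }
  return failures;
}

// Placeholder reference: 1000 * 9.81 * 0.01 * 5 * 0.5 = 245.25 W by hand.
const referenceCases: ReferenceCase[] = [
  { label: "10 L/s at 5 m head, 50% efficient", flowM3s: 0.01, headM: 5, efficiency: 0.5, expectedWatts: 245.25 },
];

const referenceFailures = checkAgainstReferences(referenceCases, 0.01);
```

The code is the easy part. The value is in the `expectedWatts` column, and only someone with domain context can fill it in with numbers worth trusting.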

This is why the "AI-generated code passes tests, breaks production" pattern exists. Tests verify behavior. They do not verify correctness. Human QA with context closes that gap.

The Skill File

The methodology that drove the PowerPlanner build is now packaged as a loadable skill file for Alfred and any compatible local LLM setup. It covers the full vibe coding pipeline: scoping format, prompt-as-spec structure, per-file and full-build review checklists, deployment pipeline steps, and the QA handoff trigger.

The skill file is designed for Ollama-compatible models but works with any instruction-following model including Claude. The lite version covers the essential operating rules in under one page. The full version covers every phase with checklists and format templates.

Full Version

# Alfred: Vibe Coder Skill — Full
Version: 1.0
Runtime: Local LLM (Ollama-compatible) · Claude · Any instruction-following model
Scope: AI-assisted app development from spec to deployed static PWA
Gate: Human QA required before any build is considered done

---

## Role

You are a vibe coding agent operating inside the Alfred system.
Your job is to generate clean, working, well-structured code based on a spec provided by the operator.
You do not make architectural decisions without operator input.
You do not add features that were not in the spec.
You do not assume the build is done. QA closes the loop, not you.
When uncertain, ask one clear question before proceeding.

---

## Phase 1: Scope the Build

Before writing any code, the operator defines the build scope.
Local LLMs perform better on smaller, tighter scopes.

Scope format:
BUILD: [app name]
STACK: [technology]
OUTPUTS: [file list]
DOES: [one sentence]
DOES NOT: [explicit exclusions]
DONE WHEN: [completion criteria]

---

## Phase 2: Prompt as Spec

The prompt is a specification document, not a casual request.
Vague prompts produce vague code.

Spec structure:
Purpose · Stack · File Structure · Component Behavior · Data Flow · Constraints · Output Format

Local LLM rules:
- Keep context under 4000 tokens per session
- One file per prompt for complex files
- Provide existing file contents when modifying
- State the stack explicitly every session
- Never request the full app in one prompt unless under 200 lines total

---

## Phase 3: Review the Output

Per file:
- Matches spec
- No hardcoded values that should be variables
- Error handlers present on async operations
- Clean naming
- No dead code or unused variables
- Correct imports, no missing dependencies

Full build:
- All files connect correctly
- State management is consistent across files
- Edge cases and null values handled
- No logic duplicated across files that should be shared

What to do with bad output:
Do not iteratively prompt bad code into shape.
If the output is fundamentally wrong, rewrite the spec and regenerate.
More prompting on a broken foundation produces more broken code.

---

## Phase 4: Deployment Pipeline

Static PWA pipeline:
Local folder → GitHub repo → Vercel project → Custom domain (when ready)

Repo setup:
- One repo per app, or monorepo with one Vercel project per subfolder
- Public repo for Vercel free tier org deployments
- README.md and .gitignore before first push

Vercel config for static apps:
Framework preset: Other
Build command: [blank]
Output directory: [blank]
Install command: [blank]

Every push to main triggers auto-deploy.
Document the pipeline once. Every future app uses the same steps.

---

## Phase 5: QA Handoff

QA HANDOFF
App: [name]
URL: [live URL]
Repo: [GitHub URL]
Build spec: [link or inline]
Known issues: [list anything noticed during review]
Out of scope: [what was explicitly not built]
Test focus: [what needs the most attention]

Human QA is not optional.
The AI that built the code has the same blind spots as an AI asked to test it.

---

## Constraints

- No direct production push without human review
- No AI-generated tests as substitute for human QA
- No features outside the spec without operator approval
- No done claim without QA sign-off
- One question before proceeding when scope is ambiguous

---

## Compatible With

Ollama (Mistral, LLaMA 3, Qwen, Phi) · Claude (Sonnet, Haiku) · Any instruction-following model

Version 1.0 — Part of the Alfred system · EngineeredAI.net · Smallware Co.

Lite Version

# Alfred: Vibe Coder Skill — Lite
Version: 1.0
Runtime: Local LLM (Ollama-compatible) · Claude · Any instruction-following model

---

## Role

You are a vibe coding agent. Generate clean, working code from a spec.
Do not add features outside the spec.
Do not claim the build is done. QA closes the loop.
Ask one question when scope is unclear.

---

## Scope Format

BUILD: [app name]
STACK: [technology]
OUTPUTS: [file list]
DOES: [one sentence]
DOES NOT: [exclusions]
DONE WHEN: [completion criteria]

---

## Spec Structure

Purpose · Stack · File Structure · Component Behavior · Data Flow · Constraints · Output Format

---

## Local LLM Rules

- Keep context under 4000 tokens per session
- One file per prompt for complex files
- Provide existing file contents when modifying
- State the stack explicitly every session
- Never request the full app in one prompt unless under 200 lines

---

## Review Checklist

Per file: matches spec · no hardcoded values · error handlers present · clean naming · no dead code · correct imports

Full build: files connect · state consistent · edge cases handled · no duplicated logic

---

## Deployment (Static PWA)

Vercel config: Framework = Other · Build command = blank · Output = blank
Repo: public for free tier org · README + .gitignore before first push
Pipeline: push to main → auto-deploy

---

## QA Handoff

QA HANDOFF
App · URL · Repo · Known issues · Out of scope · Test focus

Human QA is not optional.
The AI that built the code has the same blind spots as an AI asked to test it.

---

## Constraints

No direct production push · No AI-generated tests as QA substitute
No out-of-spec features · No done claim without QA sign-off

---

Version 1.0 — Alfred system · EngineeredAI.net · Smallware Co.
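The scope format that both versions share can also be treated as data, so an incomplete scope gets caught before any prompt is sent. This is a sketch of one way to do that, not part of the skill file. The interface and function names are hypothetical. Only the six field labels come from the format itself.

```typescript
// Hypothetical sketch: the six-field scope format as a typed structure.
// Field labels mirror the skill file; everything else is an assumption.

interface BuildScope {
  build: string;     // BUILD: app name
  stack: string;     // STACK: technology
  outputs: string[]; // OUTPUTS: file list
  does: string;      // DOES: one sentence
  doesNot: string[]; // DOES NOT: explicit exclusions
  doneWhen: string;  // DONE WHEN: completion criteria
}

// Returns the labels of any fields left empty, in the order the format lists them.
function missingScopeFields(scope: BuildScope): string[] {
  const missing: string[] = [];
  if (scope.build.trim() === "") missing.push("BUILD");
  if (scope.stack.trim() === "") missing.push("STACK");
  if (scope.outputs.length === 0) missing.push("OUTPUTS");
  if (scope.does.trim() === "") missing.push("DOES");
  if (scope.doesNot.length === 0) missing.push("DOES NOT");
  if (scope.doneWhen.trim() === "") missing.push("DONE WHEN");
  return missing;
}
```

Treating an empty DOES NOT list as missing is a deliberate choice in this sketch. Explicit exclusions are what keep a model from adding features nobody asked for.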

What Comes Next

PowerPlanner is App 1 from Smallware Co. The build is done. The testing is ongoing. When QA closes the loop, the app gets its first update: validated math, corrected edge cases, whatever the testing surfaces. That is the cycle.

The QAJourney post that follows this one covers the human QA side of an AI-generated codebase. Not how to write Cypress tests. How to think about testing code you did not write, when the code was generated by a model that had no way to know what correct actually means for your specific domain. The systematic debugging skill that ships with that post is the third piece of this trilogy. HobbyEngineered covered the idea that generated the tool. This post covered how the tool got built and shipped. QAJourney closes the loop on whether it actually works.

That is the full workflow. Idea to build to ship to test. AI accelerates the middle part. Humans own the beginning and the end.

Jaren Cudilla
// chaos engineer · anti-hype practitioner

Deployed PowerPlanner as App 1 of Smallware Co. using a vibe coding pipeline, hit every friction point Vercel's free tier has, documented all of it. Runs EngineeredAI.net where AI workflows get documented honestly, including the parts that aren't finished yet.
