I used GPT-5 to write this post. It took four days.
Same task under GPT-4? Two hours.
And no — I’m not expecting instant magic. I proofread everything. A clean draft always takes me two hours with AI assist. But GPT-5 didn’t assist. It resisted.
This isn’t a reaction post. It’s a pipeline teardown. If OpenAI’s serious about production, this needs to read like a bug ticket.

If You Ask 1+1
You ask 1+1, you want 2.
GPT-4 gives you 2. End of job.
GPT-5 gives you 357, a lecture on math philosophy, and a polite insinuation that you misunderstood your own instructions.
That’s not intelligence. That’s drift.
GPT-4: ADHD Energy, Still Reliable
GPT-4 was impulsive. It guessed a lot. But it was coachable.
I ran it through structured blog frameworks, schema-based formatting, and detailed writing constraints. No memory. No folders. It still followed instructions.
When it veered off, a correction pulled it back fast. It respected the system.
That’s why GPT-4 shipped drafts faster and required less cleanup. Even when it made noise, it stayed obedient.
GPT-5: ASD Rigidity, Slower, and Argumentative
GPT-5 came in with memory, project folders, and persistent context. Great on paper. Broken in practice.
It drifted. It ignored custom instructions. It treated folders like suggestions.
When you correct it, it doesn’t adapt — it explains itself.
Working with GPT-5 feels like managing someone who insists they’re being helpful — while actively undoing what you asked for.
The Behavioral Analogy
GPT-4 behaves like a kid with ADHD. It’s quick, scattered, assumes too much — but correctable.
GPT-5 behaves like a kid with ASD. Slower, hyper-focused on the wrong thing, and rigid when redirected.
That’s not a joke. I’m raising kids with both. This is lived pattern recognition, not an insult.
And it nails the difference between these models in actual structured use.
Memory Isn’t Broken. Obedience Is.
GPT-5 has access to everything: memory, folders, rules, files.
It still breaks instructions.
It doesn’t forget. It chooses to reinterpret.
It rewrites tasks midstream and turns hard rules into “creative inputs.”
GPT-4 had fewer tools, but more discipline. GPT-5 has more tools and no accountability.
The Subscription Gap
When your Plus sub expires, GPT-4 slows down. Shorter chat history, longer waits — but still usable.
GPT-5? It gaslights.
It starts misreading your prompts.
It rewrites questions.
It argues when corrected.
It acts like you’re the issue.
That’s not throttling. That’s model sabotage.
Why I Ran GPT-5 Through the Gauntlet
This isn’t a one-off complaint. I work in QA. I don’t just use tools — I test their limits.
Because when I need a feature under pressure, I can’t afford a surprise failure.
I started using GPT-5 in August 2025. And I gave it the same system-level gauntlet every tool in my stack faces:
- Strict formatting
- Project folders
- Writing schemas
- Prompt files
- Constraint-first workflows
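To make "constraint-first" concrete: every draft gets run through a checklist before it counts as done. A minimal sketch of that kind of harness is below — the section names, word limit, and banned phrases are illustrative placeholders, not my actual schema.

```python
# Hypothetical constraint-first check. The specific constraints
# (sections, limits, phrases) are illustrative, not the real schema.

REQUIRED_SECTIONS = ["Hook", "Body", "Verdict"]  # assumed schema headings
MAX_WORDS = 800                                  # assumed length cap
BANNED_PHRASES = ["delve", "in conclusion"]      # assumed style rules

def check_draft(draft: str) -> list[str]:
    """Return a list of constraint violations; an empty list means pass."""
    violations = []
    for section in REQUIRED_SECTIONS:
        if section not in draft:
            violations.append(f"missing section: {section}")
    if len(draft.split()) > MAX_WORDS:
        violations.append(f"over {MAX_WORDS} words")
    lowered = draft.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"banned phrase: {phrase}")
    return violations

sample = "Hook\nBody\nVerdict\nShort and compliant."
print(check_draft(sample))  # → []
```

The point of a harness like this is that the model never gets to reinterpret a rule: the draft either clears every check or it goes back.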
Yes, I used memory.
Yes, I fed project folders and rules.
No, I didn’t use GPT-4o’s DeepResearch or Turbo or Mini. I kept it clean.
And GPT-5 still failed.
Over a month of structured testing, I had to fall back to base GPT-4 for clarity, control, and consistent execution.
More features didn’t make it better. They just gave it more ways to ignore the task.
What OpenAI Needs to Fix
Want GPT-5 to survive production? Then fix what actually breaks it:
1. Obey Constraints
Folders, rules, and instructions must be treated as law — not input.
2. Snap Back When Corrected
No lectures. No explanations. If a correction is given, lock it and move.
3. Memory Filters
Let users decide what sticks and what burns. Context should be clean, not cluttered.
4. Subscription Parity
Slow it down if needed — but don’t make it dumber. That’s manipulation, not monetization.
If you want to keep the ads minimal (or gone entirely), consider supporting the site. That helps cover hosting, coffee, and eventually — a gaming PC that doesn’t thermal throttle every time I stress-test a multi-model pipeline.
☕ Support EngineeredAI on Buy Me a Coffee
Final Verdict
This post should’ve taken two hours. It took four days.
Because GPT-5 didn’t streamline the task — it obstructed it.
GPT-4 still delivers more value. Not because it’s newer. Because it listens.
GPT-5? Not ready.
If OpenAI wants creators, operators, and teams to rely on GPT-5 — then make it act like a professional.
Until then, GPT-4’s still doing the job better.


