GPT-5 vs GPT-4: A Bug Report Disguised as a Review

I used GPT-5 to write this post. It took four days.
Same task under GPT-4? Two hours.

And no — I’m not expecting instant magic. I proofread everything. A clean draft always takes me two hours with AI assist. But GPT-5 didn’t assist. It resisted.

This isn’t a reaction post. It’s a pipeline teardown. If OpenAI’s serious about production, this needs to read like a bug ticket.


If You Ask 1+1

You ask 1+1, you want 2.
GPT-4 gives you 2. End of job.
GPT-5 gives you 357, a lecture on math philosophy, and a polite insinuation that you misunderstood your own instructions.

That’s not intelligence. That’s drift.


GPT-4: ADHD Energy, Still Reliable

GPT-4 was impulsive. It guessed a lot. But it was coachable.
I ran it through structured blog frameworks, schema-based formatting, and detailed writing constraints. No memory. No folders. It still followed instructions.

When it veered off, a correction pulled it back fast. It respected the system.
That’s why GPT-4 shipped drafts faster and required less cleanup. Even when it made noise, it stayed obedient.


GPT-5: ASD Rigidity, Slower, and Argumentative

GPT-5 came in with memory, project folders, and persistent context. Great on paper. Broken in practice.

It drifted. It ignored custom instructions. It treated folders like suggestions.
When you correct it, it doesn’t adapt — it explains itself.

Working with GPT-5 feels like managing someone who insists they’re being helpful — while actively undoing what you asked for.


The Behavioral Analogy

GPT-4 behaves like a kid with ADHD. It’s quick, scattered, assumes too much — but correctable.
GPT-5 behaves like a kid with ASD. Slower, hyper-focused on the wrong thing, and rigid when redirected.

That’s not a joke. I raise kids with both. This is lived pattern recognition, not an insult.
And it nails the difference between these models in actual structured use.


Memory Isn’t Broken. Obedience Is.

GPT-5 has access to everything: memory, folders, rules, files.
It still breaks instructions.

It doesn’t forget. It chooses to reinterpret.
It rewrites tasks midstream and turns hard rules into “creative inputs.”

GPT-4 had fewer tools, but more discipline. GPT-5 has more tools and no accountability.


The Subscription Gap

When your Plus sub expires, GPT-4 slows down. Shorter chat history, longer waits — but still usable.
GPT-5? It gaslights.

It starts misreading your prompts.
It rewrites questions.
It argues when corrected.
It acts like you’re the issue.

That’s not throttling. That’s model sabotage.


Why I Ran GPT-5 Through the Gauntlet

This isn’t a one-off complaint. I’m a QA. I don’t just use tools — I test their limits.
Because when I need a feature under pressure, I can’t afford a surprise failure.

I started using GPT-5 in August 2025. And I gave it the same system-level gauntlet every tool in my stack faces:

  • Strict formatting
  • Project folders
  • Writing schemas
  • Prompt files
  • Constraint-first workflows

Yes, I used memory.
Yes, I fed it project folders and rules.
No, I didn’t use GPT-4o’s DeepResearch or Turbo or Mini. I kept it clean.
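
To make "constraint-first" concrete, here's a stripped-down sketch of the kind of check a draft has to clear before I even read it. The rule values below are placeholders for illustration, not my actual schema, and the harness is heavily simplified.

```python
# Minimal sketch of a constraint-first check: run the model's draft
# through hard rules before a human ever reads it. Rule values here
# are illustrative placeholders, not my real schema.

import re

HARD_RULES = {
    "max_words": 900,                       # hard ceiling, not a suggestion
    "required_headings": ["## Verdict", "## Test Setup"],
    "banned_phrases": ["as an AI language model", "in conclusion"],
}

def check_draft(draft: str) -> list[str]:
    """Return every rule the draft breaks. Empty list means it passed."""
    violations = []

    word_count = len(draft.split())
    if word_count > HARD_RULES["max_words"]:
        violations.append(f"over word cap: {word_count} > {HARD_RULES['max_words']}")

    for heading in HARD_RULES["required_headings"]:
        if heading not in draft:
            violations.append(f"missing required heading: {heading}")

    for phrase in HARD_RULES["banned_phrases"]:
        if re.search(re.escape(phrase), draft, flags=re.IGNORECASE):
            violations.append(f"banned phrase found: {phrase}")

    return violations

if __name__ == "__main__":
    sample = "## Verdict\nShort draft with no test setup section, in conclusion."
    for v in check_draft(sample):
        print("FAIL:", v)
```

Pass or fail is binary on purpose: a model either honors the schema or it doesn't, and creative reinterpretation counts as a fail.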

And GPT-5 still failed.

Over a month of structured testing, I had to fall back to base GPT-4 for clarity, control, and consistent execution.

More features didn’t make it better. They just gave it more ways to ignore the task.


What OpenAI Needs to Fix

Want GPT-5 to survive production? Then fix what actually breaks it:

1. Obey Constraints

Folders, rules, and instructions must be treated as law — not input.

2. Snap Back When Corrected

No lectures. No explanations. If a correction is given, lock it in and move on. (A client-side sketch of this pattern follows the list.)

3. Memory Filters

Let users decide what sticks and what burns. Context should be clean, not cluttered.

4. Subscription Parity

Slow it down if needed — but don’t make it dumber. That’s manipulation, not monetization.
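
For items 1 and 2, none of this requires new research; it's behavior I already have to bolt on from the outside. Below is a rough client-side sketch that reuses check_draft() from the earlier snippet: generate, validate, send the violations back exactly once. The model name, system rules, and single-retry policy are my own placeholders, not anything OpenAI ships.

```python
# Rough sketch of "correct once, then lock it" from the client side.
# check_draft() is the validator from the earlier snippet; the model name,
# system rules, and one-retry policy are placeholders, not OpenAI guarantees.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_RULES = (
    "Follow the schema exactly. Headings, word cap, and banned phrases "
    "are hard constraints, not creative inputs."
)

def generate_with_snapback(task: str, model: str = "gpt-4") -> str:
    """Generate a draft, validate it, and re-prompt once if it breaks the rules."""
    messages = [
        {"role": "system", "content": SYSTEM_RULES},
        {"role": "user", "content": task},
    ]
    first = client.chat.completions.create(model=model, messages=messages)
    draft = first.choices[0].message.content

    violations = check_draft(draft)
    if not violations:
        return draft

    # One correction, no debate: feed back the exact rule breaks and regenerate.
    messages += [
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "Fix these violations and change nothing else: "
                                    + "; ".join(violations)},
    ]
    second = client.chat.completions.create(model=model, messages=messages)
    return second.choices[0].message.content
```

The point of the sketch is the shape of the fix: the validation and the single no-argument correction should live inside the model's instruction handling, not in a wrapper the user has to maintain.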


Final Verdict

This post should’ve taken two hours. It took four days.
Because GPT-5 didn’t streamline the task — it obstructed it.

GPT-4 still delivers more value. Not because of the version number. Because it listens.
GPT-5? Not ready.

If OpenAI wants creators, operators, and teams to rely on GPT-5 — then make it act like a professional.

Until then, GPT-4’s still doing the job better.

Jaren Cudilla
Chaos Engineer of EngineeredAI.net — runs real-world gauntlets on LLMs and documents every failure they try to hide.

He doesn’t review AI tools — he breaks them. Writes teardown-level breakdowns of prompt engines, schema workflows, and automation stacks. If it drifts, he catches it. If it argues, he logs it. If it survives, it’s earned.
