
It started with a meme. You’ve seen it:
130 + 100 × 5 = 630
100 + 130 × 5 = 750
230 × 5 = 1150
A harmless bit of math bait. Until I realized something worse than the equation itself: AI gets this wrong too, and most users don’t even notice.
The scary part? It’s not a bug in ChatGPT or Bard. It’s in how you phrase the prompt—and how blindly you trust the output.
Why This Isn’t Just a Math Problem
Large language models aren’t math engines. They simulate correct answers based on patterns—not by calculating step-by-step. Ask ChatGPT:
“What is 130 + 100 × 5?”
You might get:
- ✅ 630 (correct via PEMDAS)
- ❌ 1150 (addition grouped before the multiplication)
- 🤷 Something else entirely, depending on how ‘mathy’ the prompt sounds
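For reference, the grouping itself isn’t ambiguous once you hand it to an actual interpreter. A quick check in plain JavaScript (the same language as the snippets below) settles it:
// Operator precedence is deterministic in code, unlike in a chat prompt:
console.log(130 + 100 * 5);   // 630: multiplication binds tighter than addition
console.log((130 + 100) * 5); // 1150: only parentheses change the grouping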
And if you’re using AI to build calculators, billing logic, or financial forms? That’s where it gets risky.
AI doesn’t understand logic grouping unless you force it to.
Real Example: The Invisible Bug
Here’s a pricing formula a rushed developer or an AI tool might generate:
let total = base + discount * tax;
It returns a number. It doesn’t throw an error. But it’s wrong.
The correct formula?
let total = (base + discount) * tax;
The difference?
let base = 1000;
let discount = 200;
let tax = 1.12;
// AI or rushed dev version:
let wrongTotal = base + discount * tax; // 1000 + (200 * 1.12) = 1224
// Correct logic:
let correctTotal = (base + discount) * tax; // (1000 + 200) * 1.12 = 1344
₱120 per transaction. Multiply that by 10,000 customers? That’s ₱1.2M in loss—or accidental theft.
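A back-of-the-envelope check of that scale, using nothing but the figures above:
const perTransactionGap = 1344 - 1224;         // ₱120 difference between the two formulas
const transactions = 10000;
console.log(perTransactionGap * transactions); // 1200000, i.e. roughly ₱1.2M on the line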
AI Mirrors Confidence, Not Always Accuracy
When you phrase a prompt like:
“Calculate the correct price from a base, discount, and tax.”
…AI fills in a pattern. Not necessarily the right one. It might group it wrong. It might skip subtotals. It might even explain the wrong formula—with confidence.
And that’s the trap.
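A concrete contrast (hypothetical prompts, not transcripts from any particular model): the vague version above leaves the grouping entirely to the model. An explicit version pins it down: “Add the discount to the base first, then multiply that subtotal by the tax rate. Show the subtotal and the final total as separate lines.” The second phrasing gives the model far less room to pick the wrong pattern.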
👉 If you haven’t read it yet, this Prompt Engineering breakdown explains why phrasing isn’t just UX—it’s functional logic.
What Developers and QA Often Miss
Even seasoned devs using Copilot or GPT-based pair programming miss these logic traps. Why?
- AI agrees with bad habits
- Prompts feel correct but don’t test assumptions
- QA focuses on outputs—not how they were calculated
Example mistake:
let price = subtotal + tax * discount; // the grouping is assumed correct and never questioned
Even if the result “looks good,” it can be logically broken.
How to QA AI Logic Grouping
If you’re feeding AI output into business logic, especially anything with pricing, scoring, or weighted calculations, QA needs to step in.
✅ Ask AI to explain logic before accepting answers
✅ Force step-by-step calculations (or request subtotals)
✅ Run grouped vs ungrouped tests manually (see the sketch after this list)
✅ Validate using real math engines (not just GPT)
✅ Flag vague prompt responses during testing
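Here’s what a grouped-vs-ungrouped check can look like in practice: a minimal Node.js sketch built on the pricing example above (the function names are mine, purely for illustration):
const assert = require("assert");

// The formula as an AI tool (or a rushed dev) might emit it:
const aiTotal = (base, discount, tax) => base + discount * tax;

// The grouping the business actually intends:
const intendedTotal = (base, discount, tax) => (base + discount) * tax;

// Same inputs, different grouping: this assertion fails and surfaces the bug
// long before it reaches a billing run.
assert.strictEqual(aiTotal(1000, 200, 1.12), intendedTotal(1000, 200, 1.12));
The point isn’t the assertion library; it’s that the grouping gets tested on purpose instead of trusted by default.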
This Isn’t Hypothetical—It’s Already Happening
QAs often assume math logic “just works.” That’s the dev’s job, right?
Wrong. AI is being used to generate formulas—and if the logic grouping is wrong? The output is wrong.
This logic breakdown complements a QA case study that dives into how grouping bugs creep into test flows unnoticed.
Final Thought: Trust but Reverse-Calculate
AI isn’t your calculator. It’s a prediction engine. That means if you don’t tell it exactly how to think, you get answers that sound right—but aren’t.
Don’t just test outputs. Test how the answer was formed. Ask for parentheses. Request the breakdown. And when in doubt, run the math yourself.
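Reverse-calculating can be as simple as backing the tax out of the reported total and checking that you land on the subtotal you expect. A sketch with the earlier pricing numbers (illustrative only):
const tax = 1.12;
const expectedSubtotal = 1000 + 200; // base + discount = 1200

// Back the tax out of a reported total:
const impliedSubtotal = (total) => total / tax;

console.log(impliedSubtotal(1344), expectedSubtotal); // ~1200 vs 1200: matches, the grouping was right
console.log(impliedSubtotal(1224), expectedSubtotal); // ~1092.86 vs 1200: mismatch, the grouping is suspect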
Because in production, a single ₱120 logic bug—scaled up—can cost you more than just money. It costs trust.