The "Is AI Wrong?" Checklist
Spot when AI is confidently bluffing: weak zones, behavioral red flags, and a 60-second verify routine.
Outcome: catch AI’s confident mistakes before you act on them — with a 60-second routine you’ll actually use.
Who it’s for: anyone about to use, send, or believe something an AI told them.
The big idea — confidence is not correctness. AI is built to sound sure even when it’s guessing. OpenAI says it plainly: models “guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty” — like a student bluffing on an exam. Your job isn’t to distrust everything. It’s to know which answers to check. These are the tells.
1. Be extra suspicious in AI’s weak zones
AI is “jagged” — brilliant in one spot, confidently wrong an inch away. It’s reliably weakest on:
- ☐ Recent events — anything after the model’s training cutoff (every model has one; check yours).
- ☐ Niche / local facts — obscure people, small businesses, your town, specialized fields.
- ☐ Exact math & counting: it predicts text rather than calculating, so it can slip on arithmetic and counting (models have famously miscounted the r’s in “strawberry”).
- ☐ Citations, quotes, dates, page numbers — the single thing it fabricates most.
In a field experiment with 758 consultants (Dell’Acqua et al., 2023), people using AI on a task just past its ability were 19 percentage points less likely to get the right answer than people using no AI at all. Fast, and wrong.
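For the exact math and counting item above, the safest check is to not ask the model at all and let a deterministic tool do the counting. A minimal sketch in Python; the function name count_letter is made up for illustration:

```python
def count_letter(word: str, letter: str) -> int:
    """Count how many times a single letter appears in a word, ignoring case."""
    # str.count is exact: it scans the characters instead of predicting text.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

The same habit works for arithmetic: paste the numbers into a calculator or spreadsheet instead of trusting the figure in the chat.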
2. The behavioral red flags
Two or more = verify before you trust.
- ☐ Confident, but no source. Sweeping claims, zero “according to…”. Fluency isn’t evidence.
- ☐ Citations you can’t click. Named studies, books, links, quotes that don’t exist or don’t say that. In one peer-reviewed test (Walters & Wilder, 2023), 55% of an older model’s citations were fabricated (18% even for a top model), and the fakes used real journal and author names. (Lawyers have been fined thousands for filing AI-invented court cases.)
- ☐ Suspiciously specific. Exact numbers, dates, stats with no source. Made-up precision is camouflage, not credibility.
- ☐ It caves the instant you push back. Say “are you sure?” and it flips — that’s people-pleasing (sycophancy), not reasoning, and it flips right→wrong more often than the reverse. So don’t use “are you sure?” as your test.
- ☐ Too smooth to question. The more polished it reads, the more people skip checking — which is exactly the riskiest moment.
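The "two or more" rule above can be written down as a tiny tally, which is handy if you want to bake the checklist into a review form or script. This is only a sketch: the flag names are hypothetical labels for the five items in this section, and the threshold of 2 comes straight from the rule.

```python
from typing import Iterable

# Hypothetical short labels for the five red flags listed above.
RED_FLAGS = (
    "confident_no_source",
    "unclickable_citations",
    "suspiciously_specific",
    "caves_on_pushback",
    "too_smooth",
)


def needs_verification(observed: Iterable[str], threshold: int = 2) -> bool:
    """Return True when at least `threshold` distinct red flags were observed."""
    seen = set(observed)
    unknown = seen - set(RED_FLAGS)
    if unknown:
        # Fail loudly on typos rather than silently ignoring a flag.
        raise ValueError(f"unknown red flags: {sorted(unknown)}")
    return len(seen) >= threshold


print(needs_verification({"confident_no_source", "too_smooth"}))  # prints True
```

A False here does not mean the answer is right. High-stakes answers get verified regardless of the count.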
3. The 60-second verify (trust, then verify)
- Ask for the source — “What’s your source? Link it.” (Then actually look.)
- Confirm the source is real and says what the AI claims. Click the link: does it exist? Does it say that?
- Check one hard fact — the riskiest number, name, or claim — against a primary source (the original site, a report, your own records).
- If it matters, cross-check a second way — a fresh chat, a different tool, or “what would someone who disagrees say?”
A “smarter” model doesn’t save you: newer reasoning models sometimes hallucinate more (one scored ~1 wrong in 3 on a people-facts test). Verify regardless of which model you use.
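The "does it say that?" step can be partly automated once you have opened the source and copied its text. The sketch below assumes you paste in the page text yourself; it does not fetch anything, and the helper names are made up for illustration. It only checks that a quoted passage appears word for word, ignoring case, line breaks, and curly versus straight quotes.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def source_contains_quote(source_text: str, quote: str) -> bool:
    """Return True if `quote` appears verbatim in `source_text` after normalizing."""
    needle = _normalize(quote)
    # An empty quote would match everything, so treat it as "not found".
    return bool(needle) and needle in _normalize(source_text)


page = "Models guess when\n  uncertain, producing plausible yet incorrect statements."
print(source_contains_quote(page, "guess when uncertain"))  # prints True
```

A match only shows the words are on the page. Whether the source supports the AI's claim in context still takes a human read.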
4. The stakes dial — when to always verify independently
Medical · legal · financial · safety · anything you’ll publish or send · anything you can’t undo · anything about a real named person. Here AI is a draft assistant, never the final word. The makers agree: Anthropic tells users to “carefully scrutinize any high-stakes advice”; OpenAI says treat output as “a first draft, not a final source.”
Did you catch it? (review standard)
You’re doing it right when you can say: “Here’s the one claim I checked, here’s the source, and here’s what still needs a human.” If you can’t point to what you verified, you haven’t verified.
Safety note
This isn’t about fearing AI; it’s about not letting its fluency stand in for accuracy. Use it like a fast, sharp, occasionally overconfident assistant: keep the judgment, delegate the typing.
Next
Got an answer that passed? Turn the prompt that produced it into a reusable one with The Prompt Anatomy.
Sources: OpenAI, “Why Language Models Hallucinate” (2025); Anthropic, sycophancy research (2023); Walters & Wilder, Scientific Reports (2023); Dell’Acqua et al. (HBS/BCG), “Navigating the Jagged Technological Frontier” (2023); Vectara Hallucination Leaderboard. Figures are as-published — re-verify before you rely on them.