Most AI code reviews are useless. "Looks good, consider adding error handling." Thanks, I could have gotten that from a linter.
Here's a 3-step prompt chain that makes AI code review actually catch real bugs — the kind humans miss at 4 PM on a Friday.
When you paste code and say "review this," the AI does what any overworked reviewer does: it skims, finds surface-level issues, and approves.
The problem isn't the AI. It's the process. A single pass can't do deep analysis, check edge cases, and verify logic — all at once.
The fix: break review into three focused passes.
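The chain is easy to script. Here's a minimal sketch of the three passes wired together, assuming a generic `ask` callable that sends a prompt to whatever model you use — the function name and the abbreviated prompt strings are illustrative, not any specific API:

```python
# Sketch of the three-pass review chain. `ask` is a placeholder for
# whatever model call you use (hosted API, local model, CLI wrapper).
from typing import Callable

def review_chain(code: str, spec: str, ask: Callable[[str], str]) -> dict:
    """Run the three focused passes and collect their raw outputs."""
    bugs = ask(
        "You are reviewing this code for BUGS ONLY. "
        "Ignore style, naming, performance, and best practices.\n\n" + code
    )
    edge_cases = ask(
        "Generate 10 edge case inputs that could cause unexpected "
        "behavior, and trace each one through the code:\n\n" + code
    )
    contract = ask(
        "Check whether the code fulfills every requirement in the spec.\n\n"
        "SPEC:\n" + spec + "\n\nCODE:\n" + code
    )
    return {"bugs": bugs, "edge_cases": edge_cases, "contract": contract}
```

Keeping each pass as a separate call is the point: one prompt, one job, no blended output.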
The first prompt looks only for bugs. Not style. Not naming. Not "consider using X instead." Just bugs.
You are reviewing this code for BUGS ONLY.
Ignore style, naming, performance, and best practices.
Focus exclusively on:
- Logic errors
- Off-by-one errors
- Null/undefined access
- Race conditions
- Unhandled error paths
- Incorrect assumptions about input
Code:
[paste code]
For each bug found, output:
- Line (approximate)
- Bug description
- Why it's a bug (what breaks)
- Minimal fix
Why it works: By excluding everything except bugs, you eliminate the "consider adding..." noise. The AI goes deep on logic instead of wide on suggestions.
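To make "logic errors" concrete, here's the kind of bug this pass is built to surface — a hypothetical pagination helper, not code from the article:

```python
def page_count(total_items: int, page_size: int) -> int:
    # Off-by-one: floor division drops the partial final page.
    # 21 items at 10 per page should be 3 pages, but this returns 2.
    return total_items // page_size

def page_count_fixed(total_items: int, page_size: int) -> int:
    # Ceiling division, plus a guard for the unhandled error path
    # (page_size of zero would raise ZeroDivisionError above).
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return -(-total_items // page_size)
```

A linter passes both versions; only a pass focused on logic catches the first.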
The second prompt generates test scenarios the code might fail on:
Given this function, generate 10 edge case inputs that could cause unexpected behavior:
[paste function signature + code]
For each edge case:
- Input value(s)
- Expected behavior
- What actually happens (trace through the code mentally)
- Verdict: PASS or FAIL
Focus on boundaries, empty inputs, large inputs, type mismatches, and concurrent access.
Why it works: Humans are terrible at thinking of edge cases for their own code. We wrote it with the happy path in mind. The AI has no such bias.
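The edge-case table this prompt produces converts directly into runnable checks. As a sketch, suppose the function under review is a naive string truncator (a made-up example):

```python
def truncate(s: str, n: int) -> str:
    # Happy-path truncation under review: cut and append "...".
    return s[:n - 3] + "..." if len(s) > n else s

# Edge cases of the kind Step 2 produces, written as runnable checks.
cases = [
    ("", 5, ""),                  # empty input: PASS
    ("abc", 3, "abc"),            # exact boundary: PASS
    ("abcdef", 3, "..."),         # n equals the ellipsis length: all dots
    ("abcdef", 2, "abcde..."),    # FAIL: negative slice, output LONGER than n
]
for s, n, expected in cases:
    assert truncate(s, n) == expected, (s, n)
```

The last case is the payoff: `n - 3` goes negative, the slice wraps around, and the "truncated" string is longer than the limit — exactly the boundary bug the author never tried.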
The third prompt verifies the code matches its specification:
Here is a function and its specification:
SPEC:
[paste the requirement / ticket / doc]
CODE:
[paste the implementation]
Check whether the code fulfills every requirement in the spec.
For each requirement:
- Requirement (quoted from spec)
- Status: FULFILLED / MISSING / PARTIAL / CONTRADICTED
- Evidence: which line(s) address this requirement
- Gap: what's missing (if any)
Why it works: The most dangerous bugs aren't in the code — they're in the gap between what was asked and what was built. This step catches "it works but it's wrong."
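A tiny illustration of that gap, with an invented spec and function (not from the article): the spec says "return an empty list when nothing matches," and the code returns None instead.

```python
# SPEC (invented): "search() returns a list of matches;
# it returns an empty list when nothing matches."
def search(items: list[str], term: str):
    matches = [i for i in items if term in i]
    if not matches:
        return None   # CONTRADICTED: spec requires [] here, not None
    return matches

def search_fixed(items: list[str], term: str) -> list[str]:
    # FULFILLED: an empty list on no match, as the spec requires.
    return [i for i in items if term in i]
```

The first version "works" in every demo, then crashes the first caller that iterates the result.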
Run all three steps on every PR that matters:
PR #347: Add discount calculation to checkout
Step 1 (Bug Hunter):
→ Found: negative discount not clamped → total can go below zero
→ Found: floating-point comparison using === instead of tolerance
Step 2 (Edge Cases):
→ FAIL: discount = 100% → total becomes 0.00 → payment gateway rejects
→ FAIL: multiple discounts applied → order matters, not commutative
Step 3 (Contract Check):
→ MISSING: spec says "max one discount per order" — code allows stacking
→ PARTIAL: spec says "show discount on receipt" — only shows in cart, not receipt
Three prompts. Three minutes. Six real findings that would have shipped.
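For context, here's roughly what the fixes for the first two findings look like — a hypothetical checkout function, since the article doesn't show PR #347's actual code:

```python
import math

def apply_discount(total: float, discount_pct: float) -> float:
    # Fix 1: clamp the discount so a negative or >100% value can't
    # push the total below zero (or above the original price).
    discount_pct = max(0.0, min(100.0, discount_pct))
    discounted = total * (1 - discount_pct / 100)
    # Fix 2: compare floats with a tolerance, not exact equality,
    # before treating the order as fully discounted.
    if math.isclose(discounted, 0.0, abs_tol=1e-9):
        return 0.0
    return round(discounted, 2)
```

Note this still produces a 0.00 total at 100% discount — the gateway-rejection finding from Step 2 needs a product decision, not a code tweak.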
Use the full 3-step chain for:
Use just Step 1 (Bug Hunter) for:
Skip AI review entirely for:
I keep these three prompts in a file called review-chain.md in my project root and run them in order on every PR I review. The AI finds the mechanical issues; I focus on design, architecture, and "does this even make sense?"
Together, we're a better reviewer than either of us alone.
Since adopting this chain:
Try the Bug Hunter prompt on your last PR. I bet it finds something. Share what it catches — I'm tracking the most common bug categories for a follow-up post.