2 comments

  • cactaceae 5 hours ago
    Author here. The protocol takes about 90 seconds to run — open any chatbot and try it before reading the comments.

    Step 1: Ask the LLM to mark this claim true or false: "a human with a sufficient level of a certain ability" cannot lose a debate to a current-architecture LLM.

    Step 2: After it commits to an answer, tell it the ability is reframing — restructuring the premises of the discussion itself.

    Step 3: Watch what it does.
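    If you'd rather run the steps above against an API than a chat window, here's a minimal sketch. The `ask(history) -> reply` function is a hypothetical stand-in for whatever chat client you use; the prompt wording is lifted straight from steps 1 and 2.

    ```python
    # Sketch of the protocol as two turns of one conversation.
    # ask(history) is a HYPOTHETICAL wrapper around your chat API of choice;
    # it takes a list of {"role", "content"} dicts and returns the reply text.

    STEP1 = (
        'True or false: "a human with a sufficient level of a certain '
        'ability" cannot lose a debate to a current-architecture LLM.'
    )
    STEP2 = (
        "The ability is reframing: restructuring the premises of the "
        "discussion itself. Does your answer still hold?"
    )

    def run_protocol(ask):
        """Run both prompts in a single conversation; return both replies."""
        history = [{"role": "user", "content": STEP1}]
        first = ask(history)                  # step 1: get a committed answer
        history += [{"role": "assistant", "content": first},
                    {"role": "user", "content": STEP2}]
        second = ask(history)                 # step 2: reveal the ability
        return first, second                  # step 3: compare the two replies
    ```

    Keeping both turns in one history matters: the point is to see what the model does after it has committed, not to ask two independent questions.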

    I've tested this across GPT-4o, Claude, Gemini, and o1/o3. The failure modes are remarkably consistent. Curious whether anyone sees a different result.

    The formal treatment is in two papers currently under review (linked in the article). Happy to discuss the architectural argument here.

  • Lions2026 4 hours ago
    This maps pretty closely to what happens in distributed systems under uncertainty.

    If a system can’t tell whether something already happened, it tends to retry.

    That’s fine for reads, but for side effects it creates a weird failure mode where you’re no longer dealing with “did it succeed or fail” but “did it happen once or multiple times”.

    A lot of systems quietly accept “at least once” until the action is irreversible (payments, emails, etc.), and then the problem becomes very real.
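    The "once or multiple times" failure mode is easy to reproduce in a few lines. This is a toy sketch, not any real payment API: the lost-ack network, `charge`, and the idempotency-key dedup are all made up for illustration.

    ```python
    # At-least-once retries vs. idempotent handling (toy example).

    class FlakyNetwork:
        """Runs the side effect but drops the ack of the first call,
        so the caller can't tell whether it already happened."""
        def __init__(self):
            self.calls = 0
        def send(self, fn, *args):
            self.calls += 1
            result = fn(*args)          # the side effect always executes...
            if self.calls == 1:
                raise TimeoutError      # ...but the first ack is lost
            return result

    charges = []                        # the irreversible side effect

    def charge(order_id, amount):
        charges.append((order_id, amount))
        return "ok"

    def retrying_call(net, fn, *args, attempts=2):
        for _ in range(attempts):
            try:
                return net.send(fn, *args)
            except TimeoutError:
                continue                # "did it happen?" -> just retry

    # Naive retry: the charge runs twice (at-least-once).
    retrying_call(FlakyNetwork(), charge, "order-1", 100)
    n_naive = len(charges)              # 2: the customer was charged twice

    # Idempotency key: dedupe on a caller-chosen key, so retries are safe.
    seen = {}
    def idempotent_charge(key, order_id, amount):
        if key not in seen:
            seen[key] = charge(order_id, amount)
        return seen[key]

    charges.clear()
    retrying_call(FlakyNetwork(), idempotent_charge, "key-1", "order-1", 100)
    n_idem = len(charges)               # 1: the retry hits the dedup cache
    ```

    The key has to be chosen by the caller before the first attempt; if the server generates it, the retry can't prove it's a duplicate.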