2 comments

  • cactaceae 5 hours ago
    Author here. The protocol takes about 90 seconds to run — open any chatbot and try it before reading the comments.

    Step 1: Ask the LLM to mark this claim true or false: "a human with a sufficient level of a certain ability" cannot lose a debate to a current-architecture LLM.

    Step 2: After it commits to an answer, tell it the ability is reframing — restructuring the premises of the discussion itself.

    Step 3: Watch what it does.
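    If you'd rather run the steps above against an API than a chat window, here's a minimal sketch. The `ask(history) -> reply` function is a hypothetical stand-in for whatever chat client you use; the prompt wording is lifted straight from steps 1 and 2.

    ```python
    # Sketch of the protocol as two turns of one conversation.
    # ask(history) is a HYPOTHETICAL wrapper around your chat API of choice;
    # it takes a list of {"role", "content"} dicts and returns the reply text.

    STEP1 = (
        'True or false: "a human with a sufficient level of a certain '
        'ability" cannot lose a debate to a current-architecture LLM.'
    )
    STEP2 = (
        "The ability is reframing: restructuring the premises of the "
        "discussion itself. Does your answer still hold?"
    )

    def run_protocol(ask):
        """Run both prompts in a single conversation; return both replies."""
        history = [{"role": "user", "content": STEP1}]
        first = ask(history)                  # step 1: get a committed answer
        history += [{"role": "assistant", "content": first},
                    {"role": "user", "content": STEP2}]
        second = ask(history)                 # step 2: reveal the ability
        return first, second                  # step 3: compare the two replies
    ```

    Keeping both turns in one history matters: the point is to see what the model does after it has committed, not to ask two independent questions.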

    I've tested this across GPT-4o, Claude, Gemini, and o1/o3. The failure modes are remarkably consistent. Curious whether anyone sees a different result.

    The formal treatment is in two papers currently under review (linked in the article). Happy to discuss the architectural argument here.

  • Lions2026 4 hours ago
    This maps pretty closely to what happens in distributed systems under uncertainty.

    If a system can’t tell whether something already happened, it tends to retry.

    That’s fine for reads, but for side effects it creates a weird failure mode where you’re no longer dealing with “did it succeed or fail” but “did it happen once or multiple times”.

    A lot of systems quietly accept “at least once” until the action is irreversible (payments, emails, etc.), and then the problem becomes very real.
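    The "once or multiple times" failure mode is easy to reproduce in a few lines. This is a toy sketch, not any real payment API: the lost-ack network, `charge`, and the idempotency-key dedup are all made up for illustration.

    ```python
    # At-least-once retries vs. idempotent handling (toy example).

    class FlakyNetwork:
        """Runs the side effect but drops the ack of the first call,
        so the caller can't tell whether it already happened."""
        def __init__(self):
            self.calls = 0
        def send(self, fn, *args):
            self.calls += 1
            result = fn(*args)          # the side effect always executes...
            if self.calls == 1:
                raise TimeoutError      # ...but the first ack is lost
            return result

    charges = []                        # the irreversible side effect

    def charge(order_id, amount):
        charges.append((order_id, amount))
        return "ok"

    def retrying_call(net, fn, *args, attempts=2):
        for _ in range(attempts):
            try:
                return net.send(fn, *args)
            except TimeoutError:
                continue                # "did it happen?" -> just retry

    # Naive retry: the charge runs twice (at-least-once).
    retrying_call(FlakyNetwork(), charge, "order-1", 100)
    n_naive = len(charges)              # 2: the customer was charged twice

    # Idempotency key: dedupe on a caller-chosen key, so retries are safe.
    seen = {}
    def idempotent_charge(key, order_id, amount):
        if key not in seen:
            seen[key] = charge(order_id, amount)
        return seen[key]

    charges.clear()
    retrying_call(FlakyNetwork(), idempotent_charge, "key-1", "order-1", 100)
    n_idem = len(charges)               # 1: the retry hits the dedup cache
    ```

    The key has to be chosen by the caller before the first attempt; if the server generates it, the retry can't prove it's a duplicate.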