I don't understand how this is possible at all at Anthropic. Couldn't they, like, embed an agentic swarm into their backend that prevents any errors from ever making it into production? What am I missing?
I've spent the last couple days building out an automated classifier on top of the batch API, and just this morning (about a minute before the outage began!) started running my first live tests. I thought I was going mad!
I asked it to add debug logs, then asked it to check them. It got stuck and locked me out.
I checked the debug file and it was 1.6 GB (!!!)
For a while I thought I'd been kicked off the platform for violating some T&C.