There will be many more things like this and it’s an elephant in the room for the supposed mass replacement of people with AI.
Some human still has to be accountable. Someone has to get fired / go to jail when something screws up.
You can make humans more productive but for the foreseeable future you can’t take the human out of the loop to have an AI implementation that’s not a disaster/lawsuit waiting to happen. That, probably more than anything else, is why companies just aren’t seeing the much promised mass step change in productivity from AI and why so many companies are now saying they see zero ROI from AI efforts.
The lowest hanging fruit will be low value rote repetitive tasks like the whole India offshoring industry, which will be the first to vaporize if AI does start replacing humans. But until companies see success on the lowest of lowest hanging fruit on en-mass labor replacement with AI things higher up on the value chain will remain relatively safe.
PS: Nearly every mass layoff recently citing “AI productivity” hasn’t withstood scrutiny. They all seem to be just poorly performing companies slashing staff after overhiring, which management looking for any excuse other than just admitting that.
I think this is an even clearer case than usual. With software engineers and office work you don’t have legal limitations on who can perform the work, but they exist for lawyers and doctors for example.
So if this is a tool, the fault lies fully in the user, and if this is treated as “another persons work” then the user knowingly passed the work onto someone not authorized to do it. Both end up in the user being guilty.
Text coming out of an LLM should be in a special codeblock of Unicode, so we can see it is generated by AI.
Failing to do so (or tampering with it) should be considered bad hygiene, and should be treated like a doctor who doesn't wash their hands before surgery.
Or, AI is going to be like when land lines became unnecessary when cellphones showed up in India. India may get to skip an entire intellectual generation due to the ability of a cheap model to educate (in any language).
The narrative that an entire population are “worth” less, paid less , know less, live less …
Fuck this less shit, embrace the paradigm shift. God is finally providing the remedial support through the miracle of AI.
You are making a lot of assumptions here. You assume, among other things, that AI has self-preservation drive, can be threatened, can be motivated, and above all that we know how to accomplish that and are already doing so. I would dispute all of that.
But just as evolution in nature, isn’t it likely that in the future the AIs that have a preservation drive are the ones that survive and proliferate? Seeing they optimize for their survival and proliferation, and not blindly what they were trained on.
I am not discounting this happening already, not by the LLMs necessarily being sentient but at least being intelligent enough to emulate sentience. It’s just that for now, humanity is in control of what AI models are being deployed.
Isn't just the issue stemming simply from not using the right tool? When the stakes are high and you should be checking details, the right tools are grounded Ai solutions like nouswise and notebooklm and not the general purpose chatbots that almost everyone knows they might hallucinate. I also do believe that this use case is definitely a low hanging fruit to automat a lot of manual work but it comes with new requirements like transparency to help with verifying the responses.
> She had no intention to misquote or misrepresent the rulings and that "the mistake occurred solely due to the reliance on an automatic source", the high court wrote
I don't think the intention matters here. Its the same deal with every profession using llm to "automate" their work. The onus in on the professional, not the llm. Arstechnica case could have been justified by same manner otherwise.
Not knowing the law isnt execuse to break law, so why is not knowing the tool an excuse to blame the tool.
Using an LLM to automate is simply the newer cheaper outsourcing with much of the same entertainment, but less food poisoning and air travel.
Over the last 20 years a lot of engineering (proper eng, not software) work in the west has been outsourced to cheaper places, with the certified engineers simply signing off on the work done elsewhere. This results in a cycle of doing things ever faster/more cheaply and safeguards disappearing under the pressure to go ever cheaper and faster.
As someone else pointed out, LLMs have just really exposed what a degraded state we have headed into rather than being a cause of it themselves. It's going to be very tough for people with no standards - they'll enjoy cheap stuff for a while and then it will all go away. Surprised Pikachu faces all round.
At least that's the story LLM labs leaders wanna tell everyone, just happen to be a very good story if you wanna hype your valuation before investment rounds.
Working with LLM on a daily basis I would say that's not happening, not as they're trying to sell it. You can get rid of a 5 vendor headcount that execute a manual process that should have been automated 10 years ago, you're not automating the processes involving high paying people with a 1% error chance where an error could cost you +10M in fines or jail time.
When I see Amodei or Sam flying on a vibe coded airplane is the day I believe what they're talking about.
LLMs also solve the timezone and language challenges. Sadly one problem that remains is that they too tell you they have understood something even if they haven't.
The issue is ultimately blaming people doesn't really solve things. Unless its genuinely a one-of-a-kind case. But if this happened once its probably going to happen again, and this isn't the first such case of LLM hallucinations in law.
It's weird to think this way, because its easy to just point at a person for a specific instance, but when you see something repeat over and over again you need to consider that if your ultimate goal is to stop something from happening you have to adjust the tools even if the people using them were at fault in every case.
They cannot even claim they weren't aware of the danger. LLM hallucinations have been a discussed topic, not some obscure failure mode. Almost every article on problems with AI mentions this.
This is why LLMs won't replace humans wholesale in any profession: you can't hold a machine accountable. Most of the chatbot experiences I have with various support channels always end up with human intervention anyway when it involves money.
Maybe true general intelligence would solve these issues, but LLMs aren't meeting that threshold anytime soon, imo. Stochastic parrots won't rule the world.
Even ‘true general intelligence’ (if we count humans as that) screws up frequently, sometimes (often?) intentionally for it’s own benefit - which is why accountability is such a necessary element.
If someone won’t be held liable for the end result at some point, then there is no reason to ensure an even somewhat reasonable end result. It’s fundamental.
Which is also why I suspect so many companies are pushing ‘AI’ so hard - to be able to do unreasonable things while having a smokescreen to avoid being penalized for the consequences.
> to be able to do unreasonable things while having a smokescreen
Maybe, but I feel like the calculus remains unchanged for professions that already lack accountability (police, military, C-suite, three letter agencies, etc.); LLMs are yet another tool in their toolbox to obfuscate but they were going to do that anyway.
Peons will continue to face consequences and sanctions if they screw up by using hallucinated output.
The pattern here isn't really about individual negligence — it's a systems design problem. We keep deploying LLMs into workflows where the failure mode is "plausible-sounding fabrication" and the downstream consequence is legal or institutional harm, then blaming the end user for not catching it.
The better question is why these tools are being integrated into judicial workflows without mandatory citation verification layers. The EU AI Act classifies judicial AI as high-risk and requires human oversight mechanisms specifically for this reason. India's Digital Personal Data Protection Act (2023) doesn't yet have equivalent provisions for AI in courts, which is the actual gap.
From an engineering standpoint, the fix is straightforward: any LLM-assisted legal research tool should require grounded retrieval (RAG against verified case law databases) with mandatory source links that the user must click through before citing. The fact that most legal AI tools still don't enforce this is a product design failure, not a user education problem.
I feel like this points out a very general problem with the law: it generates a lot of boilerplate text. Lawyers don't really read it; they skim it for the relevant bits.
Obviously lawyers should not be cheating with AI, especially when they don't even check it. But it does sound to me as if this is an opportunity to re-factor the process. We're carrying forward some ideas originally implemented in Latin, and which can be dramatically simplified.
I'm not a lawyer; I know this only in passing. And I am aware that there are big differences between law and code. But every time I encounter the law, and hear about cases like this, what I see are vast oceans of text that can surely be made more rigorous. AI is not the problem; it's pointing out the opportunity.
It doesn't matter, because any process that seems right most of the time but occasionally is wrong in subtle, hard to spot ways is basically a machine to lull people into not checking, so stuff will always slip through.
It's just like the cars driving themselves but you need to be able to jump in if there is a mistake, humans are not going to react as fast as if they were driving, because they aren't going to be engaged, and no one can stay as engaged as they were when they were doing it themselves.
We need to stop pretending we can tell people they "just" need to check things from LLMs for accuracy, it's a process that inevitably leads to people not checking and things slipping through. Pretending it's the people's fault when essentially everyone using it would eventually end up doing that is stupid and won't solve the core problem.
Even disregarding self driving features, it seems like the smarter we make cars the dumber the drivers are. DRLs are great, until they allow you to drive around all night long with no tail lights and dim front lighting because you’re not paying enough attention to what’s actually turned on.
This whole thing is silly, LLMs can automate reference validation.
If someone is a lawyer, accountant, doctor, teacher, surgeon, engineer etc, and is regurgitating answers that were pumped out with with GPT-5-extra-low or whatever mediocre throttled model they are using, they should just be fired and de-credentialed. Right now this is easy.
The real problem is ahead: 99.999% of future content that exists will be made using generative AI. For many people using Facebook, Instagram, TikTok, or some other non-sequential, engagement weighted feed, 50%+ of the content they consume today is fake. As that stuff spreads in to modern culture it's going to be an endless battle to keep it out of stuff that should not be publishing fake content (e.g. the New York Times or Wall Street Journal; excluding scientific journals who seem to abandoned validation and basic statistics a long time ago.)
Much of the future value and profit margins might just be in valid data?
Nope, and the article is about a judge. What's the point to incentive lawyers to carefully verify their references when they know the judge has no incentive to read them and can just make shit up anyway?
I'm continually amazed at how much faith people have in them. I guess since they can sound like people and output really authoritative and confident text it just overrides any skepticism subconsciously?
2) https://archive.org/details/nextgen-issue-26 as an example of how in the 90s we has rapid cycles of a new tech (3d graphics) astounding us with how realistic each new generation was compared to the previous one, and forgetting with each new (game engine) how we'd said the same and felt the same about (graphics) we now regarded as pathetic.
So yes, they do sound "authoritative and confident text it just overrides any skepticism subconsciously", but you shouldn't be amazed, we've always been like this.
It's mind boggling how much people claim to like LLMs when you would never design any other piece of software to operate like LLMs do. Designing a system that interact with the user through natural text creates an awful experience. It slows down every interaction as you dig through all the prose to get to the key information. It turns every computer interaction into a school math word problem.
What kind of AI is this that you constantly need a human to check its job? Do you think Jean-Luc Piccard had to constantly check the output of the Enterprise computer? No he didn't. If AI is not better than humans, then what the heck is the point? You might as well just use humans.
This is a big problem in the US and UK too. Lawyers are not technical at all and they need a robust system of governance, since currently they're (directly editing, not even diffing) documents with a chatbot which makes these mistakes inevitable. See https://insights.doughtystreet.co.uk/post/102mi96/38-uk-case...
There will be loads of papers and publications with fake citation. AI will be trained on these. In the end, we'll have more and more hallucinated information that true content on the internet.
Next token prediction and Hallucination as a bug. This should be of deep concern to all Frontier labs, who think Integrity and Trust is optional when LLMs are used this way in places where it's most important.
one should also consider that even with fake hallucinated AI situation, the productivity and correctness of the work produced by the culprit ( in general ) may still have been of higher quality then before AI regardless of the fails
Hard to believe when this judge apparently thought that outsourcing their — extremely confidential, sensitive, and important — work to a known unreliable tool was a good idea. And then further thought that they apparently did not even need to check the results.
In Australia, our universities are finding that a large proportion of Indian students have been using GenAI for cheating. Often they get away with it. I'm not saying that people other than Indian overseas students cheat, but it does seem more entrenched. I'd love to know why. It doesn't actually help in the long term!
How unserious/serious are the universities? Heard of diploma mills in Canada taking international students, letting them spend most of their time waiting at coffee shops and award them MBAs so they can be full time waiters and citizens.
>The number of international students studying in Australia totalled 833,041 for the January-October 2025 period
>The United States hosts the highest number of international students on record, with approximately 1.1 to 1.2 million
The US has 32% more students than Australia and 1121% more people. Imagine if the US took on 13 million foreign college students per year lol
It does help them in the long run, because it ensures they get to reside in australia. after 4 years they get permanent residence rights and benefits, etc
I imagine even a slight impediment in terms of being able to parse and express yourself in a language that you don't know as well as your mother tongue makes LLM usage much more tantalizing.
And not knowing the language quite as well as native speakers would also make you more likely to be discovered as having used an LLM to do coursework.
In the United States, cheating via AI is now rampant regardless of ethnicity. I know little of Australian Universities but I would assume it’s similar over there.
Indian students have embraced GenAI at a rate significantly higher than the global average, with nearly 90% of students in some surveys actively using these tools.
Government Policy and National Initiatives: The National Education Policy (NEP 2020) has shifted the focus toward digital literacy. The government has introduced AI as a skill subject for younger grades and launched programs like AI for All to promote nationwide awareness.
They are not there for the knowledge - knowledge is cheap and abundant. They are there for the credentials and subsequent potential access to offshore jobs.
The scary thing is that Indian juduciary is infamous for being incapable of tolerating any kind of criticism against it and not hesitating to put people in jail for "contempt" for just calling out corruption. Imagine the official courts of 1.4B+ people being run by such braindead narcissists, now unhindered with having to even pretend to do their jobs as they just offload everything to AI tools.
Some human still has to be accountable. Someone has to get fired / go to jail when something screws up.
You can make humans more productive but for the foreseeable future you can’t take the human out of the loop to have an AI implementation that’s not a disaster/lawsuit waiting to happen. That, probably more than anything else, is why companies just aren’t seeing the much promised mass step change in productivity from AI and why so many companies are now saying they see zero ROI from AI efforts.
The lowest hanging fruit will be low value rote repetitive tasks like the whole India offshoring industry, which will be the first to vaporize if AI does start replacing humans. But until companies see success on the lowest of lowest hanging fruit on en-mass labor replacement with AI things higher up on the value chain will remain relatively safe.
PS: Nearly every mass layoff recently citing “AI productivity” hasn’t withstood scrutiny. They all seem to be just poorly performing companies slashing staff after overhiring, which management looking for any excuse other than just admitting that.
So if this is a tool, the fault lies fully in the user, and if this is treated as “another persons work” then the user knowingly passed the work onto someone not authorized to do it. Both end up in the user being guilty.
Text coming out of an LLM should be in a special codeblock of Unicode, so we can see it is generated by AI.
Failing to do so (or tampering with it) should be considered bad hygiene, and should be treated like a doctor who doesn't wash their hands before surgery.
"Check and balance, except judiciary."
The narrative that an entire population are “worth” less, paid less , know less, live less …
Fuck this less shit, embrace the paradigm shift. God is finally providing the remedial support through the miracle of AI.
The turning point will be when threatening an AI with being unplugged for screwing up works in motivating it to stop making things up.
Some people will rightly point out that is kind of what the training process is already. If we go around this loop enough times it will get there.
But just as evolution in nature, isn’t it likely that in the future the AIs that have a preservation drive are the ones that survive and proliferate? Seeing they optimize for their survival and proliferation, and not blindly what they were trained on.
I am not discounting this happening already, not by the LLMs necessarily being sentient but at least being intelligent enough to emulate sentience. It’s just that for now, humanity is in control of what AI models are being deployed.
I don't think the intention matters here. Its the same deal with every profession using llm to "automate" their work. The onus in on the professional, not the llm. Arstechnica case could have been justified by same manner otherwise.
Not knowing the law isnt execuse to break law, so why is not knowing the tool an excuse to blame the tool.
Over the last 20 years a lot of engineering (proper eng, not software) work in the west has been outsourced to cheaper places, with the certified engineers simply signing off on the work done elsewhere. This results in a cycle of doing things ever faster/more cheaply and safeguards disappearing under the pressure to go ever cheaper and faster.
As someone else pointed out, LLMs have just really exposed what a degraded state we have headed into rather than being a cause of it themselves. It's going to be very tough for people with no standards - they'll enjoy cheap stuff for a while and then it will all go away. Surprised Pikachu faces all round.
(I'm pro AI btw, just be responsible.)
Working with LLM on a daily basis I would say that's not happening, not as they're trying to sell it. You can get rid of a 5 vendor headcount that execute a manual process that should have been automated 10 years ago, you're not automating the processes involving high paying people with a 1% error chance where an error could cost you +10M in fines or jail time.
When I see Amodei or Sam flying on a vibe coded airplane is the day I believe what they're talking about.
The issue is ultimately blaming people doesn't really solve things. Unless its genuinely a one-of-a-kind case. But if this happened once its probably going to happen again, and this isn't the first such case of LLM hallucinations in law.
It's weird to think this way, because its easy to just point at a person for a specific instance, but when you see something repeat over and over again you need to consider that if your ultimate goal is to stop something from happening you have to adjust the tools even if the people using them were at fault in every case.
So the judge was lazy, incompetent, or both.
(Sure, more honest would be "this tool makes stuff up in a convincing way")
Maybe true general intelligence would solve these issues, but LLMs aren't meeting that threshold anytime soon, imo. Stochastic parrots won't rule the world.
If someone won’t be held liable for the end result at some point, then there is no reason to ensure an even somewhat reasonable end result. It’s fundamental.
Which is also why I suspect so many companies are pushing ‘AI’ so hard - to be able to do unreasonable things while having a smokescreen to avoid being penalized for the consequences.
Maybe, but I feel like the calculus remains unchanged for professions that already lack accountability (police, military, C-suite, three letter agencies, etc.); LLMs are yet another tool in their toolbox to obfuscate but they were going to do that anyway.
Peons will continue to face consequences and sanctions if they screw up by using hallucinated output.
The better question is why these tools are being integrated into judicial workflows without mandatory citation verification layers. The EU AI Act classifies judicial AI as high-risk and requires human oversight mechanisms specifically for this reason. India's Digital Personal Data Protection Act (2023) doesn't yet have equivalent provisions for AI in courts, which is the actual gap.
From an engineering standpoint, the fix is straightforward: any LLM-assisted legal research tool should require grounded retrieval (RAG against verified case law databases) with mandatory source links that the user must click through before citing. The fact that most legal AI tools still don't enforce this is a product design failure, not a user education problem.
Obviously lawyers should not be cheating with AI, especially when they don't even check it. But it does sound to me as if this is an opportunity to re-factor the process. We're carrying forward some ideas originally implemented in Latin, and which can be dramatically simplified.
I'm not a lawyer; I know this only in passing. And I am aware that there are big differences between law and code. But every time I encounter the law, and hear about cases like this, what I see are vast oceans of text that can surely be made more rigorous. AI is not the problem; it's pointing out the opportunity.
It's just like the cars driving themselves but you need to be able to jump in if there is a mistake, humans are not going to react as fast as if they were driving, because they aren't going to be engaged, and no one can stay as engaged as they were when they were doing it themselves.
We need to stop pretending we can tell people they "just" need to check things from LLMs for accuracy, it's a process that inevitably leads to people not checking and things slipping through. Pretending it's the people's fault when essentially everyone using it would eventually end up doing that is stupid and won't solve the core problem.
If someone is a lawyer, accountant, doctor, teacher, surgeon, engineer etc, and is regurgitating answers that were pumped out with with GPT-5-extra-low or whatever mediocre throttled model they are using, they should just be fired and de-credentialed. Right now this is easy.
The real problem is ahead: 99.999% of future content that exists will be made using generative AI. For many people using Facebook, Instagram, TikTok, or some other non-sequential, engagement weighted feed, 50%+ of the content they consume today is fake. As that stuff spreads in to modern culture it's going to be an endless battle to keep it out of stuff that should not be publishing fake content (e.g. the New York Times or Wall Street Journal; excluding scientific journals who seem to abandoned validation and basic statistics a long time ago.)
Much of the future value and profit margins might just be in valid data?
Can they though with 100% accuracy and no hallucinations? Wouldn't you still need to validate that they validated correctly?
Easy? In the US you need house impeachment to fire a judge. In some countries judges are completely immune unless they are sentenced for crimes.
1) https://en.wikipedia.org/wiki/Clever_Hans
2) https://archive.org/details/nextgen-issue-26 as an example of how in the 90s we has rapid cycles of a new tech (3d graphics) astounding us with how realistic each new generation was compared to the previous one, and forgetting with each new (game engine) how we'd said the same and felt the same about (graphics) we now regarded as pathetic.
So yes, they do sound "authoritative and confident text it just overrides any skepticism subconsciously", but you shouldn't be amazed, we've always been like this.
LLMs just revealed what a decadent society we have setup for ourselves worldwide.
It’s likely happening to everyone.
https://arstechnica.com/tech-policy/2026/02/randomly-quoting...
Sound like extreme incompetence or laziness.
>The United States hosts the highest number of international students on record, with approximately 1.1 to 1.2 million
The US has 32% more students than Australia and 1121% more people. Imagine if the US took on 13 million foreign college students per year lol
It does help them in the long run, because it ensures they get to reside in australia. after 4 years they get permanent residence rights and benefits, etc
And not knowing the language quite as well as native speakers would also make you more likely to be discovered as having used an LLM to do coursework.
Government Policy and National Initiatives: The National Education Policy (NEP 2020) has shifted the focus toward digital literacy. The government has introduced AI as a skill subject for younger grades and launched programs like AI for All to promote nationwide awareness.