The fallacy of the calculator
The more you study how Generative AI works, the more parallels emerge with how lawyers think — and that has implications. With Gen AI getting better every day, we need to get our act together, fast.
I really thought we were past this. After the Mata v. Avianca debacle nearly two years ago, which landed plaintiff’s counsel literally on the front page of The New York Times, I figured lawyers would be doubly and triply on their guard against (a) using general-purpose Generative AI for case research and (b) failing to check whether the cases it produced actually existed.
But nope. In just the last couple of weeks, on five different occasions in the US, Canada, and England, lawyers submitted fake AI-generated caselaw to the courts. None of the judges was amused or inclined to cut the lawyers any slack; in some cases, counsel were ordered to show cause why they shouldn’t be held in contempt. And the judges were entirely right to do so. Lying to a judge, whether you intended to or not, is incredibly serious misconduct. (As Richard Moorhead correctly notes, these incidents aren’t AI problems; they’re lawyer competence problems.)
I’m not sure the legal profession appreciates just how bad this makes us look. Bringing lawsuits that cite non-existent cases leads the average person to regard lawyers as lazy, credulous, or completely indifferent to the truth. It says nothing good about our professionalism or even our basic competence. And that is not the brand you want to be carrying when an astonishing new technology, which can take over much of your work, is entering your market.
Why does this keep happening? One reason is that Generative AI is extremely good at telling you what you want to hear: Sycophancy as a Service, if you like. I’ve encountered this myself when using ChatGPT to test ideas — frequently, it lavishes praise on whatever concept I come up with. I then tell it to criticize the ideas and find all the weak spots, and it’s very good at that, too (you’ll develop a thick skin if you do it often enough). So if you need legal authorities that will help prove your position, Gen AI will fall over itself to give them to you.
But another reason is tied to Generative AI’s remarkable air of confidence, which makes it so persuasive that we lower our guard against potential falsehoods. A recent article by legal ethics professor Brad Wendel points us towards an Australian study that calls this phenomenon “verification drift.” As lawyers become increasingly comfortable with AI, “they become overconfident in its reliability…. This misplaced trust may stem from Gen AI’s authoritative tone and ability to present incorrect details alongside accurate, well-articulated data.”
All these incidents were on my mind last week when I was speaking to a conference about the rapid advancement of AI in the legal field. “The more often Gen AI gives you correct and useful results, the less inclined you’ll feel to check them,” I said. Then I drew what I thought was a cute comparison. “Today, you feel the need to double-check the legal output of Generative AI. But when was the last time you double-checked your calculator, to make sure 468 x 123 equals whatever?” (57,564, for the record.)
But afterwards, I started thinking about that analogy and whether it was correct. I mean, I don’t (or didn’t) even know how a calculator worked when I spoke. So I figured I’d better look into the mechanics behind both systems before spouting off like that. One rabbit hole led to another, and then to another, and eventually I had a few thoughts to share with you about the future relationship between Gen AI and the legal profession. Spoiler alert: It’s a little fraught.
It is not, in fact, accurate to compare a calculator with a large language model like ChatGPT. A calculator (or anything with a CPU, really) relies on something called an Arithmetic Logic Unit: a digital circuit whose logic gates are hard-wired to carry out arithmetic operations. Those circuits never change and their behavior never varies, which is why, outside of extremely rare events like hardware faults, a calculator’s outputs are never wrong.
A calculator’s operations, in other words, are deterministic. The process by which it reaches its conclusions is fixed from the day it was factory-shipped. It can never reach any conclusion other than that to which its pre-installed instructions lead.
But large language models are probabilistic. An LLM predicts the next likely token in a sequence, based on patterns it has learned from digesting colossal amounts of data. There is no rigid set of coded instructions directing ChatGPT or any other LLM to produce a specific output; its reasoning is effectively a series of statistically likely word associations, tweaked in various ways by its designers. This is a big reason why Gen AI gets facts wrong: It doesn’t really know (or care) what a “fact” is.
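If you want to see that difference in miniature, here is a toy sketch in Python. The multiplication function is calculator-style: the same inputs produce the same output every time. The sampler is LLM-style: it draws the next word from a weighted distribution, so two runs of the same prompt can come back different. (The vocabulary and probabilities below are invented purely for illustration; a real model computes them over tens of thousands of tokens with a neural network.)

```python
import random

def multiply(a: int, b: int) -> int:
    """Deterministic, calculator-style: same inputs, same output, every time."""
    return a * b

# Toy next-token distributions, keyed by the preceding word. The words and
# numbers are made up for illustration only.
NEXT_TOKEN_PROBS = {
    "the":   [("court", 0.40), ("plaintiff", 0.35), ("statute", 0.25)],
    "court": [("held", 0.50), ("found", 0.30), ("ruled", 0.20)],
}

def sample_next(word: str, temperature: float = 1.0) -> str:
    """Probabilistic, LLM-style: draw the next token from a weighted distribution."""
    tokens, probs = zip(*NEXT_TOKEN_PROBS[word])
    weights = [p ** (1.0 / temperature) for p in probs]  # temperature reshapes the odds
    return random.choices(tokens, weights=weights, k=1)[0]

print(multiply(468, 123))   # 57564, today, tomorrow, and every time after that
print(sample_next("the"))   # "court" most often, but sometimes "plaintiff" or "statute"
print(sample_next("the"))   # may well differ from the previous call
```

The point isn’t the toy numbers; it’s that the second function is built to vary. That variability is a feature when you want ideas, and a liability when you want citations.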
Now, legal-specific AI tools (the professional-grade versions available from leading legal tech providers, or the customized versions developed by large law firms) work from domain-specific databases of trusted legal content and are buttressed by retrieval-augmented generation (RAG). You can have much more confidence in what they produce than in what ChatGPT or Claude or Meta AI will tell you about the law. But they’re still fundamentally probabilistic in their operations. They ain’t calculators.
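For the curious, here is roughly what “retrieval-augmented generation” means, as a stripped-down Python sketch. Everything in it is invented for illustration (the two-passage “database,” the hypothetical case names, the stand-in llm_complete function), and it doesn’t reflect how any particular vendor’s product is built. The idea is simply that the model is told to answer only from retrieved, vetted sources and to cite them, which makes verification easier but never optional.

```python
from typing import List, Tuple

# A toy, invented "database" of two fake passages, purely for illustration.
CASE_DB: List[Tuple[str, str]] = [
    ("Smith v. Jones (hypothetical)", "A carrier owes a duty of care to its passengers."),
    ("Doe v. Roe (hypothetical)", "Damages for delay are limited where notice was given."),
]

def search_case_database(query: str, top_k: int = 2) -> List[Tuple[str, str]]:
    """Toy retriever: rank passages by crude keyword overlap with the query.
    A real system would run vector or keyword search over vetted caselaw."""
    q = set(query.lower().split())
    scored = sorted(CASE_DB, key=lambda p: -len(q & set(p[1].lower().split())))
    return scored[:top_k]

def llm_complete(prompt: str) -> str:
    """Stand-in for a real LLM API call; no actual endpoint is assumed here."""
    return f"[model output would appear here, grounded in:\n{prompt}]"

def answer_with_citations(question: str) -> str:
    """RAG in outline: retrieve trusted passages, instruct the model to answer
    only from them and cite them, then hand the draft to a human to verify."""
    passages = search_case_database(question)
    sources = "\n".join(f"[{i + 1}] {cite}: {text}" for i, (cite, text) in enumerate(passages))
    prompt = (
        "Answer using only the numbered sources below and cite them by number. "
        "If they do not answer the question, say so.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)

print(answer_with_citations("Is the carrier liable for the passenger's delayed flight?"))
```

Retrieval narrows what the model can say; it doesn’t relieve the lawyer of the duty to open the cited sources and read them.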
So then I wondered: Could an LLM ever reach “calculator levels” of accuracy, where you could safely assume that the result it generates is absolutely the correct one? From what I can gather, no, for a couple of reasons.
One is that probabilistic reasoning systems, by their very nature, incorporate uncertainty and variability into their operations. Ask ChatGPT the same question 100 times and it’ll give you 100 different answers — most of the differences trivial, a few of them significant. No probabilistic system can outdo a deterministic system when it comes to consistency and accuracy, because those aren’t features that a probabilistic system is designed or intended to produce.
And that leads to the second reason, one that directly implicates lawyers: The kinds of questions you ask a calculator are fundamentally different from those you ask an LLM. Arithmetical questions have only one correct answer. 17 plus 35 isn’t “fifty-ish” — it’s 52, and it always will be, and the calculator will faithfully tell you so every time.
LLMs, by contrast, when employed properly, are asked questions that involve uncertainty, possibility, and creativity. You ask them to come up with new ideas, suggest solutions, explore possibilities, forecast risks, develop plans, and predict outcomes. Precisely because Generative AI is probabilistic technology, it’s of tremendous value to any field where there might not be one right answer, and where ambiguity, nuance, and fine distinctions all play leading roles.
Off the top of your head, can you think of any fields like that?
Generative AI isn’t in the “deterministic outcomes” business, and neither are lawyers. We thrive in the liminal but dynamic space “between facts,” where construction, interpretation, and argumentation are the tools of the trade. Outcomes are usually conditional and always variable.
Contrast the way the law works with how medicine works. Obviously, many aspects of medicine involve uncertainty, but its core questions are grounded in physical reality, with final answers that can be confirmed one way or the other. A bone is either broken or it isn’t; a dark spot on your X-ray is either cancer or it isn’t.
But legal questions are probabilistic. There’s nothing inherent or inevitable about the facts of an accident that would automatically generate a finding of liability; it all depends on a series of opinions, assessments, and judgments by legal professionals. That’s the old joke about lawyers, right? “It depends.”
Of course there are important differences between AI reasoning and lawyer reasoning. Although law is probabilistic, it still operates within a structured framework of rules, laws, and precedents. There are real constraints, jurisprudential and institutional, on a lawyer’s ability to produce their desired outcome, and those constraints do inject a degree of determinism into the process. But the resemblances are otherwise striking.
There’s a famous story told about Hall of Fame baseball umpire Bill Klem: A pitcher delivered a pitch to the plate, the batter didn’t swing, and all parties waited expectantly for Klem to pronounce it a ball or a strike. When he hesitated, the catcher turned and asked him, “Well, what is it?” Klem replied: “It ain’t nothin’ till I call it.”
Similarly, nothing is or isn’t anything in the law until an arbiter of some kind says so. That’s not how most other fields work. Cancer doesn’t care whether the doctor thinks it’s cancer or not. Foreclosure doesn’t care if the financial advisor thought her accounting methods were clever or not. The collapsed bridge doesn’t care if the engineer thought his math was accurate or not.
Generative AI is a mechanism for coming up with possibilities, working from a deep foundation of data and using pattern recognition and structured judgment to reach conclusions, especially ones that will help further the goals of its users. That should feel very familiar to lawyers. And maybe that’s one reason why the legal profession is so unsettled by Generative AI: not because it’s so different from us, but because it isn’t. And I’m not sure the comparison will flatter us much longer.
The irony is, many of our clients long ago developed “verification drift” towards us. They don’t really understand how we do our work or reach our conclusions, but they trust our training and authorization — not to mention our smooth self-assurance — so much that they don’t second-guess what we say and do.
But when we clearly screw up — say, by using ChatGPT for caselaw research and failing to check its output — we undermine that confidence and give people reason to doubt lawyers’ judgment and professionalism. We keep saying AI’s limitations require “humans in the loop” to ensure its accuracy — but if the humans in the loop aren’t doing anything, then what good are they? What good are we?
The variability in the output of LLMs shouldn’t make lawyers breathe sighs of relief; it should put us on high alert. Because Generative AI does get better every day. And anyone who’s used legal-customized Gen AI knows it can already produce output with extremely high levels of accuracy and relevance, faster than lawyers and cheaper than lawyers, and it’s getting better every day, too.
How are we getting better? How are we raising our game, so that we remain more useful and effective and desirable than the tools we’re using? We need answers to those questions, fast. I have no idea if Generative AI will replace lawyers someday. But I guarantee you, it’s competing with us, right now.