Desiree Sainthrope is a renowned legal expert with a deep background in drafting and analyzing trade agreements, as well as a keen interest in the intersection of law and technology. With her extensive experience in global compliance and intellectual property, she offers a unique perspective on how artificial intelligence is reshaping the legal landscape. In this interview, we explore the findings of a recent study on AI’s performance in legal research compared to human lawyers, diving into the implications for accuracy, client-ready work, and the future of specialized legal tech tools. We also discuss the areas where human expertise still holds an edge and what this means for the evolving role of technology in law.
How did the idea to compare AI tools with human lawyers in legal research come about, and why was this specific area chosen for the study?
I’ve always been fascinated by how technology can transform traditional legal work, and legal research seemed like a natural starting point for this kind of benchmarking. The idea emerged from conversations within the legal tech community about whether AI could truly match or even surpass human capabilities in foundational tasks. Legal research was chosen because it’s a core skill for lawyers, involving not just finding information but interpreting it with precision and context—something we assumed would be a tough challenge for machines. It felt like the perfect arena to test AI’s potential and limitations.
Can you walk us through the process of designing the study and how the legal research questions were developed?
Absolutely. We wanted to create a robust set of 200 legal research questions that reflected real-world challenges lawyers face. To do that, we collaborated with several prominent U.S. law firms to ensure the questions were relevant and covered a wide range of complexity and legal areas. Their input was invaluable in shaping scenarios that tested both rote knowledge and nuanced reasoning. The goal was to simulate the kind of work lawyers do daily, from straightforward queries to multi-layered issues, so we could fairly assess performance across different dimensions.
What went into selecting the AI tools for this comparison, and how did you ensure the evaluation was fair for both AI and human participants?
Selecting the AI tools involved looking at a mix of specialized legal platforms and generalist models to see how they stacked up. We chose tools that had varying degrees of focus on legal applications, ensuring a broad spectrum for comparison. As for fairness, we adopted a 'zero-shot' methodology: the AI tools received each question cold, with no task-specific examples, fine-tuning, or follow-up prompts, much as a lawyer might approach a new problem without preparation. For the human participants, we ensured they were practicing lawyers with diverse experience levels, and all responses, AI and human alike, were evaluated blindly against the same three criteria: accuracy, authoritativeness, and appropriateness.
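To make that protocol concrete, here is a minimal sketch of what a blind, zero-shot evaluation loop might look like. Everything in it is an illustrative assumption rather than the study's actual harness: the function names, the 0-100 rubric, and the placeholder grader are all hypothetical; only the three criteria come from the study as described.

```python
import random
from dataclasses import dataclass

# The three criteria named in the study; the numeric scale below is an assumption.
CRITERIA = ("accuracy", "authoritativeness", "appropriateness")

@dataclass
class Response:
    question_id: int
    text: str
    source: str  # "ai" or "human"; hidden from graders during scoring

def ask_zero_shot(question: str) -> str:
    """Placeholder for a single-turn model call: the question is sent
    as-is, with no examples, fine-tuning, or follow-up prompts."""
    return f"[model answer to: {question}]"

def grade_one(answer: str, criterion: str) -> float:
    """Hypothetical 0-100 rubric score for one answer on one criterion.
    In a real study this would come from blinded human legal experts."""
    return 50.0  # stand-in value

def grade_blind(responses: list[Response]) -> dict[str, dict[str, float]]:
    """Score shuffled responses, then unblind only to aggregate by source."""
    shuffled = responses[:]
    random.shuffle(shuffled)  # graders see answers in random order, unlabeled
    totals = {src: {c: 0.0 for c in CRITERIA} for src in ("ai", "human")}
    counts = {"ai": 0, "human": 0}
    for resp in shuffled:
        for c in CRITERIA:
            totals[resp.source][c] += grade_one(resp.text, c)
        counts[resp.source] += 1
    return {
        src: {c: totals[src][c] / counts[src] for c in CRITERIA}
        for src in totals if counts[src]
    }

if __name__ == "__main__":
    question = "Is a liquidated-damages clause enforceable in New York?"
    pool = [
        Response(0, ask_zero_shot(question), "ai"),
        Response(0, "[lawyer's researched answer]", "human"),
    ]
    print(grade_blind(pool))
```

The key design point the sketch tries to capture is that blinding happens at grading time: scores are assigned to de-identified, shuffled answers, and the AI/human label is reattached only when averaging per-criterion results.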
The study showed AI outperforming lawyers in accuracy by a notable margin. What do you think gave AI the upper hand in this area?
I think AI’s edge in accuracy, scoring 80% compared to lawyers’ 71%, comes down to its ability to process vast amounts of data quickly and consistently. Unlike humans, AI doesn’t get fatigued or overlook obscure precedents; it can cross-reference thousands of documents in seconds. That sheer computational throughput often surfaces correct answers a lawyer might miss under time constraints. However, it’s worth noting that AI’s accuracy can falter when context or interpretation is key, which is where human judgment still plays a critical role.
AI also scored higher in authoritativeness. How do you explain this advantage over human lawyers?
Authoritativeness, where AI scored 76% compared to lawyers’ 68%, likely stems from its reliance on structured databases of legal sources. AI tools are programmed to pull from verified case law and statutes, often citing them with precision. Lawyers, on the other hand, might draw from memory or miss a key citation under pressure. AI’s systematic approach gives it a consistency that’s hard for humans to match in a timed setting, though it can struggle when sources need deeper interpretation or when novel issues arise without clear precedent.
The biggest gap was in appropriateness, with AI producing more client-ready answers. What might account for this difference?
That gap—70% for AI versus 60% for lawyers—was surprising to many. I believe AI’s strength in appropriateness comes from its design to generate clear, structured responses that often mimic professional tone and format. These tools are trained on templates and examples of polished legal writing, so their output tends to look ‘client-ready’ at first glance. Lawyers, however, might prioritize depth over polish in a research setting, or their answers might reflect a more conversational style that isn’t immediately client-facing. It’s a reminder that presentation matters just as much as substance.
Were you caught off guard by how well a generalist tool like ChatGPT performed compared to specialized legal AI platforms?
Honestly, yes, it was a bit of a shock. We expected specialized legal AI tools to significantly outpace a generalist model, but ChatGPT held its own, often scoring just a few points behind in key areas. I think this speaks to the power of large language models and their adaptability, even in niche fields like law. It’s trained on an immense dataset that includes legal texts, so it can approximate specialized knowledge surprisingly well. This finding has definitely sparked discussions about how much specialization is truly necessary in legal tech.
Lawyers still excelled in handling nuanced or complex questions. Can you elaborate on why human expertise shone through in these scenarios?
Certainly. In about four out of ten question types, particularly those involving contextual judgment or multi-jurisdictional reasoning, lawyers outperformed AI by an average of nine percentage points. Humans excel here because these questions often require understanding unspoken nuances, cultural or jurisdictional subtleties, and ethical considerations—things AI isn’t yet equipped to grasp fully. Lawyers bring lived experience and intuition to the table, which allows them to navigate ambiguity in ways that algorithms, bound by data and rules, simply can’t replicate yet.
What do you see as the broader implications of this study for the future of legal practice and technology adoption?
This study highlights a pivotal moment for the legal profession. AI’s ability to handle routine research tasks with high accuracy and efficiency suggests that lawyers can offload time-intensive work, freeing them to focus on strategy, client relationships, and complex problem-solving. However, it also raises questions about over-reliance on tech, especially given past issues like AI-generated fake citations. I think we’re moving toward a hybrid model where AI augments human expertise rather than replaces it, but firms will need to invest in training to ensure lawyers can use these tools effectively and critically.
What is your forecast for the role of AI in legal research over the next decade?
I’m optimistic but cautious. I foresee AI becoming an indispensable part of legal research, with tools growing more sophisticated as they integrate real-time web search capabilities and better contextual understanding. We might see the gap between generalist and specialized tools shrink even further, making AI more accessible to smaller firms. However, I also predict a push for regulation and ethical guidelines to address risks like bias or errors in AI outputs. Ultimately, I believe the next decade will be about finding the right balance—leveraging AI’s strengths while preserving the irreplaceable value of human judgment in law.