E-Discovery

Courts and judges embrace predictive coding, but is it really fixing discovery?

Photo of Maura Grossman by Adam Lerner

The Obama White House staff will have generated about a billion emails by the time a new president is sworn in. Given the inevitable litigation and public records requests related to these documents, almost all of the messages will need to be searched, reviewed and (possibly) produced in response to legal demands.

That’s the type of emerging litigation nightmare lawyers and technologists hope to address with predictive coding technology. With the ongoing, worldwide explosion in data, computer-assisted review is being touted as the answer to the rising cost and complexity of civil discovery. Technologists believe it can reduce the time and inconsistency of human review by replacing human discretion with computers.

But while predictive coding solves some problems, it introduces new complications. And despite a spate of recent cases employing the technology, questions remain as to how effective it can be.

“A lot of lawyers have been blindly relying on predictive coding to get discovery done,” says Bill Speros, a Cleveland-based e-discovery consultant. “A lot of the claims are coming from vendors, and it’s easier to believe the hype than to ask what the man behind the curtain is doing.”

As the former litigation director for the U.S. National Archives and Records Administration and senior counsel at the Department of Justice, Jason R. Baron knows something about searching White House emails. In the 2000s, his legal team struggled to review 20 million Clinton-era White House emails for the case U.S. v. Philip Morris.

At the time, the task was nearly impossible, but he has come to believe that computer-assisted review is now the solution to these kinds of problems.

“Technology-assisted review has already advanced incredibly in less than a decade,” he says. “I am absolutely certain that a decade from now, artificial intelligence in e-discovery will be more sophisticated and powerful than we can even imagine.”

Still, despite the increased use of predictive coding in litigation, the process remains ill-defined. Consider that there are at least 30 different types of machine-learning-based classification tools. Because different classifiers are optimally suited to different types of data, the quality of results will vary with the data involved.
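To see why results vary, consider a minimal sketch, in Python with scikit-learn, of scoring two common text classifiers on the same small labeled sample. The documents, labels and library choice here are illustrative assumptions, not any vendor's actual product:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented review sample: 1 = responsive, 0 = not responsive.
docs = [
    "merger due diligence memo attached",
    "revised merger agreement for review",
    "board minutes discussing the acquisition",
    "acquisition financing term sheet",
    "lunch at noon on Friday?",
    "fantasy football league standings",
    "office holiday party RSVP",
    "parking garage closed next week",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same data, two different classifiers: the scores can differ, which is
# why a tool's fit for a given collection should be tested, not assumed.
for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    recall = cross_val_score(model, docs, labels, cv=2, scoring="recall").mean()
    print(f"{type(clf).__name__}: mean recall = {recall:.2f}")
```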

In some cases, poorly implemented predictive coding protocols may cost more than manual review, especially if experts must be brought in to settle disputes. Or parties may demand larger volumes of documents than necessary because computers can review more data than humans. “If you don’t have successful negotiations at the outset,” says Speros, “the costs can quickly get out of control.”

Litigator Maura R. Grossman of Wachtell, Lipton, Rosen & Katz has co-authored several influential papers on the use of technology-assisted review in electronic discovery. One widely cited study compared technology-assisted review with exhaustive manual review and found that TAR could be at least as effective.

“We found that certain kinds of TAR—specifically, active learning and rule-based approaches—can be as effective as or better than manual review,” she says. “But that doesn’t mean that anything called TAR is automatically better than human review in every circumstance. It is not a magic, one-size-fits-all solution.”

Da Silva Moore v. Publicis Groupe, decided in 2012, was the first case in which a court approved the use of predictive coding; the ruling was later upheld at the federal district court level. But as more reported cases endorse the technology, the guidance they offer varies widely.

In one matter, Edwards v. National Milk Producers Federation, the court describes a straightforward process for selecting a control set of documents and for using that set to train computers to find similar documents. The court even sets the statistical standard for judging whether the process is a success. In other cases, courts take a hands-off approach, letting the parties and their vendors define the process.
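In one common variant of that workflow, attorney-coded seed documents train the model while the control set is held back to measure its accuracy. The hypothetical Python sketch below uses scikit-learn as a stand-in for an actual review platform; every document, label and name is invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.pipeline import make_pipeline

# Invented attorney-coded documents (1 = responsive, 0 = not responsive).
seed_docs = [
    "draft supply agreement with pricing terms",
    "negotiation notes on the milk supply contract",
    "pricing committee meeting agenda",
    "softball team signup sheet",
    "cafeteria menu for next week",
    "IT notice: password reset required",
]
seed_labels = [1, 1, 1, 0, 0, 0]

control_docs = [
    "amendment to the supply agreement",
    "minutes of the pricing committee call",
    "birthday cake in the break room",
    "parking lot repaving schedule",
]
control_labels = [1, 1, 0, 0]

# Train on the seed set, then judge the model against the control set.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(seed_docs, seed_labels)
predicted = model.predict(control_docs)

# The parties (or the court) would agree in advance on what counts as success.
print("recall:   ", recall_score(control_labels, predicted))
print("precision:", precision_score(control_labels, predicted))
```

Recall, the share of truly responsive documents the model finds, is typically the headline number in these disputes.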

To get optimal performance from a TAR tool, attorneys must think critically about the software’s capabilities and the anticipated workflow.

Some concerns are clear-cut and practical. For example, attorneys must consider whether the tool works with spreadsheets, numeric data, audio or other types of files found in a given document collection.
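A rough inventory of the collection makes that question concrete. In the Python sketch below, the "collection" path and the set of supported extensions are placeholders, not any real tool's specification:

```python
from collections import Counter
from pathlib import Path

# Hypothetical: extensions a given TAR tool is assumed to handle natively.
SUPPORTED = {".txt", ".eml", ".msg", ".docx", ".pdf"}

# "collection" is a placeholder path to the gathered documents.
counts = Counter(
    p.suffix.lower() for p in Path("collection").rglob("*") if p.is_file()
)
for ext, n in counts.most_common():
    note = "" if ext in SUPPORTED else "  <- may need separate handling"
    print(f"{ext or '(no extension)':<16}{n:>7}{note}")
```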

More complicated concerns include determining the best way to select documents for training the software.
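One candidate strategy, a simple form of the active learning Grossman mentions, is uncertainty sampling: after each training round, the documents the model is least sure about are routed to human reviewers. The Python sketch below is illustrative only, with invented documents and labels:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented documents already coded by attorneys (1 = responsive).
labeled_docs = [
    "term sheet for the proposed acquisition",
    "due diligence checklist, revised",
    "gym membership discount for employees",
    "reminder: fire drill on Thursday",
]
labels = [1, 1, 0, 0]

# Unreviewed documents competing for reviewer attention.
unlabeled_docs = [
    "follow-up on diligence open items",
    "question about the fire drill",
    "acquisition timeline and gym logistics",
    "holiday schedule posted",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(labeled_docs, labels)

# Probability that each unreviewed document is responsive.
probs = model.predict_proba(unlabeled_docs)[:, 1]

# Documents with probabilities nearest 0.5 are the hardest calls;
# sending those to attorneys first is uncertainty sampling.
order = np.argsort(np.abs(probs - 0.5))
for i in order[:2]:
    print(f"review next: {unlabeled_docs[i]!r} (p = {probs[i]:.2f})")
```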

“The technology is powerful, but there is no standardized process or checklist you can follow,” says Grossman. “While we don’t know all the right answers yet, the solution isn’t ‘anything goes.’ It is important to do controlled studies to learn what works best under what circumstances.”

Legal technology experts agree that courts must not uncritically accept the claims of predictive coding vendors or testifying experts. If the process is employed in a matter, legal teams must check the computer’s work using old-fashioned human judgment.
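One such check, sometimes called an elusion test, draws a random sample from the documents the computer would set aside as non-responsive and has attorneys read it by hand. The Python sketch below shows only the bookkeeping; the document IDs and the review result are invented for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Invented IDs for documents the model coded non-responsive.
set_aside = [f"DOC-{i:06d}" for i in range(50_000)]

# The sample size would be negotiated by the parties or set by the court.
sample = random.sample(set_aside, k=200)

# In practice attorneys review the sample by hand; the number of responsive
# documents they find here is simply invented for the example.
responsive_found = 3
elusion_rate = responsive_found / len(sample)
print(f"Sampled {len(sample)} set-aside documents; "
      f"estimated miss rate: {elusion_rate:.1%}")
```

If the sample turns up more responsive material than the agreed threshold allows, the training and review cycle continues.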

At present there are various flavors of technology-assisted review—and a lively debate among experts on which are best.

“Debate is healthy, but much of it is philosophical and should not be interpreted to mean the technology doesn’t work,” Baron says. “The technology will work as well as the processes and workflows the lawyers using technology can put in place.”

This article originally appeared in the February 2015 issue of the ABA Journal with this headline: “Predictive Coding Has Something to Prove: Courts and judges embrace it, but is it really fixing discovery?”
