I took part this week in a podcast called Will Judges Think It Is Okay To Use Clustering and Suggestive Coding Tools? which was led by Karl Schieneman of ESI Bytes. I was the token Englishman alongside US top-drawer participants Judge Grimm, Judge Facciola, and Maura Grossman of Wachtell, Lipton, Rosen & Katz, who is also Topic Authority in the Legal Track of the Text REtrieval Conference (TREC).
As its title implies, the podcast concerned the acceptability of technology like clustering and what is variously called “suggestive coding” or “predictive coding”. Karl used the term “suggestive coding” so I will stick with that. These technologies aim to reduce the volumes which must be subjected to this most expensive (and most inaccurate) method of making document decisions. The starting point, for a lawyer or a judge, is the need for competent, ethical, co-operative and proportionate discovery, and a recognition of the role which technology must play in this. It is technical stuff, as Judge Grimm observed at the outset of our podcast, referring approvingly to Judge Facciola’s observations in US v O’Keefe about what judges and lawyers may dare opine in the face of technological complexity and about angels fearing to tread. The volumes, the technology and the expected standards have all increased substantially since then.
You can listen to the podcast yourself, so I will not do more than list some key points which came out:
Distinguish between clustering, which simply points you to material with like content, and suggestive coding which takes lawyer input on a selection and applies it intelligently across a larger (perhaps the whole) document set.
The value of clustering follows from the fact that with discovery / disclosure exercises, unlike with most database searches, you do not know what you are looking at, still less what you may be looking for. It addresses the question “What have I got here?”, which is very different from “What have I got about X?” or “What is the answer to Y?”.
Attenex used to have a tagline, “Documents define themselves and find their friends”, which neatly illustrates what clustering offers. The lawyer then works from that bucketing of similar documents.
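The “find their friends” idea can be illustrated with a toy sketch. This is not any vendor's actual algorithm – real clustering tools use far more sophisticated text analytics – but a minimal, illustrative greedy grouping by word overlap, with an assumed similarity threshold:

```python
def jaccard(a, b):
    """Similarity between two documents: shared words over total words."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def cluster(documents, threshold=0.3):
    # Greedy single-pass clustering: each document joins the first
    # existing cluster it sufficiently resembles, else starts its own.
    clusters = []
    for doc in documents:
        for group in clusters:
            if jaccard(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters

docs = ["merger agreement draft",
        "draft merger agreement v2",
        "lunch menu monday"]
# The two merger drafts "find their friends"; the lunch menu stands alone.
print(cluster(docs))
```

The point for the lawyer is the output shape: related documents arrive pre-grouped, so a reviewer can deal with each bucket as a unit rather than meeting its members scattered through the collection.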
What happens if your use of clustering (or any other) technology finds documents which would not have been found by the keyword list agreed with the other side? Judge Facciola’s strong view was that such documents would be discoverable if they were embraced by a request, even if the actual keywords would have missed them. He accepted that there could be different opinions about this, but thought that it would be a “very dishonest way to conduct yourself” to omit them on that technical ground.
Why might one need to involve the court? Judge Facciola said that one might approach the judge saying “our opponent won’t listen to us. Will you listen to us?”. The more you educated judges, the more likely the judge was to ask “why are we doing this at all?”
Judge Grimm said “Technology will always outpace us”. Someone has to have the courage to go first, he added.
How does suggestive coding work? The precise mechanics vary, but typically lawyers review a random sample of documents to create a seed set, marking each document for inclusion or exclusion. The system finds others like them – documents with similar terms or structure, or involving the same people – and flags them accordingly. You can refine the results continually, much as a spam filter does as it learns. The software then ranks all the documents; you might set the most expensive reviewers to the top slice, do some sampling at the bottom, and involve cheaper people in the middle.
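The workflow above – seed set, scoring, ranking, tiered review – can be sketched in a few lines. Again, this is a deliberately crude illustration under my own assumptions (real systems use proper machine-learning classifiers, not raw term counts), but the shape of the process is the same:

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_seed_model(seed):
    # seed: list of (text, is_relevant) pairs, as marked by the lawyer
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in seed:
        (relevant if is_relevant else irrelevant).update(tokenize(text))
    return relevant, irrelevant

def score(model, text):
    # Crude relevance score: net weight of terms seen in the seed set
    relevant, irrelevant = model
    return sum(relevant[t] - irrelevant[t] for t in tokenize(text))

def rank_and_tier(model, documents):
    # Rank the whole collection, most likely relevant first, then split
    # into review tiers as described above.
    ranked = sorted(documents, key=lambda d: score(model, d), reverse=True)
    third = max(1, len(ranked) // 3)
    middle = ranked[third:-third] if len(ranked) > 2 * third else []
    return {
        "senior_review": ranked[:third],    # top slice: expensive reviewers
        "cheaper_review": middle,           # middle: cheaper people
        "sample_only": ranked[-third:],     # bottom: spot-check by sampling
    }
```

A usage sketch: train on a two-document seed set, then let the model triage three unseen documents. Refinement would simply mean feeding the lawyer's decisions on the top slice back in as further seed examples, which is the spam-filter analogy at work.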
Quality control, Maura said, was vital.
Judge Facciola said that a judge might say of any proposed method “Convince me that I should not take one more step”. To me, that is part of the new skill-set required of lawyers – they must learn not merely to evaluate one approach versus others, but to argue why theirs is the most appropriate approach. If I were a Special Master (a role Karl invited me to play for the purposes of the recording), I would want to know the cost as well as the search implications of rival approaches; the burden passes to the party who argued for a more expensive route, I suggested.
I described briefly the UK system in so far as it related to our subject. One primary difference is that for standard disclosure, the parties decide for themselves what is to be disclosed without a request. The ethical point discussed earlier by Judge Facciola does not therefore arise in quite the same way, since your duty to disclose is not defined by the way your opponents have framed their request.
The new UK eDisclosure Practice Direction requires parties to discuss their sources and the “tools and techniques (if any) which should be considered to reduce the burden and cost of disclosure of electronic documents”. This includes agreed keyword searches, agreed software tools and data sampling. This is the time at which parties should resolve any differences between them, submitting to the judge a list of those things which were agreed and those which they had been unable to agree.
I commend this podcast to you whichever jurisdiction you work in. The principles – technology validation, cooperation, and giving the judge enough information to make proportionate decisions – apply equally on both sides of the Atlantic.
You may also be interested in a video interview involving me and Senior Master Whitaker, recorded by Project Counsel at IQPC in Munich in December. My questions to Master Whitaker include the same one about judicial acceptance of the use of technology, and his answer complements what the US judges say in the podcast.
My thanks to Karl Schieneman and ESI Bytes for the opportunity to take part in this otherwise all-American event.