Electronic discovery company Applied Discovery and KPMG are amongst those who have recently partnered with Equivio to integrate Equivio>Relevance into their existing eDiscovery applications. These two recent announcements give me an opportunity to return to the subject of software-assisted document review using what is generally known (but see below) as predictive coding. Recent discussions with some lawyers have shown scope for fundamental misunderstandings about what this kind of software does, and a look at the explanations produced by Applied Discovery and KPMG, as well as those of some other players in this space, may help.
My primary objective is clarity for those who come across the names and the terminology but are not necessarily clear as to the functionality being offered or its purpose. There seem to be three aspects which some lawyers find difficult about this kind of software, each of which is worth challenging. These are the following:
- They think it is being sold as a substitute for human review of the documents which are to be disclosed. That is not its purpose. It provides (amongst other benefits) a means of identifying irrelevant (or less relevant) documents or, to put it the other way round, a way of prioritising documents so that those identified provisionally by the software as the most relevant or most important are brought to the top. The important word here is “provisionally”, with its clear implication that the lawyers get every opportunity to double-check both what has been ranked as important and what has not.
- It is the subject of debate about judicial acceptability which, again, follows from a misunderstanding both of what it does and of what the courts expect. You may care to read an article called Judge Peck and Predictive Coding at the Carmel eDiscovery Retreat which reports on what I describe as “one of the clearest statements yet by a judge that the use of new technology like predictive coding is an acceptable way to conduct search”. Judge Peck’s immediate context may have been the US Federal Rules of Civil Procedure, but the principles which he covers apply anywhere, and the UK’s Senior Master Whitaker has said much the same both in conference speeches and in his judgment in Goodale v Ministry of Justice. Master Whitaker has also emphasised repeatedly the first point made above – that none of this software is intended as a substitute for human review of documents to be disclosed; he has heard this reaction as well.
- There is a paradox inherent in the nature of lawyers. They crave certainty but prefer it expressed in words rather than in numbers (a point which comes up in my recent report about the risk management function of corporate counsel). The statistics-based evidence of an application’s accuracy may underpin a decision to use this type of software, but lawyers still prefer the evidence of their own eyes. In fact, the applications give every opportunity for output to be validated by humans, but (in a second paradox) this may be getting lost in the marketing literature’s emphasis on the science.
The second of these points, judicial acceptability, is adequately covered in my Judge Peck article. In this article, I will focus on the lawyers’ own confidence in these applications as an aid to lawyerly judgement, not a substitute for it. To do that, I will look at the descriptions of what these applications do rather than at the science behind them, supported by the relevant parts of the Applied Discovery and KPMG materials and by extracts from those of other products with similar functionality.
First, however, it is worth saying a few words about terminology, not so much to define the labels as to pick the generic purpose out from the proprietary or product-specific names given to this kind of functionality. They all work slightly differently, and I defeat my object of simplicity if I qualify my deliberately broad descriptions with any attempt to describe the distinguishing features of each of them.
Predictive Coding and its kin
The broad class of software involved here has acquired the label “predictive coding” which is taken to imply that document decisions made by one means, and on a subset of documents, are applied across a larger set by sophisticated technology which takes all the characteristics of the selected documents to find others which are “like” them.
The word “like” in this context connotes a very wide range of characteristics and not merely those which caused the selection of the original set in the first place. That seed set may have been picked by humans making Yes / No decisions about randomly-selected documents, the randomness ensuring that no preconceptions inform the choice; they may be documents which have responded to a particular keyword search or been grouped together by the technology known as clustering. The purpose is to leverage (no, I haven’t gone native, but that particular Americanisation of a noun into a verb is useful) the primary input to find “more like this”.
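To make the “more like this” idea concrete, here is a deliberately simplified sketch. It is not any vendor’s actual algorithm (the real products weigh far more document characteristics than word frequencies); it merely illustrates the principle that a human-reviewed seed set can be used to score and rank an unreviewed pool, with the provisionally most relevant documents floating to the top.

```python
from collections import Counter
from math import sqrt

def tokens(text):
    """Lower-case word counts for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_more_like_this(seed_relevant, unreviewed):
    """Score each unreviewed document against the centroid of the
    human-reviewed relevant seed set; highest score first."""
    centroid = Counter()
    for doc in seed_relevant:
        centroid.update(tokens(doc))
    scored = [(cosine(centroid, tokens(doc)), doc) for doc in unreviewed]
    return sorted(scored, key=lambda s: -s[0])

# Two documents a reviewer has marked relevant, and an unreviewed pool
seeds = ["contract breach penalty clause", "breach of the supply contract"]
pool = ["lunch menu for friday",
        "penalty clause in the draft contract",
        "holiday rota"]
for score, doc in rank_more_like_this(seeds, pool):
    print(f"{score:.2f}  {doc}")
```

Note that the output is a ranking, not a decision: the lawyer still chooses where (and whether) to draw a cut-off line.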
If I appear to be pussyfooting around the term “predictive coding”, that is because Recommind obtained earlier this year a patent on its own technologies with the name “Predictive Coding”, which sets up a conflict with those who, like Equivio, use those words in a more generic sense. I am concerned only that prospective users understand the concepts. My article A Flock of Articles on Computer-Assisted Document Review summarises the summer’s debates on the subject and points back to other articles.
I begin with this recital of the background to make it clear that, however important the IP battles are (and they clearly are important), the de facto position is that there is a sophisticated class of technology which, whatever it is called, whatever its interface looks like, and whatever the underlying algorithms, begins with a seed set or with some other starting-point and identifies documents with similar characteristics. It does not impose that wider selection but merely suggests it – indeed, FTI Technology’s equivalent function (generally packaged with consultancy services within its Acuity offering) is called Suggested Coding, which explicitly recognises this role.
Applied Discovery Leverage
Applied Discovery has branded its use of Equivio>Relevance as Leverage which, whatever my views on the word itself, aptly describes the jump-up which this technology brings. Its press release is here and the main product page is here. Both offer clear descriptions of what users can expect from the new implementations.
Applied Discovery gives the name predictive tagging to its implementation of Equivio>Relevance. The key paragraph in the press release is perhaps this one:
Review teams code core sample sets of documents, while Equivio technology analyzes those tagging decisions along with the documents’ content. Predictive tagging then applies that human logic across the entire data set. The result: the most relevant documents are quickly grouped together for prioritized, contextual review. The integration with the Applied Discovery Leverage™ suite allows clients to use Equivio in a variety of ways, including to validate human tagging as a means of quality control, to parse large data into more easily reviewable issue sets to assign to the most knowledgeable reviewers, and to eliminate non-relevant data from manual review.
Here lies part of the answer to those who picture predictive coding / predictive tagging as a kind of sorcerer’s apprentice, invisibly arrogating to itself tasks which properly belong to the lawyer. The Equivio component sits alongside the more conventional search technologies which Applied Discovery has always had, and can be invoked as and when the user thinks it appropriate. This also means that one approach can be used to cross-check and validate another.
Hammering the point
This point is fundamental to an understanding of all the iterations of this kind of technology, and thence to ideas about its acceptability to courts. It is worth hammering, which I will now proceed to do by reference to the explanations of some other providers before coming back to the KPMG example.
Epiq Systems were the first to integrate Equivio>Relevance, making it part of a mixed software and services offering called IQ Review. Their web site includes a clever graphic which shows the stages through which documents may pass (I say “may” advisedly – you do what you need to do for a case). Start here with the section called Analyze and observe the intertwining of technology and human input over the three sections Analyze, Prioritize, Review. As with the Applied Discovery quotation above, the emphasis is on pushing the most important material under the eyes of reviewers. It is for humans to decide when documents have so far declined in importance that they are not worth reading, and to do whatever they feel they need to do, by sampling or whatever, to check that result.
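The sampling check mentioned here is worth illustrating. A common approach (sometimes called an elusion test, though vendors vary in terminology) is to draw a random sample from the documents ranked below the cut-off, have a human review them, and use the proportion found relevant to estimate what the cut-off would leave behind. This toy sketch, with a function standing in for the human reviewer, shows the shape of that calculation; it is an illustration, not any provider’s actual QC workflow.

```python
import random

def elusion_estimate(below_cutoff_docs, is_relevant, sample_size, seed=42):
    """Randomly sample documents that the ranking pushed below the review
    cut-off and estimate how many relevant documents were left behind.
    is_relevant stands in for a human reviewer's judgement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(below_cutoff_docs, min(sample_size, len(below_cutoff_docs)))
    hits = sum(1 for doc in sample if is_relevant(doc))
    rate = hits / len(sample)
    # Scale the sampled rate up to the whole below-cutoff population
    return rate, rate * len(below_cutoff_docs)

# 1,000 low-ranked documents, a small fraction of which are in fact relevant
pool = ["relevant" if i % 50 == 0 else "irrelevant" for i in range(1000)]
rate, missed = elusion_estimate(pool, lambda d: d == "relevant", 200)
print(f"sampled elusion rate: {rate:.1%}, estimated missed documents: {missed:.0f}")
```

If the estimated miss rate is too high for comfort, the lawyers lower the cut-off or do another training pass; the point is that the decision remains theirs, informed by the numbers.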
Now look at the equivalent part of Recommind’s site, the description of its Axcelerate Review and Analysis. Again, it is clear that the predictive coding element is one of several tools available to a user, each of which can be used to cross-check the results of another with “the option of reviewing some, most, or all of a collection in a fully defensible fashion”. As with the other products mentioned here, you certainly CAN use its prioritisation functions to rule out the need to look below a certain level, but the application gives you that choice – you are winning as soon as you have a starting point of documents which appear to be most important and most relevant. Bear in mind that user input plays as big or as small a part as you wish in arriving at the definition of relevance, quite apart from the ability to cross-check the output.
Now look at kCura’s Relativity and specifically at this page which describes its equivalent technology. The reiterated stress is on human decisions magnified by the system, with constant opportunities to cross-check the results and to feed further human decisions and corrections into the mix – see, for example, this:
Once Relativity has classified a collection of documents based on a review team’s decisions, administrators can QC Relativity’s work, either agreeing with Relativity or changing the decision. Relativity will learn from this input to get smarter and improve results.
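The feedback loop described in that quotation can be sketched in a few lines. This is not Relativity’s actual mechanism, just an illustration of the principle: wherever a QC reviewer overturns the machine’s call, the human decision prevails, and the overturned examples become fresh training input for the next pass.

```python
def fold_in_qc(machine_labels, qc_decisions):
    """Apply reviewer QC decisions to machine-classified documents.
    Returns the corrected label set plus the subset of QC decisions
    that overturned the machine (new training input for the next pass)."""
    corrected = dict(machine_labels)
    corrected.update(qc_decisions)  # the human decision always wins
    new_training = {doc: label for doc, label in qc_decisions.items()
                    if machine_labels.get(doc) != label}
    return corrected, new_training

machine = {"doc1": "relevant", "doc2": "not relevant", "doc3": "relevant"}
qc = {"doc2": "relevant",   # reviewer overturns the machine
      "doc3": "relevant"}   # reviewer agrees with the machine
labels, new_training = fold_in_qc(machine, qc)
print(labels)
print(new_training)
```

The agreed decisions confirm the model; only the disagreements carry new information, which is why the products emphasise that the system “learns” from QC rather than merely recording it.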
None of these providers, nor any of the others not mentioned here, claims that its products are intended as a substitute for manual review of the ultimately disclosable material, nor do they imply that their QA tools just get on and do that vital job for you – they are provided to allow lawyer checking all along the way.
KPMG’s Discovery Radar
Providers like KPMG offer a range of applications to their clients, allowing them to propose a solution appropriate to the case. They are, for example, major providers of Clearwell. They also have their own application called Discovery Radar and it is this which now integrates Equivio>Relevance.
My chief interest in this is the extremely helpful booklet called Software-assisted document review: an ROI your GC can appreciate which KPMG has published in conjunction with Equivio. Equivio’s own website is a model of clear explanation and includes several papers (including one by me) which explain various aspects of eliminating redundant data. The KPMG booklet, heavily illustrated with tables, graphs and screenshots, is the clearest explanation yet of the value of prioritisation software, whether it comes via KPMG or anywhere else.
Whatever you call this kind of technology, there is no doubt of its place in the battle to deal proportionately with large volumes of documents. One of the players in this market reckons that it has a value for as few as 4000 documents. Not every case warrants this degree of sophistication and nothing said here implies as much. There are simpler tools which will do the job more than adequately for many cases; outsourced document review is a growing business as an alternative or (in many cases) a complementary approach. All the providers mentioned here, and most others, adopt a consultative approach to the use of their applications.
The primary purpose of this article is to suggest that computer-assisted review, whether you call it “predictive coding” or anything else, ought to be considered alongside other methods. If you decide against it, at least do so on the basis of a proper understanding of what it does.