The Emerging Technologies Panel at ILTA 2011: remote collections and predictive coding

It would be fair to say that, more than two weeks on, my notes of the Emerging Technologies panel at ILTA are less decipherable than I might have hoped. That is in fact a tribute to Daniel Lim of Guidance Software, Dominic Jaar of KPMG, Keven Hayworth of Morgan Lewis and Howard Sklar of Recommind who, moderated by Greg Buckles of eDiscovery Journal, made more good points than I could record.

I can take a shortcut by referring you to Greg Buckles' own article ILTA 2011 – That's a Wrap, which gives a good summary of the ground covered. Some in the audience seemed disappointed that only two topics – remote collections and predictive coding – were covered. It is hard to see that much more could fit into a single session, or that any two topics are more important just now than collections which are simultaneously straightforward and comprehensive, and the modern ways of cutting the time and cost of review.

The remote collections section focused on two apparently disparate ways of making forensically sound collections without the risks implicit in custodian self-collection or the delay and expense of sending a forensic expert to each location. Dominic Jaar put the word “remote” into context by referring to Canadian mining companies 30 hours' journey time away and to foreign collections which involve bureaucratic issues like work visas as well as technology barriers like low bandwidth. Cross-jurisdictional collections involve quite enough in the way of legal issues without these extra implications. Where it is not possible to collect across the network, a portable device using a pre-programmed dongle to define the scope of the collection and to ensure consistency across multiple collections is a more than adequate, and low-cost, alternative.

However the collection is made, the lawyers must strike a balance between the expense implicit in over-collection and the risks of under-collection. Dominic Jaar neatly bridged the session’s two topics by wishing for a merger between the selection power of predictive coding and the collection capabilities of remote collection tools. Perhaps we will get there by 2013, but we have first to get acceptance for predictive coding as a defensible technology. This, as Greg Buckles indicates in his article, was the main theme of the predictive coding section of the session.

Greg Buckles recites the common view (not his own, I think) that “Most predictive analytics are essentially black box technologies that ask the user to trust complex selection profiles”. Like the use of the word “magic”, the “black box” label panders to the perception that the software is taking decision-making away from the lawyers and just giving them the results. We cannot just shrug our shoulders and say that we all know that the use of Boolean keywords on their own is a defective search methodology, and stick to it just because the lawyers understand it.

Keven Hayworth referred to various use case scenarios which effectively involved collaboration between human and machine as a step prior to cooperation with opponents. Functions to “find me more like this” and “expand on this” were useful ways of validating keywords and prioritising review (that is, finding most of the relevant documents early on and getting them in front of the right people). Quality control – the answer to the question “have I missed anything?” – is, he said, a free-standing benefit which enhances rather than supplants older and more conventional ways of arriving at the final selection.

Howard Sklar amplified on this. The use of predictive coding means that, as he put it, “the cream rises to the top”, enabling the human conclusion “this is the kind of thing we are looking for”. Different technologies are used for different purposes: keywords are part of the analytics; a concept review helps to give you a starting place; the terms which Greg Buckles refers to as “implied conspiratorial agreements” might include phrases like “thanks for the money” or “go ahead” which keywords alone are unlikely to retrieve.

The human element remains important also for defining the issues – Keven Hayworth observed that the more focused the case issues the narrower the batches returned by the process. There is also, Howard Sklar said, an unintended or unexpected benefit of using this kind of technology – the lawyers get more accurate as we show them ever more accurately-selected documents resulting from iterative predictive coding cycles. There was plenty of judicial authority, he said, supporting the idea that sampling is not just good, but essential, especially as a means of validating document selections which no one has looked at. Predictive coding enhanced that ability and was not just a substitute for it.

Howard Sklar gave some specific examples of uses for predictive coding which had nothing to do with judicial acceptance of the technology. One involved whistle-blowers who are, he said, generally “fired, demoted, transferred or ignored”. They are less likely to take their complaint outside if the process of investigation is both quick and fair, and the use of predictive coding allows this. It can be used as a compliance monitoring mechanism, run across regular collections of the documents relating to the highest areas of risk, such as those relating to countries in the bottom third (say) of the Transparency International Corruption Perceptions Index. The power of predictive coding can also be used to take control of a meet and confer predicated on conventional keyword agreements – the ability to say immediately that a particular keyword will, on its own, add x thousand documents to the collection not only fortifies an objection based on proportionality, but shows opponents that you are in command of your own material, a good example of the idea that keywords and predictive coding are not mutually exclusive methods.

This is all compelling stuff, making good use of the new technologies without relying on them exclusively in a situation where others may think of “black boxes” and “magic”. We cannot ignore the view which equates “new” to “untested” and extends that to mean “indefensible”. Estimates of what is proportionate, defensible and appropriate depend in part on general acceptance, and we have a way to go before this kind of technology will be generally accepted. Those who seek to encourage the use of this technology do not have to go so far. The key message from this part of the session was the fairly obvious one which recurs frequently in these pages – you need to know what all these technologies are capable of, what they can achieve, where they can properly be used, what the limitations are and what they cost. What we cannot do, as client, lawyer or judge, is simply stick with the old ways of doing things – whether preservation and collection or document review – as the tide of documents and the resulting cost rises above our heads.


About Chris Dale

I have been an English solicitor since 1980. I run the e-Disclosure Information Project which collects and comments on information about electronic disclosure / eDiscovery and related subjects in the UK, the US, AsiaPac and elsewhere.
