Jim Shook of EMC takes us back to the stage before discovery. The advanced technology used for dealing reactively with discovery requests has its place at a much earlier stage in the process.
Judge Peck’s opinion in Da Silva Moore passes into a kind of limbo pending its review by Federal Judge Carter. The analysis of the present position has been exhaustive and, to some extent, repetitive, and those of us who comment on these things have little more to say until Judge Carter does his stuff. We are waiting, too, for the next step in the Kleen Products case before Judge Nolan. It is a bit like one of those uneasy patches on the French battlefields of the Great War as everyone waited for the whistle signalling the next big push.
It is a good opportunity, perhaps, to look in a more rounded way at the broad class of technology which, whether you call it predictive coding, technology-assisted review, machine learning or something else, generally connotes the idea that computers learn from a mixture of rules and previous inputs in order to “predict” what should be done with documents, classes of documents or, perhaps, whole servers full of documents. The technology being developed for this, and for similar functions which have nothing to do with discovery, has many of the same characteristics and objectives as the pure discovery applications. Marketing intelligence, news sites which point you to related articles, shopping sites which suggest alternative purchases and (as Judge Peck noted) anti-virus software all include elements of this kind of prediction.
Jim Shook – James D Shook, Esq, to give him his full title – is an eDiscovery expert at EMC. EMC was one of the first information companies to give effect to the now obvious conclusion that discovery can be seen as an end-use for the data which companies keep in ever-growing volumes anyway. Its roots are in archiving and storage, and the intelligent enterprise content management (ECM) found in its Documentum and related products. It acquired Kazeon in order to give its ECM customers a seamless run from document creation through to eDiscovery, the latter coming from its EMC SourceOne eDiscovery Family. Its website page EMC SourceOne File Intelligence has the subheading Informational Growth, Storage Requirements, and Organisation Risk and sits in a menu which extends backwards into the data centre and forward into legal hold. It is this breadth which gives Jim Shook the conclusion to his article Machine Learning for Document Review: the Numbers Don’t Lie.
Most of that article is an analysis of the use of predictive coding for discovery in cases like Da Silva Moore. Like many of these articles, it refers to the paper Technology-Assisted Review in eDiscovery can be More Effective and More Efficient than Exhaustive Manual Review by Maura Grossman and Gordon Cormack. Helpfully, it gives page references in that paper for the particular points which Jim Shook wants to make.
It is the last paragraph of Jim’s article which I want to focus on, with its suggestion that “predictive coding technologies show promise outside of the litigation process to help with our information management overload issues”. The technology already exists, Jim says, to apply automatic classification to information as it is received or created, and “improvements, higher comfort level and better understanding of the technologies caused by their use in litigation will help with the adoption rate”.
This is a tangible illustration of what we mean by “information governance” or, at least, of a subset of that expression. In addition to efficiency gains and reduced storage costs, it implies that the information we keep can be limited right from the beginning to that which we are likely to need, whether for business purposes or for eDiscovery.
Of the 3.2 million documents which are the Da Silva Moore starting point, only a fraction will prove of value to either party in this litigation, and most will never have served any useful purpose since shortly after their creation. The simple maths which Jim Shook sets out in his article – the number of documents for review multiplied by the review time per document – is clear enough and requires (rather than merely justifies) the use of all available tools to reduce the review load.
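The multiplication is easy enough to sketch. The Python below is an illustration only: the 3.2 million figure is the Da Silva Moore starting point mentioned above, but the review rate, the hourly cost and the fraction left after culling are hypothetical values chosen to show the shape of the sum, not figures taken from Jim Shook's article or from the case.

```python
def review_hours(num_documents: int, docs_per_hour: float) -> float:
    """Reviewer hours needed: documents divided by documents reviewed per hour."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return num_documents / docs_per_hour


def review_cost(num_documents: int, docs_per_hour: float, hourly_rate: float) -> float:
    """Cost of manual review: reviewer hours multiplied by the hourly rate."""
    return review_hours(num_documents, docs_per_hour) * hourly_rate


# From the text: the Da Silva Moore starting point.
TOTAL_DOCUMENTS = 3_200_000

# ASSUMPTIONS (hypothetical, for illustration only; not from the article or the case).
ASSUMED_DOCS_PER_HOUR = 50       # documents one reviewer gets through per hour
ASSUMED_HOURLY_RATE = 60.0       # cost of one reviewer hour, in any currency
ASSUMED_FRACTION_TO_REVIEW = 0.1  # share of documents left after technology-assisted culling

full_hours = review_hours(TOTAL_DOCUMENTS, ASSUMED_DOCS_PER_HOUR)
full_cost = review_cost(TOTAL_DOCUMENTS, ASSUMED_DOCS_PER_HOUR, ASSUMED_HOURLY_RATE)

reduced_documents = round(TOTAL_DOCUMENTS * ASSUMED_FRACTION_TO_REVIEW)
reduced_cost = review_cost(reduced_documents, ASSUMED_DOCS_PER_HOUR, ASSUMED_HOURLY_RATE)

if __name__ == "__main__":
    print(f"Exhaustive review: {full_hours:,.0f} hours, cost {full_cost:,.0f}")
    print(f"After culling to {reduced_documents:,} documents: cost {reduced_cost:,.0f}")
```

Whatever numbers you substitute, the cost scales directly with the number of documents put in front of reviewers, which is why reducing that number, whether at review time or long before, dominates everything else in the calculation.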
The cost incurred defensively for a one-off purpose which has nothing to do with the company’s business must be taken into account when considering an investment in the sort of technology – intelligent pre-emptive technology – which Jim Shook refers to in his last paragraph.