E-Disclosure, Needles and Haystacks 2 – volumes

This is one of a series of articles based on an account by barristers Alex Charlton and Matthew Lavy of a document-heavy case in which they acted for the party receiving a large electronic Disclosure. Each article is free-standing, but collectively they cover the ground of the original article’s sub-title, “Where it went wrong (and how we can fix it)”.

The opening article is here

The claimant’s standard Disclosure consisted of 30,000 hard-copy documents and 226,000 further documents in an electronic database, amounting in all to 1.6 million pages. The claimant said that the sources from which Disclosure came, including servers and back-up tapes, comprised 8 million documents.

“It was readily apparent … that the database … contained thousands of documents that had nothing to do with the issues in dispute and indeed nothing to do with the project. There were thousands of duplicate documents notwithstanding the fact that electronic de-duplication had apparently taken place. There was no useful structure to the database and no folders of e-mails by individual (something the defendant thought had been agreed). We believed that the defendant was facing a needle-in-haystack situation and applied to the court for relief.”

The disclosing party had arrived at its final total by applying a crude date range filter and then using a set of 333 key words to refine the collection further; that is, any document which contained one or more of the chosen key words and which fell within the date range was disclosed. That had “failed to reduce the number of documents to a level that the claimant deemed to be appropriate for disclosure”, so the claimant reduced the number of key words to 133. This, plus an attempt at de-duplication, produced the 226,000 disclosed documents.
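To make the mechanics concrete, a cull of this kind (a date range, an “any key word” match, then de-duplication) can be sketched in a few lines of Python. This is a hypothetical illustration only: the `Document` and `cull` names, the sample dates and the key words are all invented, and it is not a description of the claimant’s actual process or of any particular review platform. Note that the de-duplication step here catches only exact copies once case and spacing are normalised, which suggests one possible reason why thousands of duplicates can survive a de-duplication pass.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    text: str


def cull(docs, start, end, keywords):
    """Hypothetical first-pass cull: date range, then any-key-word match, then exact de-duplication."""
    terms = [k.lower() for k in keywords]
    seen = set()
    kept = []
    for doc in docs:
        # 1. Crude date-range filter: skip anything dated outside [start, end].
        if not (start <= doc.sent <= end):
            continue
        # 2. Key word filter: keep the document if ANY term appears as a substring.
        body = doc.text.lower()
        if not any(term in body for term in terms):
            continue
        # 3. Exact de-duplication: hash the case- and whitespace-normalised text
        #    and keep only the first document with each hash.
        digest = hashlib.sha256(" ".join(body.split()).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept


# Invented sample data, for illustration only.
sample_docs = [
    Document("A", date(2003, 5, 1), "Minutes re pipeline delay"),
    Document("B", date(2003, 5, 2), "minutes re  pipeline   delay"),  # duplicate once normalised
    Document("C", date(1999, 1, 1), "Pipeline delay claim"),  # outside the date range
    Document("D", date(2003, 6, 1), "Staff canteen menu"),  # no key word
    Document("E", date(2003, 7, 1), "Drinks at the Pipeline Bar"),  # key word hit, but irrelevant
]
kept = cull(sample_docs, date(2002, 1, 1), date(2004, 12, 31), ["pipeline", "delay"])
print([d.doc_id for d in kept])  # prints ['A', 'E']
```

With this sample data, document E survives only because it happens to mention a pipeline in an irrelevant context: a small instance of how blunt a bare key word match can be.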

Pausing there: narrowing a document set by date range, key words and de-duplication is widely (although not universally) accepted as one way to start reducing volumes to an acceptable level. I do not read Charlton and Lavy’s article as quarrelling with that principle. Three points arise, however, one expressed in the article and two implied. It is said that the claimant’s lawyers made “some checks…for privileged material”, but it is implied that no other attempt was made at further refinement. It is also implied that the key words were not agreed at that stage with the receiving party.

One does not really need to know more to see that this was wrong. Leaving aside the actual deficiencies (described in the passage quoted above) and any arguments as to the precise meaning of the rules, the fact that the receiving party was not involved (one might almost say “implicated”) in the selection of key words opened the disclosing party to allegations of selectivity which could have been closed off.

This matters not least because key word filtering is a pretty blunt instrument, open to challenge from either end. A respectable school of thought, backed by informed studies, holds that key word filters can eliminate many documents which ought to be retained on any definition of relevance.

Equally, looking at what is left, there is the suggestion in this case that no further filtering took place beyond a check for privileged documents. I will look in a future article at what the rules require and at the arguments raised about this in the case in question, but you do not need a close study of the requirements to see that there are relatively few cases where key word searches alone will provide a filter which is either safe (in terms of what gets left out) or adequate (in terms of what survives). The possible exception is where the documents came from a project server, that is, a repository dedicated to the same matters as are at issue in the case, with a defined vocabulary.

I am slightly uneasy, too, about the idea that the disclosing party’s judgement about the efficacy of its first set of key words was affected (according to Charlton and Lavy) by the fact that the first pass

“…failed to reduce the number of documents to a level that the claimant deemed to be appropriate for disclosure”.

This seems to imply that there is an absolute number of disclosable documents, or perhaps an acceptable quantity which rises and falls with the value or importance of the claim. There are certainly judgements to be made about the scope of disclosure relative to the value of the claim, but you do not use these (not unilaterally, anyway) to define what is or is not caught by the rules.

I will revert separately to the arguments about the construction of the rules and the debate which took place as to the value of this electronic database. The upshot of the receiving party’s application to the court was that the judge ordered the claimant:

“to apply a narrower agreed set of keyword filters to the database to shrink its size to more manageable proportions and to try to minimise the number of documents not falling within r31.6(a) or (b). This order was predicated on the basis that word searching could in principle provide an imperfect but workable remedy.

The claimant complied with the order. The result was a database of approximately 115,000 documents.”

The receiving party was still not happy:

“After a number of abortive attempts to navigate the material using search tools alone, the defendant ultimately decided to undertake a time-consuming and expensive comprehensive review exercise, gathering a team of people to look at each of the 115,000 documents in turn.”

The case settled, so the question of the costs of Disclosure was never argued. The only thing we are told about the costs is that:

“the costs paid to the consultants for their document handling exceeded the costs incurred by the solicitors by a factor of about five.”

You will have detected a hint or two in this article of my view that some aspects of all this might have been handled differently. I am not quite ready, though, to leap in and say that the whole thing was a disgrace, letting the side down, sullying the good name of electronic disclosure and so on.

For one thing, I am viewing it all at second hand, through the spectacles of the other side. I somehow doubt we will see their opponents’ riposte in print.

Then there is the starting point: 8 million documents is an awful lot to begin with. It does not surprise me that the document consultants’ fees exceeded the lawyers’ fees by a good margin. I have yet, alas, to find myself in a position where my fees match those of the lawyers, but it would appear that the brunt of reducing the volumes from 8 million down to 115,000 fell on the consultants if (as we are told) the lawyers’ main input was to pull out the privileged documents.

I suspect they did more than that. There were (or seem to have been) bona fide arguments about the scope of the rules and the value of that particular database, although it is not clear whether the software or the way it was used was to blame. The opinions of opponents, however objective, are bound to be coloured by their having been opponents, and there was no doubt much that they were not aware of.

Nor am I sure that I will accept by default one party’s assessment that another party’s database (its software, content or structure) is rubbish. I paraphrase; it is not described in those terms in the article. Had they been trained to use the system? No doubt they were very competent and highly trained. All I mean is that there is a great deal we do not know about this case, which inhibits stern judgement on it.

Nevertheless, it seems clear that, leaving all questions of blame aside, Disclosure was an expensive dog’s breakfast judged by the standards of the overriding objective in CPR Part 1. For all I know, the disclosing party was well pleased with its settlement despite having to pay its document consultants five times what it paid its lawyers, but it might wish, in retrospect, that two things had been different.

One is that pile of 8 million documents. The job I would like (please) is helping to trim that lot down to just the ones which are likely to matter in any foreseeable litigation, for any regulatory purposes, and for the general commercial requirements of the company: not just trimming it (litigation readiness) but making sure it stays that way and is proof against claims that too much has been deleted (with a document retention policy).

The other retrospective wish might be that the lawyers had started discussing the scope of Disclosure rather earlier than in front of the judge on a contested Disclosure application. The implication from the article is that they did not do so to any great extent. The Practice Direction to CPR Part 31 requires such discussion.

The third article in this series appears here.

My thanks to Mark Dingle of Simmons & Simmons for input into the key words aspects of this article. The original version appeared to accept without reservation the use of a key words list to achieve a primary cull. I do have serious reservations about this, and have qualified my original text to make this clear. More will follow on this point.

About Chris Dale

I have been an English solicitor since 1980. I run the e-Disclosure Information Project, which collects and comments on information about electronic disclosure / eDiscovery and related subjects in the UK, the US, AsiaPac and elsewhere.