Structured data is neither as easy nor as difficult as it sounds

Lawyers tend to overlook structured data. If they think of it at all when giving disclosure, it goes into the box marked “too difficult to deal with”. A decision that it is disproportionate to handle it may be right, but “decision” implies that its value has been weighed against cost, which is not the same as just ignoring it. I asked Jim Vint at FTI Technology to give me some examples where structured data was crucial to a case.

In general, lawyers like structure, with its implication of order and of things being in the right place. I do not necessarily mean that they (or “we”, strictly, since I am a lawyer too) prefer that every day is the same from alarm clock to Ovaltine (that is what the civil service is for as a career), but lack of organisation wastes time, and time is money. If you need a library book, your favourite coffee, or a particular iTunes track, then it is helpful to have some degree of pattern and consistency to help you find those of a like kind in a regular place. You expect a library to group its stock by subject and type, and not have law reports, textbooks and periodicals stuffed any old how into random shelves or all over the floor; imagine going into Starbucks and being told that every possible permutation of coffee, chocolate and the rest is in a cup somewhere, but that you must lift each lid to see which is which; you would not appreciate having to scroll down endless lists of iTunes tracks until you find the one you want. We go for the structured stuff every time.

Paradoxically perhaps, the opposite is true when we are dealing with disclosure data. The lawyers can get their minds round the idea of going through thousands of unstructured files – Word documents, spreadsheets, mail messages and the like – but do not want to tangle with the structured data sitting in well-organised databases with names like Oracle, SAP and PeopleSoft. Sometimes, they genuinely do not realise it exists – the nature of some of these databases is that they purr along in the background, quietly holding the business together but invisible to the users; sometimes, their sheer size and the fact that they tend to consist of numbers rather than words make users deliberately blind to them; quite often, their contents really are irrelevant to the matters in issue in the litigation or investigation. A decision that it is disproportionate to tangle with them is not, however, the same as the unspoken conclusion that they are too difficult to deal with. Limiting them on proportionality grounds implies that the value of their data has been weighed against the costs of extraction and a conclusion reached. That is rather different from oversight or closing one’s mind to their existence.

So what is structured data and in what kind of cases might one expect to have to deal with it? The subject had been on my agenda for my visit to FTI Technology (see “There is more to FTI Technology than Attenex and Ringtail”) but we ran out of time, so I went back to see Jim Vint at FTI to find out how these often large financial, transactional and operational databases turn up in litigation or regulatory investigations.

I am a words man myself, like most lawyers, and can get my mind round the idea that you can make sense, and make sub-sets, of large volumes of words using simple keywords or the sophisticated technology of concept searching, clustering, duplicates and near-duplicates, e-mail threads and all the other ways in which one can refine collections of words, however vast – we do it every day to some extent with Google. I do not have much difficulty, either, in recognising that cases involving pure financial matters must necessarily involve the analysis of millions of rows of numbers – one of the interesting facts in the Valukas report on the fall of Lehman Brothers is that there were more than 2,500 software systems and applications of one kind or another in use at Lehmans, many of them full of very large numbers. If that is what your case is about, then you will not overlook the significance of the databases, you will probably be in a very large law firm, and you will inevitably call on FTI Technology or somebody else of its standing to help you with it. There are many other cases, however, where the need to turn very large amounts of structured data into information is less obvious, particularly if you do not even apply your mind to what might be there. The third party skills of those used to dealing with the very large matters have application in smaller cases as well, and that may in fact be the only way of solving the problem.

Let us start with some basics derived from the UK court rules. By Rule 31.4 CPR, a document is “anything in which information of any description is recorded”. That sweeps up a column or row in a database, the database itself and, potentially, the computer on which the database resides. By Rule 31.6 CPR, standard disclosure requires the disclosure of the documents on which you rely and of those which adversely affect your own case or which support or adversely affect the case of any other party. Rule 31.7 CPR and paragraph 2A.4 of the Practice Direction to Part 31 CPR set parameters round the scope of a reasonable search. Different jurisdictions have different rules, the principles applicable in arbitrations and other forms of dispute resolution are not precisely the same, and regulatory or internal investigations have their own rules, but the principles defining a duty to disclose information are broadly the same everywhere.

Before you say “my clients don’t have structured databases”, you may care to consider two things. One is that your own lack of knowledge does not mean that your client does not have them – we are, after all, still finding government departments apparently denying that they have electronic documents at all, so ignorance does not equate to non-existence. The second point is that actual cases are not the only reason for extending your knowledge – the ability to demonstrate some familiarity with at least the concepts may actually help you to win new clients, if only on the basis that the one-eyed man is king in the land of the blind. No one is suggesting that you need to know how the damn things work, but it is helpful to have a broad idea of the sort of data which might be found in your clients’ systems and what use you might make of it in litigation, when facing a regulatory investigation, or when getting involved in an internal enquiry of some kind. It is also worth observing that you are not solely concerned with the data which your clients have; one day, you may need to challenge an opponent – if he fails to disclose databases and you do not know that you ought to expect them, the litigation may be cheaper to run, but you might just lose the case for want of something which you ought to know about.

It is also worth observing that an example does not have to match your own circumstances exactly for it to be relevant. The first example which Jim Vint gave me concerned an investigation by the European Commission into alleged bribery and corruption relating to the method by which contracts were won. That does not narrow the relevance of the example to EU-related issues, to regulatory investigations, or to corruption, let alone to a combination of all three of them.

The investigation in this case began by looking at traditional e-Disclosure targets such as e-mail and user files on the server drives. Even that relatively limited target for 15 custodians over the specified time period produced over 500 GB in volume, conservatively estimated to comprise 3.75 million documents. This was reduced to 500,000 documents by standard keyword and date range filtering. The nature of the allegations, however, implied a need to look at the transactions in the client’s ERP (Enterprise Resource Planning) database, that is, the large centralised database holding information across the range of the company’s activities.

The point here was not to use the structured data as a source of further documents (although that might have been one result) but to use its information as an additional culling tool. Between four and five million transactions were analysed to identify the business periods prior to the “payments” which were the subject of the investigation. Using information obtained in this way allowed the primary document population to be reduced to 25,000 documents, which were reviewed by the lawyers in under a week. A by-product of this approach was that further suspicious payments were identified. There was clearly a cost involved in undertaking this exercise, but that was much smaller than the overall cost savings to the client as a result of the reduction of the review population.
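For those who like to see the idea in concrete form, the culling step can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of what FTI actually did: the function names, the 30-day look-back window, the dates and the document identifiers are all invented for the purpose. The payment dates stand for what comes out of the transaction data, and the documents are kept only if they fall in the period leading up to one of those payments.

```python
from datetime import date, timedelta


def windows_before(payment_dates, days_before):
    """Return a (start, end) date window covering the run-up to each payment."""
    return [(d - timedelta(days=days_before), d) for d in payment_dates]


def cull(documents, windows):
    """Keep only the documents dated inside at least one window."""
    return [
        doc for doc in documents
        if any(start <= doc["date"] <= end for start, end in windows)
    ]


# Hypothetical payment dates, standing in for the output of the transaction analysis.
payments = [date(2008, 3, 14), date(2008, 9, 2)]

# Hypothetical documents, standing in for the keyword-filtered review population.
documents = [
    {"id": "DOC-1", "date": date(2008, 3, 1)},
    {"id": "DOC-2", "date": date(2008, 6, 20)},
    {"id": "DOC-3", "date": date(2008, 8, 25)},
]

# Assumed look-back of 30 days before each payment.
kept = cull(documents, windows_before(payments, days_before=30))
print([doc["id"] for doc in kept])
```

The point of the sketch is that the database supplies the dates and the documents are merely tested against them; nothing in the structured data has to be read by a lawyer for it to shrink the pile which does.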

Jim Vint’s second example similarly made use of third-party data to supplement information in the hands of the parties. The case involved a claim that invested assets had been left to stagnate in a declining vehicle whilst the investing company was left in the dark as to its investment strategy and its options. Seven time periods were at issue over which the investor argued that changes should have been made to avert losses of about £2.8 million.

The requirement here was to compare the actual history of the investments (supplemented with relevant e-mail correspondence) with external data to allow comparison with alternative approaches to the investment. Market trends, external macro- and micro-economic factors and alternative investment vehicles provided comparative data showing whether transfer of the funds into alternative vehicles would have resulted in better, worse, or the same outcomes as that achieved by the investment. This aggregation of data also allowed, for example, the position at any one time to be considered in the light of the correspondence, including advice received by the financial institution and not acted upon, and the absence of communication with the client. The analysis led to an early settlement.
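The arithmetic behind such a comparison is not mysterious, even if assembling the data is. The Python sketch below is a deliberately simplified illustration, with an invented starting value and invented returns for two periods; the real exercise involved seven periods and far richer market data, and the function names here are hypothetical. It compounds the returns of the actual vehicle and of one alternative from the same starting value, and reports the gap between them at the end of each period.

```python
def compound(start_value, period_returns):
    """Return the value at the end of each period, compounding the returns."""
    values = []
    value = start_value
    for r in period_returns:
        value *= 1 + r
        values.append(value)
    return values


def shortfall_by_period(start_value, actual_returns, alternative_returns):
    """Return, for each period, how far the actual vehicle lags the alternative."""
    actual = compound(start_value, actual_returns)
    alternative = compound(start_value, alternative_returns)
    return [alt - act for act, alt in zip(actual, alternative)]


# Invented figures for illustration only: a declining vehicle against a flat-to-rising one.
start = 1_000_000
actual_returns = [-0.05, -0.10]
alternative_returns = [0.00, 0.02]

gaps = shortfall_by_period(start, actual_returns, alternative_returns)
for period, gap in enumerate(gaps, start=1):
    print(f"Period {period}: shortfall of about {gap:,.0f}")
```

A positive figure means the alternative would have done better; a negative one means the investor was, on that comparison, better off where he was. Setting each period's figure beside the correspondence for the same dates is what turns the numbers into evidence.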

Whilst this was clearly not a trivial case, it was also not a very large one – many firms conduct litigation to this value. As always, it is necessary to look at the costs of such an exercise as part of a formula which includes various heads of risk including, perhaps, the possibility that there was no other way of proving that there had been a loss, still less what the loss amounted to.

It is also right to observe that, whilst the technology needed to pull in the data and undertake the analysis was clearly sophisticated, understanding its purpose and value required no more than the wit to spot that an expert was needed and the nous to find one. It is not good enough simply to assume that the use of technology is too expensive without comparing its cost with alternative ways of achieving the clients’ objective. In many cases, there is no alternative. An approach which ignores structured data, whether in your clients’ hands, in the hands of the other side, or in third-party databases, risks a public humiliation of the kind handed down in more than one UK judgment recently. If that risk is not persuasive enough, the possibility of actually winning clients through your apparent familiarity with the subject may add an incentive.

The ESI Questionnaire annexed to Master Whitaker’s judgment in Goodale & Ors v The Ministry of Justice & Ors [2009] EWHC B41 (QB) (05 November 2009) includes a space to say that your client has databases of this kind. The actual question is “identify database systems, including document management systems, which may contain data which may be disclosable and which were used by you during the date range”, and the table for your answer includes a box in which to state how you propose to give access to the other parties. The question implies, as is sometimes the case, that the contents of databases of the kind discussed here are not necessarily easy to exchange. Such factors may go to questions of proportionality when the scope of disclosure is discussed. Disclosure of the fact that such databases exist, however, is not optional.

About Chris Dale

I have been an English solicitor since 1980. I run the e-Disclosure Information Project which collects and comments on information about electronic disclosure / eDiscovery and related subjects in the UK, the US, AsiaPac and elsewhere