Craig Ball Entertains at CEIC 2011 on Computer Forensics for Lawyers

I choose my words carefully when I write, and nowhere more than in the headings to articles. It took me 10 seconds to decide that the word “entertains” would form part of the heading to this post. “Entertains”, “Forensics” and “Lawyers” might appear to be mutually exclusive terms. Add the fact that Craig Ball’s session lasted for two and a half hours across lunchtime on a sunny Sunday in Orlando, the entertainment capital of the world, and you would think it remarkable that anyone could hold a large audience. Craig pulled it off.

The venue was CEIC 2011, or the Computer Enterprise and Investigations Conference to give it its full name. The title of Craig’s session was “Nerdy Things Lawyers Need to Know About Computer Forensics and a Few Nerdy Things Forensics People Need to Know About the Law”. I have pages of notes, but I do not intend to summarise the whole thing; a few points will give you the flavour of it.

Many important things are very dull, and the standard recitals of information volumes – how many GB per typical user and what that converts to in paper, for example – are among them. Here is one to grab your attention: take every word you ever read, every piece of evidence you have seen, and every phone book, cereal box and road sign; add the text of every conversation in which you have taken part, the lyrics of every song you have ever heard and the script of every movie or television series you have seen. All of that would fit on the smallest hard drive you could buy, with room to spare. Other media add volume: we are constantly photographed, our financial transactions are tracked and GPS allows our movements to be traced, all in addition to the information which we choose to publish about ourselves on Facebook or elsewhere.

My own focus is on how we can cut through those volumes as quickly and as cheaply as possible in a way which is consistent both with the rules and with the interests of justice. Craig Ball is interested in that too – he is an attorney and a Special Master – and his doorway to it at this session was forensics. The word “forensic”, let us remember, means of or pertaining to evidence.

It is the lawyer’s job to collect the evidence and to know what to look for in his opponents’ discovery/disclosure as well as in his own. The point of recommending a Craig Ball talk is not to incite you to collect everything, but to make you aware of the places where information may hide and of the possibility of recovering it for those cases where it matters. Let us take some random points from my notes:

You may want to track the thought processes behind a pending or other wrongful activity as well as the activity itself.

Deleting the visible records barely scratches the surface of the information left behind. Thumbs.db, for example, does not purge itself when you delete the visible files, so thumbnail copies of every image are left behind.

Macs and other Apple products keep even more, including screenshots and keystrokes; every screen view which shrinks away when you move to another one is retained.

New data is written to unused clusters first; “deleted” does not mean “gone”, and some clusters may never be over-written.

The file system parallels a library catalogue: if the record which equates to the library card is gone, the book does not disappear; it just becomes harder to find.
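To make the library-catalogue analogy concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not a description of how any real file system works in detail: `ToyDisk` and its methods are hypothetical, and the allocation rule (use clusters that have never held data first) simply mirrors the point above that new data goes to unused clusters first. Deleting a file removes only its catalogue entry; the data stays in its cluster until something overwrites it.

```python
from typing import Dict, List, Optional, Set


class ToyDisk:
    """Hypothetical model: clusters hold data; a catalogue maps names to clusters."""

    def __init__(self, n_clusters: int) -> None:
        self.clusters: List[Optional[str]] = [None] * n_clusters  # raw storage
        self.catalogue: Dict[str, List[int]] = {}  # the "library cards"
        self.allocated: Set[int] = set()  # clusters currently owned by a file

    def _pick_cluster(self) -> int:
        free = [i for i in range(len(self.clusters)) if i not in self.allocated]
        # Assumed policy: use clusters that have never held data first, and
        # only then recycle clusters freed by deletion.
        never_used = [i for i in free if self.clusters[i] is None]
        candidates = never_used or free
        if not candidates:
            raise OSError("disk full")
        return candidates[0]

    def write(self, name: str, chunks: List[str]) -> None:
        used = []
        for chunk in chunks:
            i = self._pick_cluster()
            self.clusters[i] = chunk
            self.allocated.add(i)
            used.append(i)
        self.catalogue[name] = used

    def delete(self, name: str) -> None:
        # "Deleting" throws away the catalogue card and marks the clusters free;
        # the data itself is not touched.
        for i in self.catalogue.pop(name):
            self.allocated.discard(i)

    def carve(self) -> List[str]:
        """Crude recovery: return data still sitting in unallocated clusters."""
        return [data for i, data in enumerate(self.clusters)
                if i not in self.allocated and data is not None]


disk = ToyDisk(4)
disk.write("memo.txt", ["secret plan"])
disk.delete("memo.txt")
disk.write("lunch.txt", ["sandwich order"])
print("memo.txt" in disk.catalogue)  # False
print(disk.carve())                  # ['secret plan']
```

In this model the freed cluster is recycled only when space runs short (try `ToyDisk(1)`), which is why, on a roomy drive, some deleted clusters may never be over-written.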

There was much more like this; the purpose of recounting it is not to expect lawyers to leap off and collect data themselves, nor even to feel that they have a comprehensive list of places where data may lurk. The point is that it is extremely difficult to conceal one’s trail when not only the files but the use of external devices is recorded in places which are themselves hidden from view. If one knows only that much, it becomes easier to decide which cases may warrant a full investigation.

Craig moved on to challenge some of the assumptions about search beloved by lawyers. They are deluded, he said, by fabulous tools from Westlaw, LexisNexis and Google which are developed to return everything you could want. But case law databases and other legal resources have a finite vocabulary in which everything is spelt correctly, so it is easy for lawyers to imagine that typing in a few keywords will bring back everything they need. Google users, in general, are not concerned to retrieve every possible record, just enough for their purpose. What better demonstration of the vagaries of spelling is there than the fact that Google’s founders intended to call their new product Googol?

The Blair-Maron study rubbed shoulders with the infinite number of monkeys, and we tumbled from there into the fact that every word in the phrase “To be or not to be” will, by default, be on the stop-word list (the “blacklist” of ignored words) of many search engines. Concept search and other sophisticated search tools are not a magic black box, and latent semantic indexing is not mere trickery. Words have patterns, and inferences can be drawn from proximity, recurrence and so on. You do not need to understand how these things work to be impressed by the leap from the mathematics of ones and zeros to the technology of search.
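As a rough illustration of how Hamlet’s question can vanish from an index, the sketch below drops stop words before indexing. The `STOP_WORDS` set and the `index_terms` function are assumptions made for the example, not any particular engine’s list or method; real products ship their own, usually longer, lists.

```python
import string
from typing import List

# Illustrative stop-word list (an assumption, not any real engine's list).
STOP_WORDS = {"a", "an", "and", "be", "is", "it", "not", "of", "or", "the", "to"}


def index_terms(text: str) -> List[str]:
    """Lower-case each word, strip surrounding punctuation, drop stop words."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


print(index_terms("To be or not to be"))  # []
print(index_terms("To be or not to be: that is the question"))  # ['that', 'question']
```

A keyword search for the famous phrase would therefore find nothing to match against, however many documents contain it.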

Craig moved easily from there to metadata, that splendid word which lawyers use to show that they have really got eDiscovery. “Just get me the metadata,” they say. “I know that it is good, because Craig Ball told me.”

Every e-mail message is information assembled on the fly from metadata. The only question is how much of it you need for any particular case. Judge Scheindlin had identified the basic fields to be collected in the NDLON case. You might need more in a particular case, but it is helpful to understand just a little before pitching in. Costs are wasted through bad techniques, with vast amounts lost through ignorance: “bad practice has become an industry standard”, Craig said. Much of the volume can be deleted safely simply by knowing what the files are, and “there is a lot of stuff you can get rid of without disturbing anyone”. “It is not the case that every file contains private information, whatever lawyers like to think,” he added.
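To illustrate the point that an e-mail is assembled from metadata, the sketch below uses Python’s standard `email` module to parse a made-up message. The addresses, date and Message-ID are invented, and the `wanted` selection is hypothetical; it does not reproduce the field list from the NDLON case. It shows only that the familiar parts of a message are stored as discrete fields which can be collected selectively.

```python
from email import message_from_string

# An invented message; every value here is made up for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 02 May 2011 09:15:00 +0000\n"
    "Subject: Budget\n"
    "Message-ID: <1234@example.com>\n"
    "\n"
    "See attached.\n"
)

msg = message_from_string(raw)
# What a mail client displays as "the e-mail" is drawn from these stored fields.
metadata = dict(msg.items())
print(list(metadata))  # ['From', 'To', 'Date', 'Subject', 'Message-ID']
print(msg.get_payload().strip())  # See attached.

# Collecting only a chosen subset (a hypothetical selection, not NDLON's list).
wanted = ("From", "To", "Date")
collected = {field: msg[field] for field in wanted}
print(collected)
```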

Craig explained the use of hash-matching to make sure that no file or message needs to be looked at more than once, and illustrated how, whilst every copy of an e-mail is slightly different, a message sent to multiple recipients shares enough common elements across its copies to be flagged as a single unique message.
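Hash-matching can be sketched in a few lines. This is a minimal illustration, assuming simple single-part messages: `file_hash` fingerprints a file’s exact bytes with SHA-256, so identical files need reviewing only once, whilst `message_hash` fingerprints a hypothetical selection of fields (`KEY_FIELDS`) plus the body, so that copies of one message delivered to different recipients, which differ in their delivery headers, are still recognised as the same message. Review tools choose their own fields and normalisation rules; the choices here are assumptions.

```python
import hashlib
from email import message_from_string
from typing import List

# Hypothetical choice of fields that copies of one message share; real
# review tools use their own selections and normalisation rules.
KEY_FIELDS = ("From", "To", "Cc", "Date", "Subject")


def file_hash(data: bytes) -> str:
    """Exact-duplicate fingerprint: any change to any byte changes the hash."""
    return hashlib.sha256(data).hexdigest()


def message_hash(raw: str) -> str:
    """Fingerprint a single-part message from its shared fields and body."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):
        raise ValueError("multipart messages are not handled in this sketch")
    # Normalise the body so trivial whitespace differences do not matter.
    body = "\n".join(line.rstrip() for line in body.strip().splitlines())
    fields = [f"{name}:{(msg.get(name) or '').strip()}" for name in KEY_FIELDS]
    return hashlib.sha256("\n".join(fields + [body]).encode("utf-8")).hexdigest()


def dedupe(messages: List[str]) -> List[str]:
    """Keep the first copy of each message; later copies are never reviewed."""
    seen, unique = set(), []
    for raw in messages:
        digest = message_hash(raw)
        if digest not in seen:
            seen.add(digest)
            unique.append(raw)
    return unique


copy_for_bob = (
    "Delivered-To: bob@example.com\n"
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Date: Mon, 02 May 2011 09:15:00 +0000\n"
    "Subject: Budget\n"
    "\n"
    "See attached.\n"
)
copy_for_carol = copy_for_bob.replace("Delivered-To: bob@", "Delivered-To: carol@")

print(file_hash(copy_for_bob.encode()) == file_hash(copy_for_carol.encode()))  # False
print(len(dedupe([copy_for_bob, copy_for_carol])))  # 1
```

The first check shows why a plain file hash is not enough for e-mail: the two copies differ only in a delivery header, yet their byte-level hashes differ. Hashing the shared fields instead flags them as one message, which need be reviewed only once.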

Much expense is wasted in arguments over the form of production, that is, the nature of the file sent to the other side. Knowing the rules, understanding a little technology, having an idea of relative costs, being clear about what you really need and engaging in discussion about it are minimum requirements in this context, as in others.

An opponent who says “we will give you the data on DVD” is not specifying the “form of production” but the medium on which production will be given. People object to handing over native files on various grounds: native files may give away tracked changes (but what authorises you to remove or conceal this component of a file?); they may argue for exchanging images because you can put a Bates number on the pages (but why do you think these are required or even helpful when native files have so many other means of unique identification?); they say that you cannot redact a native file (but how much redaction is really needed and should the redaction tail wag the ediscovery dog?)

Native format, the form in which information is on its home drives and devices, is the lowest-cost way to collect and deliver data. Excel, Word, Outlook and PowerPoint files are most useful and most searchable in their native forms. Converting them to images and adding a load file removes utility and adds volume as well as more expense at every stage – conversion, moving, storing and reviewing. Converting documents and then adding identifying tags to them is like turning wine into water and back again, exceeded in stupidity only by printing documents to paper and then scanning them back into the system.

Craig ended by turning his guns on those who say “we do not have the tools and skills”. Well, he said, we have to grow with our jobs and learn to work with the evidence as we find it. The expense caused by ignorance becomes an access-to-justice issue. Craig illustrated this final point with a photograph of slavering wolves: lawyers who lack the basic skills are at the mercy not only of their opponents and the court but also of those who would sell them purported solutions. There are many good solutions out there, but the lawyer needs to be able to evaluate them.

This was the most entertaining eDiscovery session I have ever attended and every word articulated something which lawyers need to know. What is more, apart from the occasional FRCP-specific terms, this talk will mean as much in my own jurisdiction and elsewhere as it does in the US. I came away determined to find a forum in which we can get Craig to entertain our own lawyers in such an informative way.

About Chris Dale

I have been an English solicitor since 1980. I run the e-Disclosure Information Project, which collects and comments on information about electronic disclosure / eDiscovery and related subjects in the UK, the US, AsiaPac and elsewhere.
This entry was posted in CEIC, Discovery, eDisclosure, eDiscovery, Electronic disclosure, Forensic data collections, Litigation, Predictive Coding.
