Connie Crosby has published a valuable pair of articles on Canada’s SLAW Online called Why Can’t You Just Make It Work Like Google? Part 2 carries the sub-heading (and the answer) Good Enough Is Not Good Enough. Part 1 is here and Part 2 is here.
It is a beguiling question which comes up rather more often than one might hope – I most recently heard it at the end of a very high-end lecture whose audience had been hand-picked for its technical sophistication. Lawyers are nervous of the apparent complexity of the search interfaces offered by litigation support tools; they hear descriptions of concept searching, de-duplication, e-mail threading, clustering and predictive coding and then say plaintively “Why can’t you just make it work like Google?” Those same lawyers, of course, are the ones who are terrified of failing to give discovery of all the documents which they are required to produce, or of handing over privileged documents.
Part of the problem lies in the fact that people do not understand how Google works – none of us does, of course, since Google’s algorithm is as close a secret as Coca-Cola’s recipe, but we can get a clue from looking more closely at the results we are given.
The primary point is that Google does not purport to give you everything which responds to your keywords (and you would not be grateful if it did). A secondary point lies in things which are fundamental to Google’s model: would you want your eDiscovery search results to be influenced by the number of links which other people have made to a document, even if we were in the habit of cross-linking our e-mail, Word documents etc? Perhaps you really do want only the documents which have your keywords at the top and repeated several times. What if some custodians have paid a fee to have their documents respond first? What about synonyms, typos and aliases?
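The difference between a ranked "best few" search and a search for everything responsive can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy collection, the scoring rule (keyword frequency plus a bonus for an early first hit) and the function names are all invented for the example, and nothing here reflects how Google or any litigation support tool actually works.

```python
from typing import Dict, List

# Hypothetical toy collection: document id -> text (invented for illustration).
DOCS: Dict[str, str] = {
    "memo1": "merger merger merger plans for the merger timetable",
    "mail2": "lunch on friday, and a quick word about the merger",
    "mail3": "re: re: fwd: the board has approved the merger",
    "note4": "holiday rota for august",
}


def score(text: str, keyword: str) -> float:
    """Toy relevance score: number of hits plus a bonus for an early first hit."""
    words = [w.strip(",.:;") for w in text.lower().split()]
    hits = [i for i, w in enumerate(words) if w == keyword]
    if not hits:
        return 0.0
    return len(hits) + 1.0 / (1 + hits[0])


def ranked_top_k(docs: Dict[str, str], keyword: str, k: int) -> List[str]:
    """Return only the k best-scoring documents, as a ranked search would."""
    scored = [(score(text, keyword), doc_id) for doc_id, text in docs.items()]
    scored = [(s, doc_id) for s, doc_id in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [doc_id for _, doc_id in scored[:k]]


def exhaustive(docs: Dict[str, str], keyword: str) -> List[str]:
    """Return every document containing the keyword, in no preferred order."""
    return sorted(doc_id for doc_id, text in docs.items() if score(text, keyword) > 0)


if __name__ == "__main__":
    print(ranked_top_k(DOCS, "merger", 1))  # the keyword-stuffed memo wins
    print(exhaustive(DOCS, "merger"))       # all three responsive documents
```

The ranked search rewards the document which repeats the keyword at the top; the two e-mails which mention the merger once, in passing, are exactly the ones a discovery exercise cannot afford to miss. Note that even the exhaustive version does nothing about synonyms, typos or aliases.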
I can give you an example from my own experience. I spend a sadly disproportionate amount of my life in hotels, and am reasonably picky about where I stay. I almost never stay in hotels which appear in response to Google searches because I quickly find that the same ones keep turning up, however much I refine my search terms. The “winners” may include the best and most appropriate hotels, but that will be a matter of chance because the ones which appear repeatedly are those who know how to play the Google game. Good search engine optimisation will win every time, aided by the insidious effect of sites like TripAdvisor which, through sheer volume and clever SEO, influence your results by flooding the indexes with the views of Dwight from Denver and Ethel from Ealing. Would you want to go to a hotel recommended by the sort of people who post their views on TripAdvisor? Even if you put -tripadvisor in your query (as I do), you are merely reducing the volume of crap cluttering your result; you don’t know how much influence TripAdvisor has had in determining the page rank of other sites.
I have gone back to the guidebooks now for hotel recommendations and then search for them by name. Buried beneath those patronised by Dwight and Ethel are the places I want to stay in. They equate to the documents which you want to find.
Or rather, they don’t – because the whole Google model, excellent though it may be for most purposes, is not appropriate for eDiscovery searches. You are not (or not necessarily) looking for anything in particular, but trying to see what the document collection contains. Sure, you have some pointers in the form of date ranges, custodians and possible keywords, and some things you really don’t want in the same way as I don’t want TripAdvisor, but these will give you no more than a starting point.
US District Judge Scheindlin put it well recently:
“Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”
The first question one asks when approaching eDiscovery / eDisclosure is “What have I got here?” You would not think of asking Google that question, and its search tools are not made to give you an answer. You need tools more precisely targeted towards the problem. They exist, with a wide range of functions and purposes, including the ability to ignore stuff from irrelevant sources. Instead of wondering why their makers have not made them work like Google, go and see what that functionality is and does.
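To make the “What have I got here?” question concrete, here is a minimal sketch in Python of the kind of overview a search box cannot give you: a count of documents by custodian and month, with an irrelevant source excluded first. The records, field names and the profile function are hypothetical, made up for this example; real tools do a great deal more than this.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical metadata records (custodians, sources and dates are invented).
RECORDS: List[Dict[str, str]] = [
    {"custodian": "alice", "source": "mailbox", "date": "2010-03-14"},
    {"custodian": "alice", "source": "mailbox", "date": "2010-04-02"},
    {"custodian": "bob", "source": "fileshare", "date": "2010-03-20"},
    {"custodian": "bob", "source": "newsletter", "date": "2010-03-21"},
]


def profile(
    records: List[Dict[str, str]],
    ignore_sources: FrozenSet[str] = frozenset(),
) -> Counter:
    """Count documents per (custodian, year-month), skipping ignored sources."""
    counts: Counter = Counter()
    for record in records:
        if record["source"] in ignore_sources:
            continue  # drop material from sources known to be irrelevant
        month = record["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        counts[(record["custodian"], month)] += 1
    return counts


if __name__ == "__main__":
    summary = profile(RECORDS, frozenset({"newsletter"}))
    for (custodian, month), n in sorted(summary.items()):
        print(custodian, month, n)
```

The point of the sketch is the shape of the question: it starts from the collection and describes it, rather than starting from a query and returning a short list of winners.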