Predictive Coding Wars: Recommind Contra Mundum

It is a novel experience to spend a whole Saturday writing a 4,330-word article whose conclusion is that none of its subject-matter is really very important to one’s readers, however much it means to the participants in the story.

If you have come to see me take sides in the predictive coding war of the last few days, you will be disappointed. My job is enlightenment: picture me, if you like, as a small boat sailing between the double line at Trafalgar as the shots fly overhead, trying to give an update on the state of the technology being used rather than a partisan account of the battle. Actually, it has been more like Sink the Bismarck, with enemy ships and planes great and small all directing their fire at one target. Fortunately for Recommind, playing the Bismarck in this scenario, we don’t get to see the final reel.

For those who do not know, I am funded on a flat-rate basis by sponsorship from the companies whose logos appear on the right. Anyone who expects me to take sides misunderstands the nature of my role. It is not just a matter of not biting the hand that feeds me, nor of holding the ring between them when they start fighting each other. The aim is to try and shine a steady light in the darkness for the benefit of those who must get on with the job of managing electronic discovery / disclosure, and to keep it burning whatever is going on around me. I do not actually think that the market gives two hoots for this battle or its outcome (if there is one), but it may be helpful to have a distillation of the debate, if that is not too dignified a term for it.

While we are on disclosure of interests, I should say that I know nearly all the people mentioned here apart from Henry V, Hamlet, Alice and Humpty Dumpty, Houdini, Pontius Pilate, Tom, Dick and Harry, Lt Farley (late of the Confederate Army) and a couple of the referenced authors.

Some scenes, the relevance of which will appear as I go along:

Scene 1: I am in an hotel out east somewhere – Hong Kong or Singapore, I cannot remember. A senior figure in the e-Disclosure market finds me banging my head against the wall, metaphorically if not necessarily literally. I cannot remember the cause. Was it the time that Company A had said that Company B’s claimed profit figures did not bear close examination? Or had Company B just said that Company C’s software was five years behind the times? Or had Companies D, E and F just been crowing about Company B’s recent embarrassment in the courts? Had Company F just claimed that Company E’s pricing model was deceptive? Perhaps it was the allegation that Company X was really a front for gun-running and white slaving? Or the hint that Company Z’s software development meetings were attended by half-naked under-aged girls offering lines of white powder?

I may have made up one or two of these, but you get the picture. This is a competitive market whose senior people tend to be intelligent, literate people, as opposed to the intelligent but purely technical people whom you find in some other software industries. Many of them are lawyers, and they fight each other with words, and do it quite well if you think that sort of thing sells software, which I do not. My companion’s sensible comment in that Far East bar was that dealing with this sort of in-fighting went with the territory which I had staked out for myself.

Scene 2: It is 4:30am on 12 April 1861. Lieutenant Henry Farley fires a 10-inch mortar round at Fort Sumter. The long-awaited Civil War had begun; it had been obvious for some time where the locus belli was to be, and only the date and time had been uncertain.

Scene 3: On 5 August 1926, Harry Houdini was lowered into a New York swimming pool in a sealed casket in front of a crowd which was wondering how long he could stay out of sight and how he would get out. It was 90 minutes before he emerged.

Scene 4: Hamlet meets an army off to do battle. He assumes that the target is something important:

Goes it against the main of Poland, sir,
Or for some frontier?

…but the captain says:

Truly to speak, and with no addition,
We go to gain a little patch of ground
That hath in it no profit but the name.
To pay five ducats, five, I would not farm it;

Of which Hamlet says:

… I see
The imminent death of twenty thousand men
That for a fantasy and trick of fame
Go to their graves like beds, fight for a plot
Whereon the numbers cannot try the cause,
Which is not tomb enough and continent
To hide the slain.

Scene 5: Chapter 6 of Through the Looking Glass by Lewis Carroll. Alice is talking to Humpty Dumpty, whose last sentence began “There’s glory for you”

‘I don’t know what you mean by “glory”,’ Alice said.

Humpty Dumpty smiled contemptuously. ‘Of course you don’t — till I tell you. I meant “there’s a nice knock-down argument for you!”’

‘But “glory” doesn’t mean “a nice knock-down argument”,’ Alice objected.

‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’

‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’

‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’

Well, you may say, if your job is illumination, this seems a pretty convoluted way to go about it. Let me explain.

Predictive Coding

The litigation technology gathering most attention at the moment is known as “predictive coding”. The equivalent two years ago was “early case assessment”. Neither of the expressions is exactly “a little patch of ground” or “a fantasy and trick of fame”, as Hamlet put it, but it is unclear that either was ever worth shedding blood over. A loose explanation of predictive coding is that a senior lawyer or subject-matter expert makes relevance decisions about a sub-set of documents; those decisions are then used by the software to make parallel decisions across the full or a much larger document set. It is common ground between the disputants that the algorithms which achieve this are of more than one type, quite apart from the detail of execution. The mechanics of document selection, of achieving user input, and of delivering the results, vary from company to company.
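For those who like to see the moving parts, the loose explanation above can be sketched in a few lines of Python. This is a toy illustration only, resting on assumptions of my own: it uses a simple word-frequency (naive Bayes style) score, and the function names and sample documents are invented for the purpose. It is not a description of Recommind’s, Equivio’s or anyone else’s product, whose algorithms, as noted, are of more than one type.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lower-case the text and split on whitespace.
    return text.lower().split()


def train(seed_set):
    """Count words in the documents the human reviewer has already coded.

    seed_set is a list of (text, is_relevant) pairs.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed_set:
        counts[is_relevant].update(tokenize(text))
    return counts


def score(counts, text):
    """Return a log-odds style score: above zero leans relevant."""
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing so a word seen in only one class does not divide by zero.
    total_rel = sum(counts[True].values()) + len(vocab)
    total_irr = sum(counts[False].values()) + len(vocab)
    total = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        p_rel = (counts[True][word] + 1) / total_rel
        p_irr = (counts[False][word] + 1) / total_irr
        total += math.log(p_rel / p_irr)
    return total


def predict(counts, documents):
    # Apply the reviewer's decisions, as learned, across the wider set.
    return [(doc, score(counts, doc) > 0) for doc in documents]


# The expert codes a small sub-set of documents...
seed = [
    ("merger pricing agreement draft", True),
    ("pricing discussion with competitor", True),
    ("lunch menu for friday", False),
    ("office party friday invitation", False),
]
counts = train(seed)

# ...and the software makes parallel decisions on documents nobody has read.
coded = predict(counts, ["revised pricing agreement", "friday lunch plans"])
```

Here the first unseen document is marked relevant and the second is not. A real system would feed the reviewer’s corrections back in and repeat, which is the “iterative” part, and would measure its accuracy against samples; none of that is attempted here.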

Users are more concerned about the accuracy and judicial acceptability of search tools; they care about ease of use; they expect different things in terms of the output. They do not, on the whole, care very much about how the legs work, still less who owns them, but this does not diminish the significance of the intellectual property rights which surround the technology.

Claims to IP rights in Predictive Coding

For the most part, software suppliers who offer variants on this approach claim no proprietary interest in the name and have been content to let it become a generic and descriptive term. That does not, of course, mean that any one of them concedes that any other has a better right to use the label, and each of them, one assumes, has taken all necessary steps to protect whatever IP rights they felt they had. It would be fair to say that Recommind was more assertive than others in this respect – I recall, for example, an article drawing a distinction between “Predictive Coding” and “Predictive Coding™”. I do not recall that anyone made too much of it at the time, and the ™ bit has now disappeared. Nevertheless, as the term became more widespread across the industry, used both by those who purported to offer predictive coding and by those who sought to explain different classes of technology, it was clear that the temperature was rising.

A swan chases another off his patch (Photo Chris Dale 2011)

I stepped off a plane on 8 June to find a Recommind press release headed Recommind Patents Predictive Coding and I looked around for a wall to bang my head against. That instinct was not itself a criticism of Recommind, merely a recognition that the already-packed days before I was due to get on to the next plane had acquired an extra burden. I set a Twitter search for “Recommind” so that relevant tweets would appear as they were sent, and awaited the inevitable storm.

Supplier wars and me

It is perhaps helpful at this point both to generalise and to personalise my own role in these supplier wars. For my general view, look at what I said about the last one:

I have seen enough of these stories to know that they have a reasonably predictable arc. For one thing, they never emerge in relation to unsuccessful companies; what gives them legs is the fear of competition, and no one bothers to attack the weak. … It is all part of the knock-about of competition, of course, and most of the players give as good as they get. Occasionally, the challenge of doing an elegant stiletto job on a rival can result in new ways of describing the benefits of one’s own product. The comments never seem to make the slightest difference to the market share of any of the players, however, inducing merely a “plague on all your houses” reaction from a generally mature audience whose buying decisions are made on much deeper grounds.

As to my personal involvement, I know and like both Recommind and its General Counsel, Craig Carpenter, who is the company’s spokesman in these things – we fetch up together at conferences around the world, have done panels together, and keep in touch via Twitter and e-mail. What I hear of Recommind’s products is uniformly positive, and their numbers keep going up. Wearing my marketing hat (as opposed to my rules and technology one), I frequently use Recommind as a rare example of an ediscovery company which makes good use of all forms of media to promote itself. I am very pleased to carry Recommind’s logo on my blog as a sponsor of the eDisclosure Information Project.

Amongst the other well-known proponents of predictive coding is Equivio. It also is a sponsor of the Project; I am proud that my photograph and a supportive (and genuinely-meant) endorsement from me (“It’s not often we see technology capable of revolutionizing an industry”) appears on its home page; their list of resources includes a white paper by me called Predictive Coding in UK Civil Litigation which examines acceptance of this kind of technology in a UK context. I have long described Equivio’s website as a model of succinct clarity, and have not been able to keep up with the stream of announcements of new adoptions of Equivio products around the world.

Two very successful companies, then, to both of whom I am pleased to owe a duty. The eleven participants in an eDiscovery Institute survey of predictive coding (see Crash or Soar? Will the legal community accept “predictive coding” by Anne Kershaw and Joe Howie) also included another of my sponsors, FTI Technology.

So if I wanted to head-butt a wall on seeing Recommind’s patent claim, it was not just because I see these spats as irrelevant to (and potentially a distraction for) those who might buy any of these applications. It is also because I anticipated that people would come to watch how I would get out of this one, in much the same spirit as the crowds turned out to watch Houdini get out of his underwater casket. This is not because I carry any more weight than the many other commentators out there, but because my particular relationship with all these providers makes people interested to see how I navigate the occasional fights between them. A number of people sent me links to make sure that I had the story and, as the tweets of hostile articles flew past, I sent out my standard reminder that a re-tweet, from me at any rate, does not convey approval or agreement, merely the feeling that people will be interested.

The trajectory of the debate was reasonably predictable. There would be some fairly hostile journalistic comment of the kind which, in old-fashioned parlance, “sells newspapers”; respected keepers of the flame of purity would object to the hijacking, as they would see it, of a phrase which had just started to become useful as a generic description; some more balanced comment would emerge and everyone would wait to see what Equivio would have to say; when the shot and shell had passed, Recommind would come back with a riposte. With any luck, the dust would then settle and we could all get on with trying to improve both the discovery processes and the market’s understanding of what this technology actually does for them. It is too early to say whether the dust has settled, but we have certainly seen some interesting comment, not all of it either contentious or valueless. Not all of it.

The war correspondents

The advantage of standing back from it is that one can point to the articles which have some lasting value. The selection which I give below is in no particular order of importance, just taken as they come from the collection which I made as the comment flowed by. If that is largely hostile to Recommind, well, they cannot have expected anything else, and their back is broad.

Let’s start with the press release which kicked it all off. You may know of the 18th century bishop who went through the Apostles’ Creed, striking out everything which he did not believe until he was left only with “I believe in Pontius Pilate”. I adopt the same approach with press releases, though my test is not whether I believe them but whether any paragraph carries us much further forward. If you remove from Recommind’s PR anything which could have been said by anybody about their predictive coding, and ignore any sentence which has the word “only”, “leadership”, or anything else indicative of claimed uniqueness (I am not denying it, just ignoring it; God, this having to qualify everything is tiresome), you are left effectively with the opening paragraph, the equivalent of Lt. Farley lighting the blue touch-paper on his 10-inch mortar opposite Fort Sumter:

Recommind …. today announced that the US Patent Office has issued the company Patent No. 7,933,859, covering systems and methods for iterative computer-assisted document analysis and review. This patent gives Recommind, its customers and its partners exclusive rights to use, host and sell systems and processes for iterative, computer-expedited document review.

It is that last sentence which causes the problem, isn’t it, apparently not distinguishing between Recommind’s own combination of algorithms, processes and workflows on the one hand and broader notions of computer assisted review on the other? Law Technology News rushed out articles with headings such as Gasoline on the fire and Is Recommind blowing smoke? Editor-In-Chief Monica Bay said “Ah, press releases! At least he didn’t call it ‘revolutionary’.” This is one of the many things Monica and I share a view on: if a development is “revolutionary”, it needs a guillotine; I’ll pull the string and Monica can be the tricoteuse, knitting in the front row as the PR heads drop into the basket.

An article by Barry Murphy in the eDiscovery Journal of 9 June and headed Dawn of the Predictive Coding Wars? took much the same line as I had taken in the passage quoted above about an earlier dispute, foreseeing the possibility…

…that the battle over this particular patent becomes much ado about nothing. A similar situation played out about a year ago when many vendors got involved in the scale and performance wars while prospects ultimately either didn’t care about benchmark testing numbers or didn’t believe the testing numbers. In my experience, prospects rarely do a deep-dive patent analysis when looking at software solutions. Patents are nice and help protect IP, but are not always the difference between winning and losing a deal.

Barry Murphy’s conclusion that “this is good news for the ediscovery market as a whole” may well in fact be the outcome, with some clarity emerging as to the precise meaning of the term “predictive coding”; at the moment, like Humpty Dumpty’s use of the word “glory”, it means whatever the user wants it to mean. It seems unlikely that Recommind will sue everyone who offers “iterative, computer-expedited document review”, but the furore caused by its patent may inhibit the unwarranted use of the term by every Tom, Dick and Harry with a vaguely “intelligent” search engine. It is understandable that the competition does not see it quite like that at the moment.

Much of the comment has come from Catalyst, whose CEO John Tredennick came up with the best headline, Predictive Coding: One Grumpy Old Competitor Speaks Up, bringing a touch of humour to an otherwise fairly bad-tempered set of exchanges. He is big enough to say “I wish the Recommind people well with their patent and their business – they are doing a lot of exciting things in the industry and deserve their success”, whilst emphasising that his company has its own technology; you can be sure, however, that as soon as we start talking of non-negative matrix factorization it is not the end-user we are addressing. John Tredennick linked to an article by Dr. Jeremy Bacon called The Recommind Patent and the Need to Better Define ‘Predictive Coding’ which takes a deep look at the words of various technology descriptions, saying:

So what is the difference? I don’t just ask this rhetorically. I see a very strong similarity in the overall workflows between both predictive coding and relevance feedback, so I would honestly and transparently like to understand where the crucial differences are. If we are to understand what Recommind believes predictive coding to be – and if this understanding is going to help the courts set the legal precedent for defensible use of these technologies, a goal in which I fully agree with Recommind – then we really need to understand the process as a whole and what makes it unique.

The usually mild and deeply authoritative Herb Roitblat of OrcaTec was rather more forthright in his article, beginning with the headline Competitor’s press release about predictive coding patent stretches the truth. Herb is very much a “keeper of the flame” (to repeat the term I used above), combining OrcaTec’s commercial interests with an unchallenged role as a thought-leader in this area – look at the footnote “About the author” to get the flavour of it, or sit next to him at dinner, as I was fortunate enough to do recently. I won’t give you selective quotations from it – to go back to my Trafalgar analogy, this is like getting a broadside from a ship of the line. You need to read it (it is very short and carefully structured) to get its full force – but do remember my over-riding qualification that my referring you to something does not necessarily equate to affirming everything in it; I am treading on eggshells here and making it clear that I am not in a position to agree or dispute matters of patent registration and the like. Almost by the way, Herb shows us the origin of the word “predict” in “predictive coding”.

Equally learned in its approach was an article headed Patents and Innovation in Electronic Discovery by Venkat Rangan, CTO of Clearwell, whose recital of the history of this technology ends with a plea for sharing of ideas for the greater good of the market and of justice.

Then came the first article which concentrated solely on what is really the only thing which matters here – getting lawyers to accept technology of this kind, whatever it is called and whatever its scientific basis and ownership. My eye fell on the sentence “it’s a tough sell getting law firms to automate attorney review. It’s not easy getting the legal sector to accept the defensibility of it without more precedent and judicial guidance”. I came across the article on a site which not only hides its ownership but, in this case at least, did not credit the author, providing only an unlinked URL at the bottom of the page. I was interested to see who had written the article, headed Predictive Coding Patent War: Foxhole Religion for E-discovery?, and it came as no particular surprise to find that it was Katey Wood of Enterprise Strategy Group, who always brings a nice turn of phrase to technology news. The key paragraph is this one about credibility in the legal market:

Yes, machine-learning and mathematical algorithms for information retrieval have been around and used in other fields reliably for decades. Yes, automating review cuts data volumes and attorney review times dramatically when done right. Yes, it’s been shown to be no more flawed (and typically more consistent) than human review. Yes, Recommind’s results have blazed an admirable trail for it, as well as racking up a number of impressive customer and project wins – as have many of its competitors.

Herb Roitblat has added his own helpful comment at the bottom. The debate seemed to be moving away from the mud-slinging and into a more constructive phase. Like Houdini in his submerged casket, I stayed out of sight, wanting to see Equivio’s inevitable response and the equally inevitable rejoinder from Recommind.

Equivio’s comment, when it came, focused on the potential for concern in the market, and ended with this trenchant paragraph (whose un-named author is instantly recognisable to me):

We welcome competition with the other predictive coding vendors. Competition drives the market forward. Competition creates choice. Competition is good for customers and good for the industry. By way of contrast, Recommind’s hollow scare tactics and disingenuous claims to exclusivity seek to reduce and even eradicate competition. We say – let’s compete on merit. Let the products do the talking, and may the best product win.

Echoes there of Henry V before Harfleur (Act III Scene I, if this means nothing to you). Bring it on, as Shakespeare would say today.

Recommind’s Rejoinder

The reply to all this came in the form of a blog post by Recommind CEO Bob Tennant on 16 June headed Of Predictive Coding and Patents. Some of the responses, Bob Tennant said, have been “based on false premises and have served to obfuscate both the facts and our motivations.” You get the flavour of it from this paragraph:

Some commentators claimed our patent might be difficult to defend due to the existence of prior art. In one blog post, ironically, the author cited a technique called LSI and work by Dr. Thorsten Joachims around support vector machines (SVMs). The author was apparently unaware that Recommind invented and holds another patent on PLSA and its uses—an algorithm that was invented to overcome some of the serious limitations of LSI—and that Dr. Joachims was an early advisor and technical contributor to Recommind.

By its reference to the work of others (notably H5, whose home page, as I write, also carries a link to a paper by me called How can we do this differently?), the post makes it clear that there is more than one set of “systems and methods for predictive coding”. The lay reader might have found it helpful to be told that PLSA is Probabilistic Latent Semantic Analysis but then again, perhaps not; perhaps this whole argument is nothing to do with lay readers (that is, the potential buyers and users) at all, but is taking place way above their heads (or, to introduce yet another military analogy, well below their feet like the mining and counter-mining of trench warfare, with the occasional eruption to show what is happening).

The key sentence of Bob Tennant’s article comes towards the end:

These rights protect our methods for improving the discovery process, not those of others (the emphasis is mine)

I have not looked exhaustively for reactions to this post but, judging by my now silent Twitter search for “Recommind”, only three commentators have passed it on – me, Charles Christian of the Orange Rag and Bob Ambrogli on the Catalyst site with his article The Recommind Patent: Reactions Roll In From Across the Industry.

We may see more next week, when I will be otherwise engaged in Hong Kong. Perhaps, God willing, the fuss will die down and everyone can get back to enhancing their respective applications and marketing them to an audience which really doesn’t give a toss about anything except getting the job done. As Equivio said, “Let the products do the talking, and may the best product win”.

Quietly getting on with the explanations

As for me, well, I guess I could have found a better week in which to moderate a Virtual LegalTech panel called Advanced Technologies for Litigation Support Professionals – but then again, perhaps not. Maybe it was a very good week to carry on the work of trying patiently to explain what these technologies are, what they can do, and how they can help. I and my co-panelists, Bill Belt and Daryl Shetterly from LeClairRyan, identified twelve different categories of software, each with its own place in the armoury of those who deal with ediscovery. One of them, necessarily, was predictive coding.

Intellectual property battles are inevitable when there is so much at stake. This one has so far generated more heat than light, but the more I read my way into it, the more I became convinced that it was not for or about the users at all.

After a minute Humpty Dumpty began again… ‘Impenetrability! That’s what I say!’

‘Would you tell me please,’ said Alice, ‘what that means?’

‘Now you talk like a reasonable child,’ said Humpty Dumpty, looking very much pleased. ‘I meant by “impenetrability” that we’ve had enough of that subject….’
