A recent article by US ediscovery expert Tom O’Connor discusses the evergreen subject of ediscovery certification. One of his points was that we need to understand basic concepts before we get too ambitious in describing (still less certifying) proficiency in wider ediscovery skills. He gave as examples questions like “what is a tiff?” and “what is a native file?”
The next tier up from simple descriptive terms like these is the technical vocabulary which the experts bandy about among themselves as if it were common currency. Good examples are the names of the various search technologies which have been developed to handle large volumes of documents. I have a list of them on one of my slides and, in a rapid-fire one-hour talk which covers many other subjects, I do my best to give the audience the briefest possible summary of what “predictive coding”, “e-mail threading” and “clustering” mean.
All these technologies, and others, serve different purposes to the same end. They vary in sophistication (although, of course, a function which appears simple to the user may have an extremely clever algorithm beneath it). On the face of it, “concept search” is easier to describe and to understand than some other technologies. After all, Roget began compiling his Thesaurus in 1805 (it was first published in 1852), so the idea of semantically-linked words is not new.
Clearwell has produced a white paper called The Next Generation of Concept Searching to back its Transparent Concept Search functionality. It describes in plain terms why simple keyword searching is an inadequate way of finding relevant documents, using the multiple meanings of the word “strike” as its prime example. Very large sums of money, and not a little risk, turn on doing the best job one can of finding documents required in litigation and analogous proceedings, and I commend this paper as a straightforward guide to what concept searching is and why it helps in 21st-century document search.
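For those who like to see an idea in miniature, here is a toy sketch of the “strike” problem. It is my own illustration, not Clearwell's method: the four documents, the hand-built concept vocabularies and the function names are all invented, and a real concept search tool would derive its concepts from the data rather than from a hard-coded list.

```python
import re

# Invented sample documents, for illustration only.
DOCUMENTS = {
    "doc1": "The union voted to strike over pay and pensions.",
    "doc2": "Geologists expect to strike oil at the new drilling site.",
    "doc3": "The batter took a third strike and the inning ended.",
    "doc4": "Workers walked out on Monday after pay talks with the union failed.",
}

# Hypothetical hand-built concepts: each is just a set of related words.
CONCEPTS = {
    "labour dispute": {"strike", "union", "pay", "walked", "workers", "picket"},
    "oil exploration": {"strike", "oil", "drilling", "geologists", "well"},
    "baseball": {"strike", "batter", "inning", "pitcher"},
}


def tokens(text):
    """Return the set of lower-case words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term):
    """Return every document containing the exact word, whatever its sense."""
    return sorted(name for name, text in DOCUMENTS.items() if term in tokens(text))


def concept_search(concept, min_overlap=2):
    """Return documents sharing at least min_overlap words with the concept."""
    vocabulary = CONCEPTS[concept]
    return sorted(
        name
        for name, text in DOCUMENTS.items()
        if len(tokens(text) & vocabulary) >= min_overlap
    )


print(keyword_search("strike"))          # all three senses of the word
print(concept_search("labour dispute"))  # includes doc4, which never says "strike"
```

The keyword search returns the oil and baseball documents along with the industrial one, and misses the walk-out which never uses the word “strike” at all. The crude concept match drops the first two and picks up the last, which is the point the paper makes at rather greater length.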