eDiscovery software provider ZyLAB has been busy, announcing new Visual Classification tools, a Global Support Package for language handling and a white paper on technology-assisted review. Not the least of ZyLAB’s news is the report of a very large recent deployment of its eDiscovery and Production System.
Searches made for eDiscovery/eDisclosure purposes rely heavily on words. Most of the documents are found by technology of varying degrees of sophistication, from simple keywords through to the algorithms which underlie predictive coding. Cases and investigations involving financial matters may have a high volume of numbers which are not susceptible to linguistic or semantic searches and which raise issues of their own.
Pictures bring yet further complications. Forensic technology has developed its own techniques for identifying pictures which cause concern, most obviously in relation to pornographic images. Forensic software can detect a high proportion of flesh in a picture with a reasonable degree of accuracy, making it possible to trawl large volumes of image material in aid of potential criminal investigations or to seek out material which employees should not be keeping on company servers.
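To make the "proportion of flesh" idea concrete, here is a minimal sketch in Python. It assumes an image has already been decoded into a list of (R, G, B) tuples, and it uses a simple, widely quoted RGB threshold heuristic for skin tones. The function names and the 0.4 threshold are illustrative assumptions; this is not how any particular forensic product works, and real tools use far more robust models.

```python
def looks_like_skin(r, g, b):
    """Rough RGB rule of thumb for skin tones (an illustrative heuristic only)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels):
    """Return the share of pixels (0.0 to 1.0) that match the skin heuristic."""
    if not pixels:
        return 0.0
    hits = sum(1 for (r, g, b) in pixels if looks_like_skin(r, g, b))
    return hits / len(pixels)


def flag_for_review(pixels, threshold=0.4):
    """Flag an image when its skin-tone share reaches a hypothetical threshold."""
    return skin_fraction(pixels) >= threshold
```

An image flagged this way would only be a candidate for human review; a heuristic of this kind will also match sand, wood and similar tones, which is why the accuracy is described as reasonable rather than perfect.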
As it becomes more common to convey information in pictures, eDiscovery requirements move beyond the need merely to identify porn. ZyLAB has added visual classification to its eDiscovery tools, with native visual search enabling fast identification of non-textual information.
We are used to this in everyday computer use: domestic photograph software allows identification of people by reference to their facial characteristics; Google’s image search does a reasonably good job of finding things and scenes by allowing comparison with a source file. I had a recent example of this: a photograph of wartime London included an equestrian statue and I wanted to know where the photograph was taken; by cutting out a distinctive part of the statue, I was able to locate it with the help of Google’s image search which quickly found matches for the picture extract.
ZyLAB’s Visual Classification can also be used to identify images within documents, which may be useful for finding personally identifiable information, handwritten notes etc. There is a press release about ZyLAB’s Visual Classification eDiscovery tools here.
Coming back to words, potentially discoverable information often includes languages other than English. ZyLAB has recently launched a Global Support Package which incorporates Unicode support, linguistic support for full-text indexing, optical character recognition and the other features described in this press release.
At the higher end of linguistic and semantic search, ZyLAB presented a paper at DESI V in Rome, the workshop on standards for using predictive coding, machine learning, and other advanced search and review methods in eDiscovery. ZyLAB’s paper, written by Johannes Scholtes, Mary Mack and Tim van Cann, was called The Impact of Incorrect Training Sets and Rolling Collections on Technology-Assisted Review. Its conclusion is that the effect of incorrect training documents is smaller than expected; deliberately wrong training documents were inserted into a training set and resulted in far less quality reduction than one might think.
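The kind of experiment described, deliberately putting wrong training documents into a training set, can be sketched as follows. This is an illustration under stated assumptions and not the method used in the paper: it assumes each training document carries a simple True/False relevance label, and the function name and parameters are hypothetical.

```python
import random


def inject_label_noise(labels, noise_rate, seed=0):
    """Return a copy of binary labels with a chosen fraction deliberately flipped.

    labels: list of booleans (True = relevant, False = not relevant).
    noise_rate: fraction of labels to flip, between 0.0 and 1.0.
    seed: fixes the random choice so that a run can be repeated.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0.0 and 1.0")
    rng = random.Random(seed)
    n_flip = round(len(labels) * noise_rate)
    # Pick distinct positions to corrupt, then invert the label at each one.
    flip_positions = set(rng.sample(range(len(labels)), n_flip))
    return [(not label) if i in flip_positions else label
            for i, label in enumerate(labels)]
```

One would then train the same classifier once on the clean labels and once on the noisy ones, and compare measures such as precision and recall on an untouched test set to see how much quality is lost at each noise rate.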
Lastly on ZyLAB, the company has recently deployed its eDiscovery and Production System on a very large case for a multinational service provider which supports leading enterprises in the financial and electronics industries. The size and variety of the data sources, and the number of users, called on ZyLAB’s ability to add processing power as required, as well as on the broad set of functions which ZyLAB brings to projects of this scale.
There is a press release about the project here.