Relativity has published a film called Pandemic. Its subject is the application of analytics and artificial intelligence to the very large volumes of information which exist about the Coronavirus. It is delivered mainly in the words of those involved in a project which brings together skills and knowledge from medicine and from data science to make that data useful. It is interesting both for itself and as an example of eDiscovery skills and tools being used for purposes well beyond their home territory.
The film stands alone as its own story – you will find it here and it is linked to at the bottom of this post. To me, however, it is also part of a continuing story of eDiscovery borrowing tools from other places and, in turn, using them for new purposes.
Once upon a time, as a litigating lawyer, I decided that the dull, repetitive, and time-consuming task of giving discovery could usefully borrow ideas from warehouses and widget factories. They didn’t type up lists of components and products, but used computers to sort and count things. There then being no software in London which applied computing to discovery, I wrote some.
Time passed, and it became usual to use computers for discovery. Computers and specialist software moved beyond sorting and counting, and into analytics, able to take very large volumes of data and make it useful, finding matches, identifying duplicates, looking for meaning, and determining likely relevance to issues. The aim was not to give discovery – that still needed humans – but to reduce the time taken to find the things which mattered, and to improve the accuracy.
Meanwhile, computing power advanced, making it possible to process more data more quickly. In addition, everything became more urgent – the output was needed not a leisurely few years later but now, because regulators expect you to know of or to anticipate activity, and cyber criminals are at work now. The skills and tools of electronic discovery were extended to cover a wide range of functions and activities involving corporate and legal data. Contract management is an example of a function which, almost overnight, used the tools of eDiscovery for something more positive and pre-emptive than retrospective discovery.
More time passed, and those companies which had been successful with their eDiscovery tools looked out for other opportunities to use their software and their abilities. That might be for expanding their existing market, or for finding new markets. It might be for doing something good and useful for society, something which needs the company’s abilities but also needs a willingness to invest for reasons beyond next year’s profits.
This has been an ambition at Relativity for a long time – I know that because I can recall a conversation in Washington in about 2016 at which I was asked for ideas about wider uses. Whatever I said then, it was rather pedestrian when compared to what turned up.
Scenes of empty streets are interspersed between explanations from the people involved – from medicine, from data science and from Relativity itself. Apart from those shot outdoors in Chicago, the talking is all done from the speakers’ own rooms, which brings up a collateral point – had this been filmed in other times, it would have been thought necessary to put the film crew and the speakers in the same room. The speakers come from all over the place and, before our adjustment to remote interviews, the film would probably not have been made.
I began this article by reciting briefly the interplay over the years between technology developments and requirements – technology advanced to meet developing needs, and those requirements developed to take advantage of the available technology. Regulators increase their demands; clients seek ways to meet them; technology is developed to help; regulators re-set their demands having regard to the new capabilities; and the (largely virtuous) circle goes on.
As I write, Twitter brings a thread which touches on the “social acceptance” of AI. What is needed, you might think, is a large-scale project, involving masses of documents and data both structured and unstructured, which delivers a practical benefit to a society wider than regulators, lawyers and litigators. To be effective as an example, it needs all the elements of an eDiscovery exercise – short time-scales, duplicates, too much to read, a set of parameters (“issues” in discovery terms), and subject-matter experts who must get to the essentials quickly without wading through a large (and fast-growing) body of information.
No-one “needs” a pandemic, but we can’t fail to see in current events an opportunity to test the idea that technology, and specifically the intelligent parsing and analysis of text, has real-world application when major decisions, affecting hundreds of thousands of lives (to say nothing of the global economy) must be made based on data – not just neat rows and columns of statistics, but large bodies of text. That is what this film is about.
I won’t attempt to summarise the whole thing, but I pick out a couple of factors which matter and which have direct eDiscovery parallels. One is the identification of duplicates, which reduced the volumes by a significant amount. The other is the ability to draw conclusions, not just from the data itself but from its sources. It became clear that much of the input came from biased data – white males from prosperous regions have dominated much of the research over decades. The result can be faulty diagnosis of patients who do not fit the pattern. In the short term, AI brings the risk of multiplying these biases, but over the long term AI makes it possible to identify biases and counter-balance them.
The data helps identify causes as well as treatments, and can take account of variables such as underlying conditions. One of the speakers says “we are getting better at finding the right data at the right time to make decisions [using] data a normal human can never go through, and find relationships we didn’t know existed” by bringing together data from all over the world.
You can easily see the parallels with the demands of eDiscovery – data from multiple sources being used both for matching against existing criteria and for finding “relationships we didn’t know existed” between people and other entities.
This is not the first time that disease has given the motive for advances in data analysis. I saw a television programme recently about London’s Great Plague of 1664/5 whose data was meticulously recorded and used to make decisions both for immediate action and for longer-term planning. The so-called “Spanish Flu” of 1918-19 began with misinformation (there was nothing peculiarly Spanish about it for a start) but it led to world-wide recording of a pandemic which affected roughly a third of the world’s population in four waves. Lessons learnt in one place were adopted in others.
London’s plague was blamed on rats, but modern analysis points to human-to-human interaction as the cause of rapid spread of disease. The 1919 epidemic was blamed in some places on poor hygiene not infection, and the roles of climate and of the world-wide shift of people as soldiers went home after the war were both under-estimated. If you look in the wrong place for causes, you probably come up with the wrong solution. AI allows us to do that analysis now, while the pandemic rages, not years afterwards when the data is analysed, if at all, by hand.
The project described in the Relativity video is an important one, not just for the immediate problem it sought to address, but also for longer-term acceptance of the value of mass collection and analysis of data. As one of the speakers observes, that has implications well beyond medicine.