In talking about the pending General Data Protection Regulation, I always take the opportunity to suggest that GDPR requirements might be the spur which the amorphous concept of information governance has needed, providing the return on investment which companies have hitherto sought in vain.
I also draw attention to the application of eDiscovery skills and tools to an ever-wider range of problems from contract management to M&A. The identification of personal information in large bodies of documents, required for GDPR compliance, is an obvious example of this.
Adam Kuhn of OpenText has written a useful and interesting article on this called “How we’re using discovery analytics to solve GDPR challenges”. The whole article is short and worth reading, but its nub lies in this paragraph:
In this way, you can start with a known dataset (like your vendor contracts database) and then leverage analytics to identify unknown, risk-prone documents. As you review more documents and find more PII-laden content, the algorithm is constantly learning in the background. It conducts broad sweeps of your remaining data to prioritize batches of content that are likely to contain PII. What’s more, these algorithms can run on an issue-specific basis—a crucial ability since the GDPR distinguishes between “personal data” and “sensitive personal data.”
Adam Kuhn also makes the point that while “automated tools can get you started on a privacy evaluation…the ultimate analysis is too nuanced to rely exclusively on machine categorisation. Human review is an indispensable element…”
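The loop described in the quoted paragraph (review a batch, let the model learn, re-rank what remains) can be illustrated with a deliberately crude sketch. This is not OpenText's algorithm: the class name PiiPrioritiser, its methods and the word-counting score are all hypothetical, chosen only to show the shape of the workflow, and the human reviewer still supplies every label.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")


def tokens(text):
    """Lower-case a document and return its distinct words."""
    return set(TOKEN.findall(text.lower()))


class PiiPrioritiser:
    """Toy model (an assumption for illustration): it counts how often
    each word appears in documents a human reviewer has labelled as
    containing PII versus documents labelled as clean."""

    def __init__(self):
        self.pii = Counter()
        self.clean = Counter()

    def learn(self, text, has_pii):
        """Record one human review decision."""
        target = self.pii if has_pii else self.clean
        target.update(tokens(text))

    def score(self, text):
        """Average, over the document's words, of a smoothed estimate
        of how strongly each word is associated with PII (0 to 1)."""
        toks = tokens(text)
        if not toks:
            return 0.0
        total = 0.0
        for t in toks:
            p = self.pii[t] + 1      # add-one smoothing
            c = self.clean[t] + 1
            total += p / (p + c)
        return total / len(toks)

    def next_batch(self, unreviewed, size):
        """Return the unreviewed documents most likely to hold PII."""
        return sorted(unreviewed, key=self.score, reverse=True)[:size]


# Start from a known dataset that a reviewer has already labelled.
model = PiiPrioritiser()
model.learn("Employee date of birth and passport number", has_pii=True)
model.learn("Quarterly widget pricing schedule", has_pii=False)

# Rank the remaining documents so reviewers see the likeliest ones first.
unreviewed = [
    "Widget pricing for next quarter",
    "Passport number for the new employee",
    "Lunch menu",
]
batch = model.next_batch(unreviewed, size=1)
```

To work on an issue-specific basis, one could keep a separate instance per category, for example one for “personal data” and one for “sensitive personal data”, and feed each only the reviewer decisions relevant to it.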
There is a cartoon going the rounds from the excellent Daniel Solove (it comes from his article here) which illustrates how companies are (or, at least, have been) looking at their budgets for privacy and comparing them with the generous amounts allocated for data security.
The GDPR brings these two subjects closer together, not least because of the new duties placed on data controllers and data processors, and specifically the obligation to report data breaches within 72 hours (and to notify people affected by the breaches in certain cases). In the US, one is taught to be careful about using the expression “data breach” because not every loss or misuse of data is a “breach”. The GDPR has a much wider definition of data breach and the reporting burden is correspondingly high.
If you don’t know what data you have got, then you are going to be hard pushed to identify what you have lost. 72 hours is a very short notification period, making the time limits for litigation discovery look extremely generous. If you were to add up all the different circumstances in which it is necessary to identify personal information – litigation discovery, regulatory demands, dealing with Subject Access Requests and more – and then add the new GDPR obligations, the ROI for investment in tackling the problem becomes very clear.
You can’t do everything at once. Tools and processes such as those described by Adam Kuhn are a very good, very efficient and very cost-effective way of starting to tackle the problem.