I have been engaging more and more with education practitioners as we work through business scenarios for what a “real world” implementation of Trusted Analytics would entail. What does it look like, and what does it do for your business? We still have some work to do to get the product right, and we will pivot a number of times before we’re done.
One thing I know already is that we need to get on the same page when we talk about data privacy and privacy risk. As a term it has as many interpretations as interpreters, and frequently serves to confuse rather than clarify. When we talk about data privacy risk on this project, we simply mean “the risk that an individual’s identity can be revealed or discovered from the data”, and assume that direct identifiers such as name and email address are not present. In effect what we’re really talking about, and the term I believe we should use, is “Residual Data Privacy Risk”.
I’ve used the term “residual risk” quite a lot when explaining what we’re about in the past and I do feel this is key. Maybe we need to go back to basics with this term:
Risk: “The effect of uncertainty on objectives” (ISO 31000)
Thanks to increasing legal penalties (and direct commercial penalties from increasingly privacy-aware consumers), the effects of unforeseen data privacy breaches on business objectives are increasingly important, and organisations ignore them at their peril. Organisations are also increasingly aware that ignoring is not an option if they want to continue doing business with the many governments and commercial entities that insist on GDPR compliance and other data privacy risk management standards.
We will generally mitigate the risk of personal identification of an individual from our data and reports by removing “identifiers” before sharing them. These include names and email addresses, as well as certain quasi-identifiers, such as phone number or date of birth, which can be used to identify someone with a high degree of certainty in certain circumstances.
We can go even further if we have the time and resources, with enhanced mitigations such as summarising the data (e.g. putting people into age ranges in place of their actual age), obfuscating important identifiers (using hashing algorithms to replace email addresses, for example), “perturbing” the data (changing values slightly to reduce the probability of a match), or adding “noise” (extra synthetic records to obscure unique outliers). The problem, of course, is that the heavier the application of these mitigations, the more you obscure the original data and the less reliable it becomes. So how do you know when you’ve done enough? This is all about managing your Residual Risk.
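To make these mitigations concrete, here is a minimal sketch of the three techniques mentioned above. The function names, salt value, and hash truncation length are illustrative choices, not part of any product:

```python
import hashlib
import random

def generalize_age(age, width=10):
    # Summarise: replace an exact age with a range, e.g. 37 -> "30-39"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def hash_email(email, salt="demo-salt"):
    # Obfuscate: one-way, salted hash; the salt and the 16-character
    # truncation here are illustrative, and a real deployment would
    # manage the salt as a secret
    return hashlib.sha256((salt + email.lower()).encode()).hexdigest()[:16]

def perturb(value, spread=2):
    # Perturb: shift a numeric value slightly to reduce the chance
    # of an exact match against an outside dataset
    return value + random.randint(-spread, spread)
```

Each technique trades some fidelity for privacy, which is exactly the balance discussed above: `generalize_age(37)` returns `"30-39"`, so the exact age is gone but the analysis-relevant band survives.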
Residual Risk: “Inherent risk − risk mitigation = residual risk”
The objective is to reduce your residual risk to a level your organisation finds acceptable, but not go so far as to render the data unintelligible. To get the balance right, it is important to be able to do two things with the residual risk:
- Measure it, of course, so you know whether and to what extent you should apply additional, enhanced mitigations.
- Mitigate it to an acceptable Residual Risk, using the techniques which are necessary and sufficient to achieve the acceptable risk and no more.
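One common way to put a number on residual re-identification risk, sketched here as an assumption rather than a description of our actual measure, is to group records into equivalence classes on their quasi-identifiers: a record that shares its quasi-identifier values with k − 1 others has a re-identification probability of roughly 1/k (the idea behind k-anonymity):

```python
from collections import Counter

def residual_risk(records, quasi_identifiers):
    # Group records into equivalence classes on the quasi-identifier
    # columns; a record in a class of size k has risk ~ 1/k
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    classes = Counter(key(r) for r in records)
    sizes = [classes[key(r)] for r in records]
    max_risk = 1 / min(sizes)                          # worst-case record
    avg_risk = sum(1 / s for s in sizes) / len(sizes)  # average record
    return max_risk, avg_risk
```

With hypothetical field names, a toy dataset shows why the worst case matters: three people sharing an age band and region are each only 1-in-3 identifiable, but a single unique record is fully identifiable, so `max_risk` comes back as 1.0 even though the average risk is much lower.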
Of course that’s not the full story – to interpret a residual risk measure we also need to understand the likelihood, which is more contextual (certain data may be more likely to reveal someone’s identity in one context than another), and the potential impact to the subject of the risk. I see these as policy level measures – what you do with a residual risk measure will vary depending on the data, context and risk appetite of the organisation and can be encapsulated in their policies.
Measurement of residual risk requires a standardised, universally understood measure which can be incorporated into our policies and procedures. Consideration of industry-specific details and impact to the data subject will help us derive nuanced guidelines on how to interpret and act on a measure of residual re-identification risk. This is what we are working on now, and we are beginning to use the first releases of residual risk measurement in our in-house reporting. By adding a Residual Risk measure into all our reports and data extracts, and discussing it in our policies, we hope to do two things:
1. Raise privacy awareness
With a colour-coded residual re-identification risk measurement in all our reports and extracts, our employees will be reminded of data privacy every time they access a dashboard or receive a report. Over time we hope it becomes second nature to regard receiving data containing direct identifiers as strange and insecure, and to understand that there is always some residual risk of re-identification for any de-identified dataset.
2. Enhance risk mitigation AND democratise data
By also including guidelines on residual risks in our data management policies and procedures we hope to create an environment whereby people can have greater freedom to use their data to improve business outcomes. If employees can see the residual risk associated with the data, and have clear guidelines around the use and distribution of data within risk ranges, then they can have the freedom and confidence to put our data to a wider range of uses while improving our privacy risk management practices.
Freedom of movement and data-driven growth
We think better insights and guidance on residual privacy risk will empower our people to make more imaginative use of the information available to them, and that this will help us improve our service and grow our business. We are just embarking on this experiment and will keep you posted.