[Based on a presentation for the NISO Plus conference, February 22-25, 2021]
One thing it seems everyone knows about Europe is that we have a strong privacy law: the General Data Protection Regulation, or GDPR. In this talk I’d like to get you to view it not just as a law, but as a really useful way to think about designing systems and processes. And maybe challenge a few myths along the way.
Here’s what the GDPR itself says it’s about:
This Regulation lays down rules relating to the protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data.
You’ll hear a lot about the “rules relating to the protection of natural persons”, so I’m not going to talk much about that. What I’d like to focus on is the much less referenced “rules relating to the free movement of personal data”. GDPR is explicitly – in its very first Article – about helping the movement and use of data, so long as that’s done in a way that’s safe for individuals.
So – my first myth – GDPR isn’t (primarily) about individuals, it’s about the organisations that handle their data. All the GDPR Principles are aimed at them. And those Principles are a really useful guide to designing safe products, services, and other activities. For example:
Accountability requires not only that organisations are compliant, but that they can show they are compliant. So we must think – before we start to use personal data – about the design of our systems and processes, safeguards against error and misuse, how we will operate them safely, and how we will ensure those plans actually happen. A key point is that the focus must be on the individuals and groups whose data we process, not on the organisation. And the GDPR provides a tool – the Data Protection Impact Assessment or DPIA – to guide that thinking. DPIAs are mandatory for large-scale and otherwise high-risk processing, but they are a really useful tool for thinking about smaller activities, too. And once you’ve done a DPIA, why not publish it to show your users and other stakeholders that you are taking care of their interests?
Another principle (both of law and design) is Purpose Limitation. This requires us to think clearly and precisely about why we are collecting and using personal data. Multiple purposes may be OK, but we have to be clear – in our own minds and in our documentation – what those are. “In case it comes in useful” isn’t a convincing purpose either for Regulators or for stakeholders. And, having set our purposes, we must avoid “creep” beyond them.
And once you have identified one or more purposes, you need to ensure that your organisation has a lawful basis for that purpose. Is it something you need to do in order to fulfil an agreement with the individual (for example to pay a salary, or deliver a service they have requested)? Or something you are required to do by law (telling the tax office about the salary)? Or (and we hope not to be in this situation) something that’s needed to save a life or prevent serious injury? Or something that is in the public interest – and where our organisation is best placed to do it – or something that is in the interests of the organisation itself, individuals or third parties it may work with? Each of these has its own conditions that our design must satisfy: in particular for public interest and legitimate interest we must balance our interests with those of the individuals whose data we propose to process. If it’s hard to meet those conditions, then you probably need to rethink either your design, or whether you should be doing this at all.
GDPR isn’t about preventing processing, it’s about allowing processing that’s necessary. And “necessary” has a very specific meaning – that there’s no less intrusive way to achieve the purpose. So it forces us to think – again good design practice – about minimisation. How little data does the purpose need, how little processing, how little disclosure (both internally and externally), and how soon can we get rid of it?
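Minimisation can be built in at the point of collection. Here’s a minimal sketch of that idea – the purposes, field names, and mapping are all illustrative assumptions, not anything the GDPR prescribes:

```python
# Hypothetical sketch of data minimisation: keep only the fields a
# declared purpose actually needs, and drop everything else before
# the data goes any further.

PURPOSE_FIELDS = {
    # assumed mapping from a declared purpose to its minimum fields
    "licence_check": {"affiliation"},
    "payroll": {"name", "bank_account"},
}

def minimise(record: dict, purpose: str) -> dict:
    """Return a copy of `record` with only the fields this purpose needs."""
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

full_record = {
    "name": "A. Student",
    "email": "a.student@uni.example",
    "affiliation": "student",
    "bank_account": "00-00-00",
}

# A licence check needs affiliation, and nothing else.
assert minimise(full_record, "licence_check") == {"affiliation": "student"}
```

The design point is that the purpose, not the available data, drives what is kept.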
GDPR and its guidance recognise lots of technologies as contributing to this: attributes (what someone is – student, staff, guest – is often more useful than who anyway); pseudonyms, which let us recognise a returning user, but not identify them; statistics, where we can achieve our purpose with counts, averages, and so on; roles that allow us to define and enforce policies; and federations, which we’ll come back to later.
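Pseudonymisation, for example, can be as simple as a keyed hash. This sketch (the key name and identifier format are my own assumptions) shows how an organisation can recognise a returning user without the token itself identifying them:

```python
import hmac
import hashlib

# Hypothetical illustration of pseudonymisation: derive a stable token
# from a user identifier with a keyed hash. The secret key stays with
# the organisation; without it, the token cannot be linked back to the
# identifier.
SECRET_KEY = b"keep-this-in-a-vault"  # assumption: stored and managed securely

def pseudonymise(user_id: str) -> str:
    """Return a stable, non-identifying token for a returning user."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

token = pseudonymise("student-42")
assert token == pseudonymise("student-42")  # same user, same token: recognisable
assert token != pseudonymise("student-43")  # different users stay distinct
assert "student-42" not in token            # the token reveals nothing itself
```

The same user always maps to the same pseudonym, so return visits can be counted without storing who anyone is.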
GDPR isn’t (mostly) about choice, it’s about notice.
With very few exceptions, people must be told the “natural consequences” of the situation they are in, or about to enter. Most of what you must tell them is the product of the thinking in the first two stages: who is processing their data, what processing you are doing (including the legal basis), why (including the purpose(s)), how long this will continue (and what happens to the data when it stops), who else (and where) may be involved, and how to exercise their rights over their data.
Sometimes – but far less often than is claimed – individuals will actually have a free choice whether or not to give you their data. But remember the five legal bases: if you are offering them a service, or required by law to process the data, or saving life, or serving a public or other interest, then their choice probably isn’t free. In those cases, this quote from Guy Singh-Watson is relevant:
Customers expected us to do the right thing on their behalf, not just give them the info to choose for themselves (arguably an abdication of corporate responsibility).
And Guy isn’t a data protection guru – he’s a farmer, who runs an environmentally responsible veg box scheme. If he knows what corporate responsibility looks like, shouldn’t we try a bit harder?
Most often, I’d suggest, true consent will be appropriate when you’d like an individual to volunteer additional information, to get into a deeper relationship with you. Not to discover whether they want a relationship at all. If you can’t find a basis for that initial relationship among the first five bases, maybe re-think your plans.
So, thinking with GDPR helps us to meet the expectations of our users, customers and wider stakeholders.
- We reduce the flow of information;
- We increase the benefits we deliver from what we have;
- and, by doing that publicly, we can provide a basis for increasing confidence and trust.
How does that work in practice?
Federated Access Management
Let’s look first at how students get access to the content they need for their courses.
Historically, that was a two-party relationship, where the student had to set up a personal account with the content provider, containing lots of personal data – most of which didn’t actually help the provider either to decide whether the student should have access (because it was self-declared) or to deal with problems if they misbehaved.
Thinking with the GDPR principles – and some smart technologists – we realised that inserting the student’s institution as a trusted third party produced a very different data flow. Now the student requests access, and the provider checks with the student’s institution whether they are covered by the licence. The institution uses its existing relationship and data to strongly authenticate the student, associate them with the licence, and undertake to deal with any misbehaviour. Everyone benefits under this Federated Access Management model.
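The three-party flow can be sketched like this. The class names, attribute names, and the decision rule are illustrative assumptions – real federations use protocols such as SAML – but the key point survives: only a minimal assertion crosses the boundary, never the student’s account data.

```python
# Hypothetical sketch of the federated flow: the institution
# authenticates the student locally and releases only the attributes
# the licence check needs -- no name, no email, no account.

class Institution:
    def __init__(self, licensed_services: set, enrolled: dict):
        self.licensed_services = licensed_services  # services covered by licence
        self.enrolled = enrolled                    # internal records, never released

    def assert_entitlement(self, user_id: str, service: str):
        """Authenticate locally, then release a minimal assertion."""
        if user_id not in self.enrolled:
            return None  # authentication failed: assert nothing
        return {
            "affiliation": self.enrolled[user_id],          # e.g. "student"
            "licensed": service in self.licensed_services,  # covered or not
        }  # note: no identifying data leaves the institution

class ContentProvider:
    def grant_access(self, assertion) -> bool:
        if not assertion or not assertion["licensed"]:
            return False
        return assertion["affiliation"] in ("student", "staff")

uni = Institution(licensed_services={"journal-x"}, enrolled={"s1": "student"})
provider = ContentProvider()
assert provider.grant_access(uni.assert_entitlement("s1", "journal-x"))
assert not provider.grant_access(uni.assert_entitlement("ghost", "journal-x"))
```

The provider gets exactly what it needs to make the access decision, and the institution remains accountable for its own users.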
Or consider analytics. Institutions do stuff: whether teaching, providing support, or providing facilities. Data trails generated by students and staff in using those facilities can be analysed (as a compatible purpose) to work out how to improve them (an obvious legitimate interest, with the balancing test ensuring it’s done safely).
If additional information from the student would help, we can ask them to provide it, always being aware that they may refuse or lie. And, if there’s an opportunity for individual improvement, as well as system-wide, we can suggest that. Again, the student can refuse to follow the suggestion. Limiting consent to these last two stages means our analytics and improvements can be based on whole-cohort data, not a self-selected subset. Students can be reassured that the institution has weighed the risks and benefits to them, and that their actions in donating data or acting on personalised suggestions are free and fully-informed. Again, everyone benefits.
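Whole-cohort analytics of this kind can stay at the level of counts and averages. Here is one possible sketch – the event format and the minimum-cohort threshold are illustrative assumptions, not a legal rule:

```python
from collections import defaultdict

# Hypothetical sketch: report only aggregates from the data trail,
# and suppress groups too small to be safely anonymous. The threshold
# of 5 is an illustrative design choice.

MIN_COHORT = 5

def avg_usage_by_course(events):
    """events: iterable of (course, minutes_online) pairs."""
    totals = defaultdict(list)
    for course, minutes in events:
        totals[course].append(minutes)
    return {
        course: sum(mins) / len(mins)
        for course, mins in totals.items()
        if len(mins) >= MIN_COHORT  # small groups could identify individuals
    }

events = [("maths", 30)] * 6 + [("art", 45)] * 2
report = avg_usage_by_course(events)
assert report == {"maths": 30.0}  # "art" suppressed: only 2 students
```

No individual record leaves the analysis, so the improvement work rests on the whole cohort rather than on whoever opted in.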