Sandbox Tales: Machine Learning

The latest reports from the ICO sandbox provide important clarification of how data protection law applies to, and can guide, the application of novel technologies. This post looks at machine learning…

Onfido’s engagement looked at how to train and review the performance of machine learning models. In thinking about that I’d concluded that the GDPR provided more useful guidance if you thought of Training and Review as separate processes from actually Operating the model. In Onfido’s case that’s a legal necessity, because its model operates as software-as-a-service, with the customer as data controller and Onfido as data processor. Customers aren’t involved in Training and Review – though obviously they want those to happen – so Onfido must be the data controller for those steps.

When thinking about a single organisation doing Training, Operation and Review, I’d suggested that there should be a primary legal basis for Operation, with the GDPR “research” provisions used as compatible extensions of that to Training and Review. That provides strong safeguards, notably that any impact on individuals must be minimised and justified by the benefits, and that there must be no possibility of using the resulting data to make decisions about individuals. Since Onfido is only a data processor for Operation, the legal basis for that step belongs to its customers, so Onfido needs its own primary legal basis for Training and Review: the sandbox report suggests Article 6(1)(f) Legitimate Interests. That provides safeguards that – you guessed it – any impact on individuals must be minimised and justified by the benefits, and that (now as part of impact minimisation) there must be no possibility of using the resulting data to make decisions about individuals.

The details of the Onfido service raise a couple of other interesting issues. It supports banks – particularly in COVID times when it may not be possible to go into a branch to open an account – by verifying that a photograph of a face (typically from an ID document presented by the applicant) matches a current selfie – “are these the same person?” – and has not been tampered with. Training and Review are done with pairs of historic images and, perhaps, information about the origin of the ID document, but no information that would identify the individual. It might be argued that this is not personal data at all. But if the same individual were later to apply for another account, then Onfido’s data processor function might handle identifying information about them, bringing the Training and Review processes within the scope of the GDPR as well.

If the face pairs are personal data, then it’s likely that they count as “biometrics”, and so will be classed as Special Category Data under the GDPR, which requires an Article 9 condition as well as an Article 6 basis. Since it has been widely reported that many Machine Learning algorithms perform very differently on faces from different racial groups, there is an Article 9 condition that fits snugly and, again, provides strong safeguards: the “substantial public interest” in reducing discrimination.

In summary: a textbook example of how, if you are trying to do the right thing, a detailed study of data protection law will usually be a strong and helpful guide, rather than a barrier.

By Andrew Cormack
