Sensitive/Special Category Data and Learning Analytics

In thinking about the legal arrangements for Jisc’s learning analytics services, we consciously postponed incorporating medical and other information that Article 9(1) of the General Data Protection Regulation (GDPR) classifies as Special Category Data (SCD): “personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation” (most of this is Sensitive Personal Data under current law). However, there is now interest in including such data, so we’re looking at how this might be done. This needs particular care because the legitimate interests legal basis we recommend for other learning analytics data isn’t sufficient for SCD.

GDPR Article 6(1) sets out the familiar six legal bases for processing personal data: contract, legal obligation, vital interests, legitimate interests, public interest and consent. However, processing SCD is prohibited unless the data controller can also meet one of the narrower conditions in Article 9(2): for example, consent must be “explicit”, public interest must be substantial, and vital interests can only be used when the data subject is incapable of giving consent. These Article 9(2) conditions apply in two circumstances: if SCD is used as an input to a learning analytics process (for example, because it has been found to have predictive value, or in order to detect when algorithms are generating discriminatory patterns); and if learning analytics techniques are applied to try to derive SCD (for example, health information) as an output from other types of data.
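To make the two-layer test concrete, here is a minimal sketch in Python. The enum names and the `processing_permitted` function are hypothetical, the Article 9(2) list is deliberately partial, and the sketch only checks that *some* basis and condition have been identified, not whether they are actually satisfied in a given case.

```python
from enum import Enum, auto
from typing import Optional


class Article6Basis(Enum):
    """The six legal bases listed in GDPR Article 6(1)."""
    CONSENT = auto()
    CONTRACT = auto()
    LEGAL_OBLIGATION = auto()
    VITAL_INTERESTS = auto()
    PUBLIC_INTEREST = auto()
    LEGITIMATE_INTERESTS = auto()


class Article9Condition(Enum):
    """A partial, illustrative subset of the Article 9(2) conditions."""
    EXPLICIT_CONSENT = auto()             # Art 9(2)(a)
    VITAL_INTERESTS_NO_CONSENT = auto()   # Art 9(2)(c): subject incapable of consenting
    SUBSTANTIAL_PUBLIC_INTEREST = auto()  # Art 9(2)(g)
    PREVENTIVE_MEDICINE = auto()          # Art 9(2)(h), subject to Art 9(3)


def processing_permitted(
    basis: Optional[Article6Basis],
    scd_input: bool,
    scd_output: bool,
    condition: Optional[Article9Condition] = None,
) -> bool:
    """Hypothetical two-layer check: an Article 6 basis is always needed;
    an Article 9(2) condition is also needed if SCD is used as an input
    or is the intended output of the processing."""
    if basis is None:
        return False  # no Article 6(1) basis: processing is unlawful
    if scd_input or scd_output:
        return condition is not None  # SCD involved: an Article 9(2) condition is required too
    return True  # ordinary personal data: an Article 6 basis is enough


# Legitimate interests alone covers ordinary learning analytics data...
print(processing_permitted(Article6Basis.LEGITIMATE_INTERESTS, scd_input=False, scd_output=False))  # True
# ...but not once SCD is used as an input
print(processing_permitted(Article6Basis.LEGITIMATE_INTERESTS, scd_input=True, scd_output=False))   # False
```

The key design point is that the Article 9(2) condition is an additional gate, not an alternative to the Article 6 basis.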

SCD as a learning analytics input

It has been noted that SCD might be a useful factor to take into account for some learning analytics purposes: for example, knowing that I am red-green colourblind might let a personalised learning system realise that I need more time for exercises involving chemical indicators or geological maps. An Open University paper identifies similar opportunities.

For most learning analytics processing of SCD, it appears that the only available Article 9(2) condition will be the explicit consent of the individual. However, this legal requirement may in any case be a practical necessity, since the usual source of such information is the individual voluntarily disclosing it. Gathering this information depends on the individual telling the truth (one of my colleagues ticks a minority religion just to ensure he gets his in-flight meal first!), so it needs to be done in a way that reassures them that truth-telling is both safe and advantageous. This is likely to involve something that looks very like an explicit consent process: providing full information about both the positive and negative consequences, avoiding any pressure to grant or refuse consent, and obtaining active agreement to the proposed processing. If information may be used for more than one purpose, individuals should be allowed to consent to each purpose separately.
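One way to keep per-purpose consent separate is to record a distinct decision, together with the information shown, for each purpose. The following is a minimal sketch under that assumption; the `ScdConsentRecord` and `PurposeConsent` classes and the purpose labels are hypothetical, not part of any Jisc system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class PurposeConsent:
    notice_shown: str      # information given about consequences of granting or refusing
    granted: bool          # True only after active agreement; False records a refusal
    recorded_at: datetime


@dataclass
class ScdConsentRecord:
    student_id: str
    purposes: Dict[str, PurposeConsent] = field(default_factory=dict)

    def record(self, purpose: str, notice_shown: str, granted: bool) -> None:
        """Store a separate decision for each purpose the data may be used for."""
        self.purposes[purpose] = PurposeConsent(
            notice_shown, granted, datetime.now(timezone.utc)
        )

    def may_use(self, purpose: str) -> bool:
        """SCD may be used only for a purpose the student actively agreed to;
        a missing entry is treated as no consent, never as agreement."""
        entry = self.purposes.get(purpose)
        return entry is not None and entry.granted


# Hypothetical example: one student, two separately consented purposes
rec = ScdConsentRecord("student-001")
rec.record("extra time for colour-dependent exercises",
           "We will allow extra time on exercises that depend on distinguishing colours.",
           True)
rec.record("include in progression prediction model",
           "Your declaration would be one input to a model predicting who may need support.",
           False)
```

Treating an absent entry as refusal mirrors the requirement for active agreement: silence or inaction never counts as consent.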

Valid consent can only be obtained if the consequences of granting or refusing consent are made clear to the individual in advance. This means that algorithms that use SCD can only be used for decisions or interventions that were foreseen and explained at the time the data were obtained. Unlike with legitimate interests, the data controller can’t specify a broader purpose and then seek consent later for a specific intervention. However, current proposals for using SCD do seek to answer specific questions, known in advance, so this is unlikely to be a significant restriction in practice. Indeed, if computer algorithms are being used to replace human inspection, individuals may well see this as a privacy-enhancing step and be more willing to provide their data.

The consent process provides a useful additional check on data subject sentiment: if individuals are comfortable with the proposed uses of information and the safeguards that protect it, then we should see the rate of consent at least hold steady, and ideally increase. Certainly, if the consent rate drops, or is lower than the current rate of return when the same information is collected for HESA statistics, we should immediately investigate why our activities are being perceived as “creepy”.
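The two warning signs described above (a falling consent rate, and a rate below the HESA return rate) lend themselves to a simple automated check. This is a sketch under assumed inputs: the `consent_alerts` function, the per-round `(given, asked)` counts and the `hesa_return_rate` figure are all hypothetical, and in practice the HESA comparison would need a like-for-like population.

```python
from typing import List, Tuple


def consent_rate(given: int, asked: int) -> float:
    """Fraction of students asked who gave consent."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    return given / asked


def consent_alerts(periods: List[Tuple[int, int]], hesa_return_rate: float) -> List[str]:
    """periods: (consents given, students asked) per collection round, oldest first.
    hesa_return_rate: response rate when the same data is collected for HESA."""
    rates = [consent_rate(given, asked) for given, asked in periods]
    alerts = []
    # Warning sign 1: the consent rate fell compared with the previous round
    for i in range(1, len(rates)):
        if rates[i] < rates[i - 1]:
            alerts.append(
                f"round {i + 1}: consent rate fell from {rates[i - 1]:.0%} to {rates[i]:.0%}"
            )
    # Warning sign 2: the latest rate is below the HESA return rate for the same data
    if rates and rates[-1] < hesa_return_rate:
        alerts.append(
            f"latest consent rate {rates[-1]:.0%} is below HESA return rate {hesa_return_rate:.0%}"
        )
    return alerts


# Hypothetical figures for three collection rounds
for alert in consent_alerts([(80, 100), (85, 100), (78, 100)], hesa_return_rate=0.82):
    print(alert)
```

An alert is only a prompt to find out why sentiment has shifted; the figures themselves won't say what is being perceived as “creepy”.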

Under the standard Jisc model, organisations are recommended to seek consent to the interventions that result from learning analytics processing. Typically, a student should be offered a choice between generic and personalised treatments, or a free choice whether or not to take up a proposed intervention. Even though consent to gather and process SCD will have been obtained at the data collection stage, it still seems advisable to offer a second consent option when a specific intervention is suggested, both because more detail can be provided at this stage and because individual students are likely to want to choose which interventions they accept.

SCD as a learning analytics output

It has been suggested that learning analytics-type approaches might be used to derive early warnings of health problems from other types of data. Using explicit consent for this is likely to be tricky, as much of the (non-SCD) input data will be observed rather than collected directly from the individual. With research at an early stage, it is also likely to be hard to tell the individual in advance what the specific consequences of granting or refusing consent will be. A more appropriate option is likely to be “the purposes of preventive or occupational medicine” (Art 9(2)(h)). This requires that the data are processed “by or under the responsibility of a professional subject to the obligation of professional secrecy” (Art 9(3)), so medical professionals would need to be involved in any such activity.

Processing designed to generate SCD as an output seems certain to meet the Article 29 Working Party’s threshold for requiring a Data Protection Impact Assessment (DPIA), since it involves at least three of its criteria: “evaluation or scoring”, “sensitive data” and “innovative use” (see pp 9-11 of the Working Party’s guidance). Where the purpose of processing is to discover previously unknown SCD – perhaps not even known to the individual – this may well constitute a “high residual risk”, requiring prior consultation with the national Data Protection Authority (for the UK, the Information Commissioner).
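The Working Party’s guidance lists nine criteria and suggests that, in most cases, processing meeting two or more of them will require a DPIA. The screening step can be sketched as follows; the short criterion labels are our own paraphrases, and `dpia_likely_required` is a hypothetical helper. Because the guidance presents this as a rule of thumb rather than a mechanical test, the threshold is left adjustable.

```python
from typing import Set

# The nine criteria in the Article 29 Working Party's DPIA guidelines (paraphrased labels)
WP29_CRITERIA = {
    "evaluation or scoring",
    "automated decision-making with legal or similar significant effect",
    "systematic monitoring",
    "sensitive data",
    "large-scale processing",
    "matching or combining datasets",
    "vulnerable data subjects",
    "innovative use",
    "prevents exercising a right or using a service",
}


def dpia_likely_required(criteria_met: Set[str], threshold: int = 2) -> bool:
    """Rule of thumb: processing meeting `threshold` or more criteria
    (two, by default) is likely to require a DPIA."""
    unknown = criteria_met - WP29_CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    return len(criteria_met) >= threshold


# Processing designed to derive SCD as an output, as described above
scd_output = {"evaluation or scoring", "sensitive data", "innovative use"}
print(dpia_likely_required(scd_output))  # True
```

Screening only decides whether a DPIA is needed; whether the residual risk is high enough to require prior consultation is a judgement made in the DPIA itself.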

By Andrew Cormack