During a recent conversation about learning analytics it occurred to me that it might be helpful to analyse how universities use student data in terms of the different justifications provided by UK and European Data Protection Law. Although the ‘big data’ techniques used in learning analytics are sometimes said to be challenging for both law and ethics (though the Open University have what looks like a pretty good attempt), it seems to me that the different legal justifications, and the rules and principles associated with each of them, could form a helpful basis for legal and ethical guidelines on handling student data.
Here’s an initial sketch of how that might work… Corrections, comments and suggestions would be very welcome.
The first group of reasons for processing information about students is those that follow inevitably from the individual’s decision to become a student. Simply being a student means the university has to hold application and contact details, record the student’s progress, lend them library books, provide them with computer access, manage fees, and so on. There are also external requirements on universities, for example to report on student achievement (a condition of receiving public funds) and attendance (where students have been granted study visas). Data protection law recognises this type of ‘necessary’ processing, whether required by a contract, a legal obligation or a public function. Since neither the university nor the student can choose whether this processing takes place, the main data protection requirements are transparency, minimisation and security: applicants must normally be informed of what processing will occur as a result of their application (especially any non-obvious processing), data collection and processing should not go beyond what is necessary, and information must be protected by appropriate security measures.
Next, there is a group of additional functions that the university might like to perform using the student data it has or may collect in the course of its operations. This could be as simple as working out which library books are popular to guide future purchasing decisions, or as complex as noting that students with a particular group of A-levels struggle on a particular module. Results are more likely to be accurate if data can be gathered across the whole student population: struggling students are particularly unlikely to have time to fill in yet another questionnaire. In the past this kind of processing could be used to improve educational provision to future cohorts of students, but collection and analysis were unlikely to be quick enough to affect current students. Data protection law provides three different approaches to this type of processing. The simplest is to anonymise the information so that data protection law does not apply; however, this is very hard to ensure so long as records exist at individual level. Alternatively, a specific exemption covers research that can be designed to ensure there is no impact on individuals: this may be possible when investigating specific questions (‘hypothesis-driven research’) where all possible impacts can be predicted, and excluded, in advance. However, this is unlikely to be suitable for wider data-driven investigations where it is not possible to predict what correlations may be found. These need to be done within the terms of the ‘legitimate interest’ justification, which allows processing in the interests of the organisation (e.g. to improve its educational practice) but only if those interests are not overridden by the fundamental rights of the individuals whose information is being processed.
The Article 29 Working Party of Data Protection regulators have stated that this involves a balancing test: the better the protection for individuals’ interests, the wider the range of organisational interests that can be supported. Such interests must be legitimate for an educational organisation, and the transparency, minimisation and security duties still apply. This suggests that when processing data beyond that needed for student administration, universities’ first priority must be to ensure that the interests of the students are protected, whether by full anonymisation, by designing research to avoid any impact on individuals, or by minimising the risk of impact and ensuring it is justified by the benefits of the results. If there is any risk of impact then transparency about the processing is required, and account must be taken of any concerns that individual students’ circumstances may increase their risk.
Last, there are functions designed to help individual students. These may include applying the patterns discovered through research, for example by offering tailored support to individuals based on their past experience and current performance. One of the benefits claimed for the rapid analysis of ‘big data’ is that such support can be immediately responsive to current students’ needs, rather than just assisting the next cohort. Here the intention is to maximise the effect on the individual, so the approaches suggested above for research clearly cannot be applied. Some of these interventions might be considered part of normal education, and so covered by the ‘necessity’ justification discussed above, but this is probably limited to interventions that were envisaged at the time the student’s data were first collected. Novel or specifically targeted options should instead be offered to individual students, who may give or withhold their individual consent to them and to the additional processing of personal data they entail. Under this model, students must have a free, fully informed choice whether or not to accept additional support; they must also be able to withdraw their consent and return to being treated in the standard way for their cohort.
It’s also worth noting that the law requires additional controls if decisions that will affect individuals are taken automatically. Big data analysis will typically find correlations rather than causation, and it may well be advisable to have a human make the final decision on whether the machine’s proposed course of action is appropriate. “Data-supported decision making”, rather than full automation, should probably be the aim.
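For anyone who finds it easier to see the three groups as a decision procedure, here is a rough sketch in Python. It is only an illustration of the reasoning in this post, not legal advice or a complete test: the class, field and function names are all hypothetical, and each yes/no flag stands in for a judgement that in practice needs careful human assessment (the legitimate interest balancing test in particular).

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    NECESSARY = "necessary (contract, legal obligation or public function)"
    ANONYMISED = "anonymised (data protection law does not apply)"
    RESEARCH_EXEMPTION = "research exemption (no impact on individuals)"
    LEGITIMATE_INTEREST = "legitimate interest (subject to the balancing test)"
    CONSENT = "individual consent (free, informed, withdrawable)"


@dataclass
class Purpose:
    # Hypothetical flags, one per question raised in the post.
    inherent_in_being_a_student: bool = False     # group 1: administration, reporting
    targets_individual: bool = False              # group 3: intervention for one student
    envisaged_at_collection: bool = False         # was the intervention foreseen at collection?
    fully_anonymised: bool = False                # group 2: no individual-level records remain
    impact_on_individuals_excluded: bool = False  # group 2: hypothesis-driven research


def suggest_basis(p: Purpose) -> Basis:
    """Suggest which justification discussed in the post might fit a purpose."""
    if p.inherent_in_being_a_student:
        return Basis.NECESSARY
    if p.targets_individual:
        # Only interventions envisaged at collection count as 'normal education'.
        return Basis.NECESSARY if p.envisaged_at_collection else Basis.CONSENT
    if p.fully_anonymised:
        return Basis.ANONYMISED
    if p.impact_on_individuals_excluded:
        return Basis.RESEARCH_EXEMPTION
    # Data-driven analysis where the correlations cannot be predicted in advance.
    return Basis.LEGITIMATE_INTEREST


if __name__ == "__main__":
    novel_support = Purpose(targets_individual=True)
    print(suggest_basis(novel_support).value)
```

Whichever branch is reached, the transparency, minimisation and security duties still apply, and a human should make the final decision on any proposed intervention.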