A thought-provoking talk at the TERENA Networking Conference by Barry Smyth of the Insight Centre for Data Analytics suggested both the possibilities and the problems of big data, and some of the decisions that society needs to make soon about how we do, and do not, use it to maximise benefits and minimise harms. A couple of examples highlight the scale of what might now be possible. The smartphones we have in our pockets have sufficient CPU power to measure our lung function – to within 6% of the accuracy of a clinical device – from the sound of us blowing into the microphone. And every six hours humanity generates around an exabyte of data: roughly the number of words uttered by the entire human race, ever!
With that amount of data and processing power, it turns out that algorithms no longer have to be particularly good in order to extract valuable information from messy and diverse input. In education, universities are already using data from libraries, VLEs and other student histories to identify students who may have problems keeping up with their course. Areas of teaching and materials that are unclear become obvious if 2000 students on a MOOC all give the same wrong answer to a question. Algorithms can use students’ current performance to identify who needs help and what sort of additional support will give them the most benefit, increasing top grades by more than 25% in some studies. Conversely, there is a risk that inappropriate interventions may cause significant harm; Professor Smyth suggested as a guideline that big data should only be used to fill in the gaps in our knowledge, not to override the decisions that humans are best placed to make.
Larger datasets create the possibility of “listening to signals from the crowd”. Previously unknown side effects of drugs were identified by correlating 82 million queries entered into search engines by 6 million users. High-resolution real-time maps of air pollution can be derived by connecting sensors that report when asthma inhalers are used with simultaneous location data from their users’ mobile phones.
These types of application hint at the ethical challenges now emerging. Health and location are highly sensitive aspects of personal data, yet by analysing them it’s possible to warn others of temporary environmental conditions that could trigger anything from discomfort to serious medical harm. Leaving it to individuals to decide whether and how to make their contribution to their own, or society’s, big data may not be the best approach when the potential benefits and harms, both short- and long-term, may be hard to assess and explain. In those circumstances individuals’ choices may easily be both too generous and too restrictive for their own, and society’s, good. In medicine there are already some decisions that are taken out of our hands: the law does not allow us to decide to sell our own organs, nor to keep secret the fact that we have a notifiable disease. Analysis of big data is now approaching those levels of benefit and harm to society: there are, or soon will be, things that systems can, but should not, do. Society needs to decide soon where the limits should be drawn.