Apples and Oranges

In discussions of the “Right to be Forgotten” it is often observed that Google deals with tens of millions of delisting requests each month for breach of copyright, as opposed to tens of thousands for inaccurate personal data. The implication often seems to be that those numbers should be more similar. However, it seems to me that the two types of request need to be handled in significantly different ways, and that they probably require, on average, significantly different amounts of manual effort per request. If the processes ought to be different, then we need to be careful when comparing them, lest we (or the search engines implementing them) draw the wrong conclusions.

The main differences concern the source of requests and the content to which they apply.

It seems likely that most requests to “forget” will come from individuals and that, unless they are particularly unfortunate, most individuals will have only one or a few pages to complain about. That means Google may well have to check the requester’s identity, and their entitlement to make the request, for nearly every “forget” request it receives. That contrasts with copyright delisting requests, which generally come in large numbers from a small number of rights holders and their representatives. That allows a much more efficient identification process: for example, by exchanging digital signatures the sender’s identity can be verified automatically in future.
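As a rough illustration of how that might work, here is a minimal sketch in Python using the widely used cryptography library: the rights holder registers a public key once, when it is first vetted, and every later request from it is then verified automatically. All names here are hypothetical and the flow is an assumption, not any search engine’s actual process.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

def request_is_authentic(public_key: ed25519.Ed25519PublicKey,
                         request_body: bytes, signature: bytes) -> bool:
    """Return True only if request_body was signed by the registered key."""
    try:
        public_key.verify(signature, request_body)
        return True
    except InvalidSignature:
        return False

# Demonstration: the rights holder registers its public key once,
# during an initial manual vetting ...
holder_private_key = ed25519.Ed25519PrivateKey.generate()
registered_public_key = holder_private_key.public_key()

# ... and each of its later requests is checked with no manual effort.
request = b"please delist https://example.com/allegedly-infringing-page"
signature = holder_private_key.sign(request)
assert request_is_authentic(registered_public_key, request, signature)
```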

Automation is also a possibility for copyright delisting, as most requests will apply to the second, tenth or hundredth identical copy of the same digital file. Once one copy of the file has been assessed as probably infringing, requests relating to further identical copies can be recognised immediately using hash values. It seems likely that anyone trying to implement an efficient takedown process would conclude that all identical copies should be treated in the same way. With “forget” requests, by contrast, it seems unlikely that identical pages will reappear, so every request will again need to be assessed manually.
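To make the hash idea concrete, here is a minimal sketch, again in Python and with hypothetical names: one manual assessment is recorded, and identical copies are then recognised from their SHA-256 digest alone.

```python
import hashlib

# Digests of files already assessed as probably infringing
# (a hypothetical in-memory store; a real system would persist it).
known_infringing_hashes: set[str] = set()

def sha256_of_file(path: str) -> str:
    """Hash the file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_assessment(path: str, infringing: bool) -> None:
    """After one manual assessment, remember the result for identical copies."""
    if infringing:
        known_infringing_hashes.add(sha256_of_file(path))

def triage_copyright_request(path: str) -> str:
    """Identical copies of an already-assessed file skip manual review."""
    if sha256_of_file(path) in known_infringing_hashes:
        return "delist: identical to a previously assessed copy"
    return "queue for manual assessment"
```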

There are also significant differences in the laws that apply to the two types of request, which ought to make a difference to a search engine that tries to implement them accurately.

The European Court’s definition of the “right to be forgotten” under Data Protection law explicitly requires judgments and balancing tests in every case: is the material inaccurate, irrelevant, no longer relevant or excessive? Does the public interest in finding the material outweigh the individual’s right to object to processing? For material written in human language, it’s hard to conceive of a computer being able to apply those rules. Copyright law involves different tests: is the material subject to copyright in a relevant country? Is the publication covered by fair use or other exemptions (again, with national variations)? Here there may be some possibility for computers to help, particularly when multiple requests are received for the same material.

If there’s any value in comparing and contrasting the two kinds of request, I think it needs to be done at this kind of detailed level. Raw numbers of requests don’t say much about what is (or ought to be) going on.

By Andrew Cormack

I'm Chief Regulatory Advisor at Jisc, responsible for keeping an eye out for places where our ideas, services and products might raise regulatory issues. My aim is to fix either the product or service, or the regulation, before there's a painful bump!
