Voice Processing: opportunities and controls

We’ve been talking to computers for a surprisingly long time. Can you even remember when a phone menu first misunderstood your accent? Obviously there have been visible (and audible) advances in technology since then: voice assistants are increasingly embedded parts of our lives. A talk by Joseph Turow to the Privacy and Identity Lab (a group of university researchers in the Netherlands) explored some of the less obvious developments that might be coming; some may even be present already. As well as what we say, might voice technology analyse how we say it? Could a voice assistant recognise our speech to connect us to our accounts? Could it sense our mood, and suggest appropriate comfort or celebratory food? Could it recognise signs of COVID, or other diseases? Could it recognise angry customers and pass them to an appropriately trained human? Could it detect when we are doubtful about a product, and offer a discount? Or when we are excited, and see an opportunity for upselling? Should it?

Turow’s book is about the advertising industry, but the discussion got me thinking about the tools that might exist to regulate the use of these ideas more generally. Could constraints help us explore what seem to be acceptable applications and technologies, with more confidence that they won’t later slip into unacceptable territory? I’ve come up with four, but there may be more:

  • Law: Turow suggests that some applications of voice technology in advertising should simply be banned. This sounds like a strong lever, but competition authorities discovered long ago that if a prohibited practice is sufficiently profitable, legal costs and fines will simply be absorbed as a cost of doing business;
  • Economics (and, remember, I’m a mathematician, not an economist or marketer): Researching, developing and operating these technologies costs money. There ought to be a limit on advertising budgets: if the profit from selling one more item is less than the cost of the advanced tech required to make the sale, I can’t see the logic in using the technology;
  • Business: many of the major players in this space have multiple roles, which may have different business incentives. For example, as browser providers, both Google and Apple are changing their rules on third-party cookies in ways that significantly affect existing solutions. Technology can be a gatekeeper as well as an enabler;
  • Social: some technologies simply turn out to be socially unacceptable: for example, it appears that when Google realised that Glass users were being treated as outcasts, the consumer product was withdrawn. Here the lack of visibility of what voice processing is actually taking place could cut both ways. If buyers, visitors and employees can’t tell how their voices are actually being processed by a particular device, they may become nervous about a whole class of products, even ones that don’t actually involve the offensive practice.
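The economics point above amounts to a simple break-even comparison: the advanced technology only makes sense if the marginal profit from the extra sale it produces exceeds its per-sale cost. A toy sketch (the function name and all figures are illustrative, not real market data):

```python
def worth_using(profit_per_sale: float, tech_cost_per_sale: float) -> bool:
    """Return True only if the extra sale covers the technology's cost.

    Both arguments are hypothetical per-sale figures: the profit earned
    from one additional item sold, and the share of the technology's
    research, development and operating cost attributed to that sale.
    """
    return profit_per_sale > tech_cost_per_sale


# Illustrative numbers only:
print(worth_using(5.00, 3.50))  # the sale covers the tech: True
print(worth_using(5.00, 7.25))  # the tech costs more than the sale earns: False
```

Of course, real advertising budgets are set against aggregate expected returns rather than per-sale arithmetic, but the logic of the limit is the same.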

Following on from this last point, there was a fascinating contribution from a linguist who is researching potential uses of voice processing to help care for people living in their own homes. Sensing, triaging and responding to mood, emotion, even medical conditions might have much greater social benefit in these contexts than for advertising. Unfortunately none of my tools seem to cope well with that level of nuance.

By Andrew Cormack

I'm Chief Regulatory Advisor at Jisc, responsible for keeping an eye out for places where our ideas, services and products might raise regulatory issues. My aim is to fix either the product or service, or the regulation, before there's a painful bump!
