I’ve been reading a fascinating paper on “System Safety and Artificial Intelligence”, which applies ways of thinking about safety-critical software to Artificial Intelligence (AI). What follows is very much my interpretation: I hope it’s accurate, but do read the paper as there’s lots more to think about.
AI is a world of probabilities, statistics and data. That means that anything that could possibly happen, might. We can adjust AI to make particular behaviours unlikely, but the statistical nature of the approach means we can’t make them impossible. This contrasts with, or, the paper suggests, complements, the approach taken in safety-critical systems, which declares some outcomes prohibited and uses limits and controls to prevent them happening; or, at least, to require human confirmation before they do. The key point seems to be that you don’t rely on a single system to restrict its own behaviour: you wrap independent controls around it.
In the AI world, even in applications where safety isn’t an issue, it strikes me that this kind of approach might be preferable to trying to incorporate all the limits we want into the AI itself. If we need to restrict what impacts the AI can have on the outside world, consider whether that can be done by way of a wrapper around it. Swaddling a baby keeps it warm, but also restricts its movement and cushions its surroundings against unexpected behaviour. And, once the AI is wrapped up, its design and training can focus on achieving the best statistical interpretation of its inputs; any low-probability quirks may be more tolerable if we know there’s an external wrapper to catch those that would cause harm. And, incidentally, to draw designers’ attention to the fact that something considered unlikely did actually happen.
In my visualisation of the draft EU AI Act, such a wrapper would probably invoke Human Oversight, but fit mainly in the Risk Management, Quality Management and Lifecycle areas. As those headings suggest, outside contexts where the use of AI is entirely prohibited, the Act itself aims at managing risk, rather than making certain outcomes impossible. But there are a few hints at hard external limits on the AI: “mitigation and control measures in relation to risks that cannot be eliminated” (Art.9(4)(b)); “Training, validation and testing data sets … particular to the specific geographical, behavioural or functional setting” (Art.10(4)), and “shall use such systems in accordance with the instructions of use accompanying the systems” (Art.29(1)). Technical and procedural controls that stop the AI going beyond its intended setting or operating outside instructions might be good candidates for an external wrapper.
This approach naturally focuses attention on the links the AI has to the outside world: these are the points where the AI approach and the safety-critical one meet. What information can the AI measure? What levers can it pull? How could that go wrong, whether through accident or deliberate, including malicious, action? Are there inputs – for example a situation outside the one the AI was trained for, or data such as signs deliberately modified to mislead – where exceptional action is required: warnings, changing the algorithm to something simpler or more explicable, ignoring the particular input, or returning control to a human? On the output side, are there actions that the AI is capable of triggering that we need to prevent from taking effect?
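Those questions suggest a simple shape for such a wrapper: guard the inputs before the AI sees them, and guard the outputs before they take effect. Here is a minimal sketch of that idea; the names (`wrapped_decide`, `Decision`, the guard functions) are my own illustration, not anything from the paper or the Act.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Decision:
    action: str
    needs_human: bool = False
    reason: str = ""

def wrapped_decide(model: Callable[[dict], str],
                   input_ok: Callable[[dict], bool],
                   output_ok: Callable[[str], bool],
                   observation: dict) -> Decision:
    """Run the model only on inputs it was designed for, and let
    only permitted outputs take effect; everything else escalates
    to a human. The guards are independent of the model itself."""
    if not input_ok(observation):
        # Input outside the trained-for setting: hold and ask a human.
        return Decision("hold", needs_human=True, reason="input out of scope")
    proposed = model(observation)
    if not output_ok(proposed):
        # Prohibited action proposed: block it rather than let it act.
        return Decision("hold", needs_human=True,
                        reason=f"blocked output: {proposed}")
    return Decision(proposed)
```

The point of the structure is that the two guard functions are written and reviewed separately from the model, so a statistical quirk in the model cannot bypass them.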
This reminded me of a situation, many years ago, where a “smart firewall” decided its network was under attack on UDP port 53. In accordance with its design, it blocked the “hostile” traffic. Unfortunately blocking responses to DNS requests turned out to be a very effective way to make the Internet unusable for everyone behind that firewall. This does seem like an example where we would want the AI wrapper to intervene, probably by asking a human to confirm whether this particularly significant port number should be treated according to the normal blocking rules.
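In wrapper terms, the fix is a small, domain-expert-supplied list of ports whose blocking must be confirmed by a human before it takes effect. A hedged sketch, with names and the port list entirely my own invention rather than any real firewall’s configuration:

```python
from typing import Callable

# Domain knowledge the data scientist may lack: ports whose blocking
# has outsized consequences. Illustrative list only.
CRITICAL_PORTS = {53: "DNS", 67: "DHCP", 123: "NTP"}

def apply_block(port: int,
                block: Callable[[int], None],
                confirm: Callable[[str], bool]) -> bool:
    """Let the AI's blocking decision take effect only if the port is
    not on the critical list, or if a human explicitly confirms it."""
    service = CRITICAL_PORTS.get(port)
    if service is not None and not confirm(f"Block {service} (port {port})?"):
        return False  # human vetoed: traffic keeps flowing
    block(port)
    return True
```

The smart firewall’s traffic analysis stays untouched; only its lever on the outside world passes through the check.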
And that, in turn, suggests what I think may be a common rule: that the wrapper around the AI needs to be designed by those with expertise in the particular domain where the AI will operate. A data scientist might reasonably assume that port 53 is no different to ports 52 or 54; a network manager will immediately know its significance. Having identified these unusual situations, the domain experts need to work with AI experts to identify how they might be detected and responded to. Are there relevant confidence levels that the AI can use to warn human operators, to increase logging, or change to a different algorithm? What information or signals could it generate to help operators understand what is happening? What alternative processes can we fall back to if it’s no longer safe to rely on the AI?
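One way the AI experts might answer the confidence question is a tiered response: act normally when confidence is high, warn and log more when it dips, and fall back to a simpler, more explicable rule when it drops further. The thresholds and names below are illustrative assumptions, not a recommendation:

```python
import logging
from typing import Callable

def act_on(prediction: str, confidence: float,
           simple_rule: Callable[[], str],
           warn_below: float = 0.9,
           fallback_below: float = 0.6) -> str:
    """Use the model's reported confidence to decide whether to act,
    to act but warn operators, or to fall back to a simpler rule."""
    if confidence < fallback_below:
        # No longer safe to rely on the AI: use the fallback process.
        logging.warning("confidence %.2f below %.2f; using fallback rule",
                        confidence, fallback_below)
        return simple_rule()
    if confidence < warn_below:
        # Act, but give operators a signal that something is unusual.
        logging.info("low confidence %.2f for %r; increasing logging",
                     confidence, prediction)
    return prediction
```

Where the thresholds sit is exactly the kind of decision that needs the domain experts and the AI experts in the same room.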
Considering those questions before deploying AI should significantly reduce the number of nasty surprises after it starts operating.