A panel session at the FIRST conference on comparable security metrics made me wonder why comparing incident statistics seems to be so hard. My first visit to another CSIRT, fifteen years ago, was to work out how to compare our Janet CSIRT statistics with those from SURFnet. And yet the tricky question still seems to be working out what it is you are actually measuring. Most incident statistics do give you a reasonable idea of how busy the CSIRT is: as with most metrics the absolute values don’t mean much, but the trend – whether more or less busy – probably does.
However, what most people are looking for is some measure of “network health”, as a better guide for policy making than anecdotes, headlines and fear. That turns out to be a lot harder. One reason may be that most CSIRTs have two sources of incident reports (the ratio between them depends on how close the CSIRT is to its network). Where the CSIRT can monitor traffic on a network or to honeypots, it should be able to derive reasonably consistent measures of security events, or at least attacks. An increase in either metric probably means that the network has become a less safe place.
But most CSIRTs also receive incident reports from their customers. While it would be nice to think that those too measure the level of badness on networks, with this year’s Verizon Data Breach Investigations Report finding that only 20% of incidents are discovered by the organisation that is the victim, it seems more likely that they actually measure the organisation’s ability to detect incidents. If that’s right, then an increase in that metric actually means the network has got safer, as we get better at detecting (and presumably responding to) the incidents that are occurring. So the single figure for “number of incidents” handled by a CSIRT may well combine one trend where an increase is bad with another where an increase is good. No wonder it’s hard to work out whether an increase in that sum is a good or bad thing!
So it seems that one way to improve the value of statistics would be to keep those derived from direct measurements of networks and traffic separate from those that may actually be measuring the effectiveness of one or more human and organisational detection and reporting steps. In both cases the CSIRT needs to be aware of, and compensate for, any changes it has made that could affect the figures (for example changing measurement technology or rules, or running an awareness campaign to encourage detection and reporting). Then comparing trends between different networks, countries or regions might become a bit more meaningful.
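To make the point concrete, here is a small sketch with entirely invented numbers. It keeps the two sources of reports separate, computes the trend of each, and shows why the trend of the combined total is uninterpretable on its own (the `trend` helper and the figures are illustrative assumptions, not real CSIRT data):

```python
# Hypothetical yearly incident counts for a CSIRT, split by source.
# All numbers are invented purely for illustration.
network_detected = {2012: 120, 2013: 150}   # from monitoring networks/honeypots
customer_reported = {2012: 80, 2013: 110}   # from reports by customer organisations

def trend(series):
    """Compare the two most recent years and return 'up', 'down' or 'flat'."""
    years = sorted(series)
    prev, latest = series[years[-2]], series[years[-1]]
    if latest > prev:
        return "up"
    if latest < prev:
        return "down"
    return "flat"

# The single combined figure mixes two opposite interpretations.
combined = {y: network_detected[y] + customer_reported[y] for y in network_detected}

print(trend(combined))           # 'up' - but good or bad? Impossible to say.
print(trend(network_detected))   # 'up' - probably bad: more attacks seen on the wire
print(trend(customer_reported))  # 'up' - possibly good: better detection and reporting
```

Kept separate, each trend has a plausible reading; summed, the two readings cancel each other's meaning, which is the ambiguity described above.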