An interesting theme developing at this week’s FIRST conference is how we can make incident detection and response more efficient, making the best use of scarce human analysts. With lots of technologies able to generate alerts, it’s tempting to turn on all the options, thereby drowning analysts in false positives and alerts about minor incidents: “drinking from your own firehose”. It was suggested that many analysts actually spend 80% of their time collecting contextual information just to determine which of the alerts are worth further investigation. If you are receiving more alerts than you can deal with in real time, something’s wrong.
Jeff Boerio explained how much simpler checks can provide an initial yes/no triage. For many logfiles it’s enough to know “Have we seen X?”, “When did it start?”, “When did it stop?”. That’s interesting whether X is a domain in a proxy log, a user-agent string, a malware family, an e-mail source address, the location from which a user initiated a VPN connection, etc. Not only do those quick queries speed things up for analysts, they can also speed up databases: extracting a simple table with just the two columns (time, indicator) makes queries faster and takes load off the main logging database.
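As a minimal sketch of that idea, the two-column table might look like the following. The table name, schema, and sample data here are all illustrative assumptions, not anything presented at the conference:

```python
import sqlite3

# Hypothetical two-column (time, indicator) sidecar table, kept separate
# from the main logging database so triage queries stay fast.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sightings (ts TEXT, indicator TEXT)")
db.execute("CREATE INDEX idx_indicator ON sightings (indicator)")

# Placeholder sightings extracted from, say, a proxy log.
rows = [
    ("2014-06-23T09:14:00", "evil.example.com"),
    ("2014-06-24T11:02:00", "evil.example.com"),
    ("2014-06-24T12:30:00", "benign.example.org"),
]
db.executemany("INSERT INTO sightings VALUES (?, ?)", rows)

def triage(indicator):
    """Answer 'Have we seen X?' and, if so, when it started and stopped."""
    first, last = db.execute(
        "SELECT MIN(ts), MAX(ts) FROM sightings WHERE indicator = ?",
        (indicator,),
    ).fetchone()
    return (first, last) if first else None
```

A call like `triage("evil.example.com")` returns the first and last sighting, while an unseen indicator returns `None`, giving the analyst an immediate yes/no before any deeper digging.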
If speeding up the triage process still doesn’t give your analysts time to breathe, Josh Goldfarb had a more radical suggestion: maybe you should reduce the number of alerts? And rather than starting from the list of things that your IDS and other systems can detect, maybe start from the list of things that your organisation depends on. So find out what is the biggest security risk/threat for your organisation and set up the alerts needed to detect that. Then add the next biggest risk/threat, and so on, so long as your analysts retain a workload that allows them to do proper investigations of what may be the most harmful incidents for the organisation. Focussing on particular risk/threat narratives also allows you to automatically attach relevant information to each alert: adding the context the analyst will need for that particular type of event. For an alert about a compromised user account, that could include what access privileges it has, where it has logged in from, at what times of day, and so on. For a compromised machine, the analyst will want to know whether it’s a server or a workstation, what sensitive information it may store or have access to, etc.
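One way to sketch that automatic context attachment is a simple mapping from alert type to a context-gathering function. Everything here is a hypothetical illustration: the alert types, field names, and placeholder data are assumptions, and in practice the gatherers would query the directory service, authentication logs, or asset inventory:

```python
def account_context(alert):
    # Hypothetical lookups; real code would query directory/auth systems.
    return {
        "privileges": ["vpn", "fileshare"],        # placeholder data
        "recent_logins": ["10.0.0.5", "10.0.0.9"],
        "usual_hours": "08:00-18:00",
    }

def host_context(alert):
    # Hypothetical lookup against an asset inventory.
    return {
        "role": "workstation",                     # server or workstation
        "sensitive_data": ["customer records"],
    }

# One gatherer per risk/threat narrative the organisation alerts on.
CONTEXT_GATHERERS = {
    "compromised_account": account_context,
    "compromised_host": host_context,
}

def enrich(alert):
    """Attach the context an analyst will need for this alert's type."""
    gatherer = CONTEXT_GATHERERS.get(alert["type"])
    alert["context"] = gatherer(alert) if gatherer else {}
    return alert

alert = enrich({"type": "compromised_account", "subject": "alice"})
```

The point of the pattern is that the context arrives with the alert, rather than the analyst spending their time collecting it by hand.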
This doesn’t necessarily mean reducing your logging – you still need logs for detailed investigation of those alerts that do appear to indicate significant incidents, and you should periodically do a wider review of logs to determine whether your risk prioritisation is still appropriate. The important thing is to decide quickly and efficiently when those deeper investigations are required.