Threat Intelligence: for machines and humans

Threat Intelligence is something of a perennial topic at FIRST conferences. Three presentations this year discussed how we can generate and consume information about cyber-threats more effectively.

First, Martin Eian from Mnemonic described using (topological) graphs to represent threat information. Objects, such as domain names, IP addresses and malware samples, are vertices in the graph; facts about them are edges. So an edge of type “resolves to” would connect a domain name and an IP address. Databases that use this kind of structure are widely available and make it easy to explore threats by visualisation and pivoting. Access controls can be applied to edges, allowing different people and partners to access the parts of the graph that are relevant to them.

While individual threat reports tend to create islands in the graph, databases can suggest links between those islands where what may be the same object appears in more than one of them. To be most effective this requires those populating the graph to use strict vocabularies: if two things are the same we should endeavour to make them the same in the graph. But where things only may be the same (for example multiple names for malware or threat actors) it’s better to keep the different names as different objects, linked by “alias” facts. Choosing a single canonical name turned out to be a bad idea, as it makes it impossible to disentangle the graph if you later change your mind.

Alias facts, and in fact all facts, should be labelled with a measure of “confidence”, so we know how much reliance we can place on the conclusions we draw. Changes of mind should also be explicitly recorded: facts should be timestamped and never changed. Instead, new facts with a later timestamp, and perhaps a different confidence, should be added to the graph. As well as recording how our thinking developed, the sequence of timestamped facts should itself reveal threats such as fast-flux DNS (a rapidly changing “resolves to” link) and the development of malware families.

Such labelling may be more work for the creator of threat intelligence than simply copying machine-generated alerts into a file. But good threat intelligence should aim to be write-once, read-many. As Trey Darley (CERT.be) and Andras Iklody (CIRCL) pointed out, adding context is like putting comments in source code: it’s worth taking the time to help others, not to mention your future self.
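
To make the graph model concrete, here is a minimal Python sketch of an append-only store of timestamped, confidence-labelled facts. It is not the system presented in the talk: the class and method names, the 0-100 confidence scale and the "three addresses within a window" fast-flux rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Fact:
    """One immutable edge: source --kind--> target, as believed at `seen`."""
    source: str
    kind: str          # e.g. "resolves_to" or "alias" (hypothetical vocabulary)
    target: str
    confidence: int    # 0-100, an assumed scale
    seen: datetime


class FactGraph:
    def __init__(self):
        self._facts = []   # append-only: facts are never edited or removed

    def add(self, source, kind, target, confidence, seen):
        fact = Fact(source, kind, target, confidence, seen)
        self._facts.append(fact)
        return fact

    def facts(self, source, kind):
        """All facts of one kind about one object, oldest first."""
        found = [f for f in self._facts if f.source == source and f.kind == kind]
        return sorted(found, key=lambda f: f.seen)

    def looks_fast_flux(self, domain, window, min_targets=3):
        """True if `domain` resolved to at least `min_targets` distinct
        addresses within some period of length `window`."""
        facts = self.facts(domain, "resolves_to")
        for i, first in enumerate(facts):
            targets = {f.target for f in facts[i:] if f.seen - first.seen <= window}
            if len(targets) >= min_targets:
                return True
        return False


graph = FactGraph()
start = datetime(2024, 1, 1, 12, 0)   # arbitrary example timestamp
for step, ip in enumerate(["192.0.2.1", "192.0.2.2", "192.0.2.3"]):
    graph.add("bad.example", "resolves_to", ip, 80, start + timedelta(minutes=5 * step))
graph.add("quiet.example", "resolves_to", "198.51.100.7", 90, start)

# Two names that may refer to the same malware stay separate objects,
# joined by an "alias" fact. A later, lower-confidence fact records a
# change of mind without overwriting the earlier one.
graph.add("MalwareA", "alias", "MalwareB", 70, start)
graph.add("MalwareA", "alias", "MalwareB", 20, start + timedelta(days=30))

print(graph.looks_fast_flux("bad.example", timedelta(hours=1)))    # True
print(graph.looks_fast_flux("quiet.example", timedelta(hours=1)))  # False
print([f.confidence for f in graph.facts("MalwareA", "alias")])    # [70, 20]
```

Because facts are frozen and only ever appended, the history of an “alias” or “resolves to” relationship can be replayed later, which is what makes both the change of mind and the fast-flux pattern visible.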

Trey and Andras developed these ideas further. Ideally, our threat intelligence should be valuable to many different types of consumer, including Security Operations Centres (SOCs), Internet Service Providers (ISPs), incident responders, threat analysts, risk analysts and decision-makers. The technical information for each group may be similar, but how they use it will be very different and highly dependent on the contextual information that surrounds it. SOCs want to know which alerts concern threats they already have, or can deploy, protection against, and which novel ones need deeper investigation. ISPs, who control Internet access for thousands of users, cannot afford false positives, so need to know which data points are sufficiently robust to use in blocking rules. Incident responders want to see how an incident developed, so they can look out for similar signs. Threat analysts want to understand motivation, modus operandi, attacker infrastructure and unknown attack vectors. Risk analysts want to see patterns in attacks, sectors and geography. Decision-makers want evidence to inform decisions on resource allocation, including which threat information feeds to continue paying for!

Use existing protocols to indicate how information may be used (for example the Permissible Actions Protocol) and whether it may be shared (Traffic Light Protocol), but be clear whether these apply to the whole report or only parts of it. The aim of sharing should be to help others protect themselves: if you have reports, scripts or configurations that helped you, consider whether you can pass these on, too. Although a lot of the discussion around information sharing has focused on machine-readable information, this highlights the need to connect it to human-readable information, too.
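
One way to be explicit about whether a marking covers the whole report or only part of it is to store a report-level default and let individual items override it. The Python sketch below illustrates that idea only; the `Report` and `Indicator` classes, the field names and the simplified four-level ordering are assumptions, not part of the Traffic Light Protocol or Permissible Actions Protocol definitions, and the filter looks only at the sharing marking.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified ordering, least to most restricted (an assumption for this sketch).
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "RED"]


@dataclass
class Indicator:
    value: str
    tlp: Optional[str] = None   # None: inherit the report-level marking
    pap: Optional[str] = None


@dataclass
class Report:
    title: str
    tlp: str                    # applies to every part without its own marking
    pap: str
    indicators: List[Indicator] = field(default_factory=list)

    def effective_tlp(self, indicator):
        """The item's own marking if it has one, else the report's."""
        return indicator.tlp or self.tlp

    def shareable_at(self, max_tlp):
        """Values whose effective marking is no stricter than `max_tlp`."""
        limit = TLP_ORDER.index(max_tlp)
        return [i.value for i in self.indicators
                if TLP_ORDER.index(self.effective_tlp(i)) <= limit]


report = Report("Phishing campaign", tlp="AMBER", pap="GREEN")
report.indicators.append(Indicator("198.51.100.20"))                  # inherits AMBER
report.indicators.append(Indicator("phish.example", tlp="GREEN"))     # explicitly wider
report.indicators.append(Indicator("victim-host.example", tlp="RED")) # explicitly narrower

print(report.shareable_at("GREEN"))   # ['phish.example']
print(report.shareable_at("AMBER"))   # ['198.51.100.20', 'phish.example']
```

Keeping the override optional means an unmarked item is never ambiguous: it always carries the report's marking.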

A tool for doing just that was presented in a wonderful – costumed! – talk by the Fujitsu team of Ryusuke Masuoka, Toshitaka Satomi and Koji Yamada. Their S-TIP platform creates a bridge between human and machine worlds, on both the input and output sides. Human sources – blogs, incident reports, social media posts and emails – are scanned for data such as IP addresses and domain names, bitcoin addresses, CVEs, malware hashes and threat group names. These are tagged so they can be linked to machine-readable alerts and Indicators of Compromise. Chatbots within the system can then add richer information and links. Using this combined information, human analysts can quickly determine what (if any) action may need to be taken. This, too, is automated: the system has one-click links to Jira (to create block requests), Slack (to share internally), MISP (for external sharing) etc. In each case the original human-readable context accompanies the machine-readable instructions.
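
The scanning step can be pictured with a few regular expressions. The Python sketch below is not S-TIP's implementation: the patterns and the `extract_indicators` function are assumptions for illustration, they cover only some of the data types mentioned above, and the sample text is invented.

```python
import re

# Deliberately simple patterns, for illustration only. Real text needs more
# care: defanged indicators such as "hxxp" or "[.]", IPv6, octet range
# checks, domain names, threat group names and so on.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "sha256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
    "md5": re.compile(r"\b[0-9a-fA-F]{32}\b"),
}


def extract_indicators(text):
    """Return {indicator type: sorted unique matches} for one piece of text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[name] = matches
    return found


# Invented example of a human-written post.
post = (
    "We saw exploitation of CVE-2099-12345 from 192.0.2.10 and 192.0.2.11. "
    "Dropped file MD5: d41d8cd98f00b204e9800998ecf8427e."
)
print(extract_indicators(post))
```

The extracted values become tags on the original text, so the human-readable context stays attached to whatever machine-readable record is created from them.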

By Andrew Cormack
