Domain Name System (DNS) resolvers are an important source of information about security incidents, but using their logs is challenging. A talk at the FIRST conference discussed how one large organisation is trying to do so.
DNS resolvers are used legitimately every time a computer needs to convert a human-friendly name (such as www.google.com) into a machine-friendly IP address (such as 173.194.67.106). That adds up to a huge number of queries – for HP’s internal network, eighteen billion log records every day. That rate of logging is a technical challenge for storage and for database software, and querying the resulting database may be even harder. But hidden within that lot may be evidence of various malicious uses of DNS, including:
- Attacks on DNS itself. If you want to harm an organisation then slowing down or crashing its DNS servers is nearly as effective as cutting it off from the Internet entirely. Some vulnerabilities also allow false data to be inserted into DNS servers (cache poisoning), so that anyone within the organisation requesting a particular domain name will instead be directed to a harmful server controlled by the attacker, which may masquerade as a genuine login or payment page, infect visiting computers with malware, and so on.
- Attacks using DNS. In many cases the response to a DNS query is much bigger than the query itself – a query of a few dozen bytes can elicit a response of several kilobytes – making the service an ideal amplifier for denial-of-service attacks: an attacker spoofs the victim’s address as the source of its queries, and the victim is flooded with the much larger responses. Here the organisation is not the primary target of the attack, though it may suffer collateral damage.
- DNS as an infrastructure. As an essential protocol, DNS traffic is very rarely blocked by firewalls, and the volume and variety of legitimate queries makes unusual traffic hard to spot. Various tunnelling techniques have been developed to use DNS queries to carry data of other kinds – in the most extreme examples it’s possible to transmit complete TCP sessions using only DNS queries and responses. These techniques can be used for everything from obtaining free connectivity from wifi networks to transmitting trade secrets out of organisations (the sketch after this list illustrates the basic idea).
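As a concrete illustration of the tunnelling idea, the short Python sketch below encodes an arbitrary byte string into a series of query names under an attacker-controlled domain; an authoritative name server for that zone would decode whatever labels arrive in queries. The domain name and chunk size here are invented for this example – real tools such as iodine and dnscat2 are considerably more sophisticated.

```python
import base64

# Hypothetical attacker-controlled domain; the attacker's authoritative
# name server for this zone decodes whatever labels arrive in queries.
EXFIL_DOMAIN = "t.example.com"

# DNS limits each label to 63 bytes and a full name to 255 bytes,
# so the payload is split into short base32 chunks.
LABEL_BYTES = 30  # 30 raw bytes -> 48 base32 characters per label

def encode_queries(payload: bytes) -> list[str]:
    """Turn an arbitrary byte string into a series of DNS query names."""
    names = []
    for seq, i in enumerate(range(0, len(payload), LABEL_BYTES)):
        chunk = payload[i:i + LABEL_BYTES]
        label = base64.b32encode(chunk).decode().rstrip("=").lower()
        # a sequence-number label lets the receiver reassemble chunks in order
        names.append(f"{seq}.{label}.{EXFIL_DOMAIN}")
    return names

if __name__ == "__main__":
    for name in encode_queries(b"confidential: Q3 pricing model"):
        print(name)
```

Running it prints the query names that would appear, one by one, in the resolver’s logs – indistinguishable at a glance from any other burst of lookups.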
Unfortunately, the fact that there are vastly more harmless queries than harmful ones means that any automatic classifier – one that labels each individual query as harmful or harmless – would have to be implausibly accurate. Even if the classifier is wrong only 1% of the time, the harmless records misclassified as harmful will still far outnumber the genuinely bad ones. We need other techniques to improve the ratio before passing the results to a human analyst.
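To see the scale of the problem, here is a back-of-the-envelope calculation in Python. The eighteen-billion-records-a-day figure comes from the article above; the one-in-a-million prevalence of genuinely malicious queries and the 99% detection rate are assumptions invented for this sketch, not figures from the talk.

```python
# A rough illustration of the base-rate problem in DNS log triage.
QUERIES_PER_DAY = 18_000_000_000
MALICIOUS_RATE = 1e-6          # assumed: 1 in a million queries is bad
FALSE_POSITIVE_RATE = 0.01     # classifier flags 1% of harmless queries
TRUE_POSITIVE_RATE = 0.99      # and catches 99% of harmful ones

malicious = QUERIES_PER_DAY * MALICIOUS_RATE
harmless = QUERIES_PER_DAY - malicious

true_alerts = malicious * TRUE_POSITIVE_RATE    # ~17,820 real detections
false_alerts = harmless * FALSE_POSITIVE_RATE   # ~180 million false alarms

print(f"genuine alerts per day: {true_alerts:,.0f}")
print(f"false alerts per day:   {false_alerts:,.0f}")
print(f"only {true_alerts / (true_alerts + false_alerts):.4%} of alerts are real")
```

Even with a generously accurate classifier, fewer than one alert in ten thousand would point at a genuinely malicious query – no human analyst can work through that.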
Some of the ways of doing this are pretty obvious; others are still research topics:
- On an internal network there will be a lot of internal queries for internal domains. If one of those has gone bad then the organisation has big problems that it should be detecting in other ways, so it’s probably safe to discard them from the DNS log analysis (effectively ‘white-listing’ those domains).
- Conversely, if an internal machine requests the IP address of a known botnet command and control server, that should almost certainly be on a blacklist and raise an alarm.
- Alerting on malformed DNS packets isn’t a good idea, however: it turns out that many legitimate uses generate packets that don’t conform to the standard.
- A lot of DNS traffic is generated by processes such as auto-configuration and auto-discovery; that can probably be dropped as non-malicious.
- The volume of data may now be low enough for statistical or machine learning techniques to be effective – identifying the domain names produced by botnet Domain Generation Algorithms (DGAs) has been the topic of many research papers (see the sketch after this list).
- Alerting a human only when a computer has made tens of suspicious queries, rather than on each individual one, may bring the number of alerts down towards manageable levels.
- Visualisations such as graphs of DNS queries and clusters of requests suggest there are more patterns in there: the challenge is to work out which of those are significant, and how to bring algorithms up to the performance of the human eye in detecting them.
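To make the pipeline concrete, here is a simplified Python sketch of how several of these steps might fit together. Everything in it is illustrative: the internal zone, blacklist entry, host names and thresholds are invented, and the character-entropy test is only a crude stand-in for the machine-learning DGA detectors the research literature describes.

```python
import math
from collections import Counter, defaultdict

# Illustrative values only: a real deployment would use the organisation's
# own zone list, a threat-intelligence feed, and tuned thresholds.
INTERNAL_SUFFIXES = (".corp.example.com",)   # assumed internal zones
BLACKLIST = {"evil-c2.example.net"}          # assumed known C&C domain
ENTROPY_THRESHOLD = 3.5    # bits/char; random-looking labels score high
ALERTS_PER_HOST = 10       # only alert once a host piles up suspicions

def shannon_entropy(label: str) -> float:
    """Entropy of the character distribution, a crude DGA indicator:
    algorithmically generated names tend to look random, while
    human-chosen names reuse a small set of characters."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def triage(log_records):
    """Yield (host, reason) alerts from (source_host, queried_name) pairs."""
    suspicious = defaultdict(int)
    for host, name in log_records:
        name = name.rstrip(".").lower()
        if name.endswith(INTERNAL_SUFFIXES):
            continue                      # white-listed internal domain
        if name in BLACKLIST:
            yield host, f"blacklisted domain {name}"
            continue
        first_label = name.split(".")[0]
        if len(first_label) > 12 and shannon_entropy(first_label) > ENTROPY_THRESHOLD:
            suspicious[host] += 1         # looks DGA-ish; count, don't alert yet
            if suspicious[host] == ALERTS_PER_HOST:
                yield host, f"{ALERTS_PER_HOST} DGA-like queries"

if __name__ == "__main__":
    records = [
        ("pc42", "intranet.corp.example.com"),
        ("pc42", "evil-c2.example.net"),
    ] + [("pc99", f"x{i}q7zk3vw9mbt4.example.org") for i in range(12)]
    for host, reason in triage(records):
        print(host, "->", reason)
```

The per-host counter reflects the point above: a single random-looking name proves little, but dozens from one machine are worth an analyst’s time.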
Policy and privacy issues were also highlighted in the talk. DNS queries, much more so than DNS responses, can reveal sensitive information about individuals’ browsing habits, so both organisations and individual researchers and analysts need clear policies on when to stop (or seek additional approval) before an investigation becomes a threat to privacy. Keeping DNS analysis separate from the DHCP logs that identify individuals as the source of queries is a good first step, but policies are still needed to protect against accidental identification or inappropriate use of the data. DNS logs are an important source of information about insecure systems, but (as I’ll be presenting later in the conference) we need to be sure those benefits aren’t achieved at the cost of our users’ privacy.