Thinking about automation: DDoS protection

Distributed Denial of Service (DDoS) attacks are one of the major causes of disruption on the Internet. Unlike “hacking”, they don’t require any security weakness in the target system: they simply aim to overload it with more traffic than it (or its network connection) can handle. Often such attacks are launched from multiple sources at once (hence “distributed”), with many or all of the sources being innocent machines that are controlled, but not owned, by the attacker.

From the defender’s point of view this creates a new challenge, since, in principle at least, attack packets can be identical to legitimate ones. We could simply block all packets in an over-large flow, but that does the attacker’s work for them. Fortunately there are often patterns that can be used to (mostly) distinguish the malicious packets from the genuine ones. These patterns are commonly identified, and the resulting filters sometimes applied, by automated systems.

Applying my generic model of a security automat, here are some thoughts…

Levers: DDoS protection systems typically consist of two layers. The first selects a portion of the packets on the network, based on header characteristics (source address, port/service, etc.), and re-routes these to an inspection system. There the re-routed packets are examined more closely: those that appear harmless are routed on to their original destination; the rest are judged to be part of the attack and typically dropped. Thus the possible outcomes for a packet are:

Packet follows its normal route to the destination
Packet is (slightly) delayed by re-routing, but is identified as harmless on inspection and passed to its destination
Packet is selected for re-routing, fails inspection and is dropped
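
The two layers and their three outcomes can be sketched as below. This is a minimal illustration under stated assumptions, not how any particular product works: the `REDIRECT` set of (destination, port) pairs and the `ATTACK_SIGNATURE` byte pattern are hypothetical stand-ins for whatever header rules and inspection logic a real system would use.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    NORMAL_ROUTE = auto()                # not selected by the first stage
    FORWARDED_AFTER_INSPECTION = auto()  # re-routed, then judged harmless
    DROPPED = auto()                     # re-routed, then judged part of the attack


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int
    payload: bytes


# Hypothetical first-stage rule: redirect traffic to these (destination, port)
# pairs, chosen using header fields only.
REDIRECT = {("192.0.2.10", 80)}

# Hypothetical second-stage signature: a byte pattern assumed to mark attack payloads.
ATTACK_SIGNATURE = b"\x00\x00\xff\xff"


def first_stage_selects(pkt: Packet) -> bool:
    """Cheap header check deciding whether to re-route the packet for inspection."""
    return (pkt.dst, pkt.dst_port) in REDIRECT


def second_stage_is_harmless(pkt: Packet) -> bool:
    """Closer look at the accessible (unencrypted) payload."""
    return ATTACK_SIGNATURE not in pkt.payload


def handle(pkt: Packet) -> Outcome:
    """Apply both stages and report which of the three outcomes occurred."""
    if not first_stage_selects(pkt):
        return Outcome.NORMAL_ROUTE
    if second_stage_is_harmless(pkt):
        return Outcome.FORWARDED_AFTER_INSPECTION
    return Outcome.DROPPED
```

Only packets that match the cheap header check pay the cost of inspection; everything else keeps its normal route.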

It is worth noting that, without DDoS protection, an attack would likely cause significant packet loss for the target site, and possibly the network, in any case. So a DDoS protection system doesn’t have to classify every packet perfectly: a sizeable reduction in bad traffic is what we are looking for. If a few good packets get blocked, they may well be re-transmitted by their (legitimate) origins anyway; if a few bad ones get through, the target system should be able to deal with those without overload.
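
A back-of-the-envelope calculation shows why imperfect classification can still be enough. The figures used here (a 10 Gbit/s link, 2 Gbit/s of legitimate traffic, a 50 Gbit/s attack, 95% of attack traffic dropped and 1% of legitimate traffic wrongly dropped) are purely illustrative assumptions, not measurements.

```python
from typing import Tuple


def delivered_load(legit_gbps: float, attack_gbps: float,
                   detection_rate: float,
                   false_positive_rate: float) -> Tuple[float, float]:
    """Return (legitimate, attack) traffic in Gbit/s that reaches the target.

    detection_rate: fraction of attack traffic that the protection drops.
    false_positive_rate: fraction of legitimate traffic wrongly dropped.
    """
    legit_delivered = legit_gbps * (1.0 - false_positive_rate)
    attack_delivered = attack_gbps * (1.0 - detection_rate)
    return legit_delivered, attack_delivered


# Illustrative figures only.
LINK_CAPACITY_GBPS = 10.0
legit, attack = delivered_load(2.0, 50.0, detection_rate=0.95, false_positive_rate=0.01)
print(f"unprotected load: {2.0 + 50.0:.2f} Gbit/s")      # far above the link capacity
print(f"protected load:   {legit + attack:.2f} Gbit/s")  # 1.98 + 2.50
print("fits on link:", legit + attack <= LINK_CAPACITY_GBPS)
```

Under these assumptions the residual 2.5 Gbit/s of attack traffic, plus the legitimate traffic, fits comfortably within the link, while the 1% of wrongly dropped legitimate packets is the kind of loss that retransmission can absorb.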

Data: The first-stage check is likely to use only packet header data, though its re-routing algorithm may also take account of current context, e.g. any recent unusual flows to or from the same destination. The second-stage check may inspect any accessible portion of the packet, including unencrypted content. It seems unlikely that either decryption or flow re-assembly will be worth the required processing cycles when the aim is to “make things less bad”, rather than to “achieve perfection”.
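
One way the first stage might take account of “recent unusual flows to the same destination” using header data alone is a sliding-window packet count per destination, compared with a normal baseline. This is a minimal sketch: the class name, the window length, the fixed per-destination baseline and the `threshold_factor` multiplier are all hypothetical choices, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DestinationRateMonitor:
    """Hypothetical first-stage context: counts packets per destination over a
    sliding time window and compares the count with a per-destination baseline."""

    def __init__(self, window_seconds: float, threshold_factor: float) -> None:
        self.window = window_seconds
        self.factor = threshold_factor
        self.arrivals: Dict[str, Deque[float]] = defaultdict(deque)
        self.baseline: Dict[str, float] = {}  # normal packets per window

    def set_baseline(self, dst: str, packets_per_window: float) -> None:
        self.baseline[dst] = packets_per_window

    def _prune(self, dst: str, now: float) -> None:
        # Drop arrivals that have fallen out of the window (now - window, now].
        q = self.arrivals[dst]
        while q and q[0] <= now - self.window:
            q.popleft()

    def observe(self, dst: str, now: float) -> None:
        """Record one packet header seen for dst at time now (seconds)."""
        self.arrivals[dst].append(now)
        self._prune(dst, now)

    def should_redirect(self, dst: str, now: float) -> bool:
        """True if dst is currently receiving an unusually large flow."""
        baseline = self.baseline.get(dst)
        if baseline is None:
            return False  # no history: leave traffic on its normal route
        self._prune(dst, now)
        return len(self.arrivals[dst]) > self.factor * baseline
```

Because old arrivals age out of the window, redirection stops automatically once the unusual flow subsides.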

Malice: A malicious actor may try to persuade the automat that a DDoS attack is in progress, in order to make it block a particular source, destination or application. These outcomes could, of course, be achieved by launching a real DDoS attack, but deceiving the automat into mis-applying its levers is likely to be cheaper than renting a “stresser” service, and may be harder to trace to its origin.

Controls: The human operator of the DDoS protection service may wish to intervene at two different levels:

Correct the automat’s mis-identification of a flow as being (part of) a DDoS attack. Particularly on research networks, large unexpected flows may be the successful outcome of new systems or experiments (for example, novel file-synchronisation protocols have triggered many alarms in the past). In this case the operator is likely to want to withdraw the automat’s proposed intervention entirely. Or
Refine the automat’s classification of a DDoS flow, by amending either the rules that select packets for re-routing or the rules that analyse and dispose of them.

Depending on context, each of these options may be required before a new rule is introduced (human approval of proposed blocking), afterwards (human review), or both. Some operators may also wish to proactively exempt certain flows (typically identified by source, destination and port) from redirection or blocking, where their nature (e.g. DNS responses) means that any interruption by the “protection” system would itself effectively deny service to the receiving site.
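
The operator controls described above (approve before activation, or review afterwards; withdraw; refine; and exempt certain flows) might be modelled as a small rule lifecycle. Everything here is a hypothetical sketch: `FlowPattern`, `RuleState`, `RuleManager` and the wildcard-matching convention are illustrative assumptions rather than the interface of any real DDoS protection service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


@dataclass(frozen=True)
class FlowPattern:
    """Header fields to match; None acts as a wildcard."""
    src: Optional[str] = None
    dst: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, src: str, dst: str, src_port: int, dst_port: int) -> bool:
        wanted = (self.src, self.dst, self.src_port, self.dst_port)
        actual = (src, dst, src_port, dst_port)
        return all(w is None or w == a for w, a in zip(wanted, actual))


class RuleState(Enum):
    PROPOSED = "proposed"    # awaiting human approval
    ACTIVE = "active"        # redirecting traffic
    WITHDRAWN = "withdrawn"  # operator judged the flow legitimate


@dataclass
class Rule:
    pattern: FlowPattern
    state: RuleState = RuleState.PROPOSED
    reviewed: bool = False   # has a human looked at this rule yet?


@dataclass
class RuleManager:
    require_approval: bool   # True: approve before activation; False: review afterwards
    exemptions: List[FlowPattern] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def propose(self, rule: Rule) -> Rule:
        if not self.require_approval:
            rule.state = RuleState.ACTIVE  # applied at once, queued for later review
        self.rules.append(rule)
        return rule

    def approve(self, rule: Rule) -> None:
        rule.state, rule.reviewed = RuleState.ACTIVE, True

    def withdraw(self, rule: Rule) -> None:
        rule.state, rule.reviewed = RuleState.WITHDRAWN, True

    def refine(self, rule: Rule, pattern: FlowPattern) -> None:
        """Operator narrows or otherwise amends which packets the rule selects."""
        rule.pattern, rule.reviewed = pattern, True

    def pending_review(self) -> List[Rule]:
        return [r for r in self.rules if r.state is RuleState.ACTIVE and not r.reviewed]

    def redirects(self, src: str, dst: str, src_port: int, dst_port: int) -> bool:
        """Exemptions take precedence over any active rule."""
        if any(e.matches(src, dst, src_port, dst_port) for e in self.exemptions):
            return False
        return any(r.state is RuleState.ACTIVE
                   and r.pattern.matches(src, dst, src_port, dst_port)
                   for r in self.rules)
```

For example, an exemption such as `FlowPattern(dst="192.0.2.10", src_port=53)` would keep DNS responses to that (hypothetical) site away from redirection even while a broader rule covering the same destination is active.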

Signals: The operator is likely to want to know when a new rule(set) is proposed or introduced, either to approve it before implementation or to review it afterwards. Statistical information may also be required about the current status and activity of the protection system (e.g. how many, and which, destinations are under attack, and what proportion of traffic is being forwarded after cleaning). Since attacks and campaigns are typically short-lived, the operator may also want to know which rules are no longer matching traffic, so that these can be disabled to save rule space, network capacity and processing capacity.
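
Two of these signals, stale rules and the proportion of inspected traffic forwarded after cleaning, can be derived from simple per-rule counters. This is a sketch only: the `RuleStats` fields, the rule names and the `idle_seconds` threshold are hypothetical, and a real system would collect such counters in its own way.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RuleStats:
    last_match: float    # timestamp (seconds) of the most recent matching packet
    redirected: int = 0  # packets sent to the inspection stage by this rule
    forwarded: int = 0   # of those, packets passed on after cleaning


def stale_rules(stats: Dict[str, RuleStats], now: float, idle_seconds: float) -> List[str]:
    """Names of rules that have matched no traffic for at least idle_seconds."""
    return sorted(name for name, s in stats.items() if now - s.last_match >= idle_seconds)


def forwarded_fraction(stats: Dict[str, RuleStats]) -> Optional[float]:
    """Share of redirected packets forwarded after cleaning, across all rules.

    Returns None when nothing has been redirected, since the ratio is undefined.
    """
    redirected = sum(s.redirected for s in stats.values())
    forwarded = sum(s.forwarded for s in stats.values())
    return forwarded / redirected if redirected else None
```

Rules reported by `stale_rules` are candidates for the operator to disable; a forwarded fraction that creeps upwards may suggest a rule is now catching mostly legitimate traffic.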

Historic information might also be useful to assess effectiveness: how much has adding the rule changed the traffic being delivered, compared with what it looked like before the attack? Perfection would be “no change at all”; a significant difference might indicate that the rules need reviewing, or that the attack is having an impact on systems or networks elsewhere.
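
The before-and-after comparison could be as simple as the relative change in delivered volume per traffic category, flagging any category that moves by more than some tolerance in either direction. The categories, the volumes and the 20% tolerance below are hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List


def profile_changes(baseline: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Relative change in delivered traffic per category (e.g. per service),
    comparing post-filtering volumes with a pre-attack baseline."""
    changes = {}
    for key in baseline.keys() | current.keys():
        before = baseline.get(key, 0.0)
        after = current.get(key, 0.0)
        if before == 0.0:
            # Traffic in a category that was previously silent.
            changes[key] = float("inf") if after > 0 else 0.0
        else:
            changes[key] = (after - before) / before
    return changes


def needs_review(baseline: Dict[str, float], current: Dict[str, float],
                 tolerance: float = 0.2) -> List[str]:
    """Categories whose delivered volume moved by more than tolerance either way."""
    return sorted(k for k, c in profile_changes(baseline, current).items()
                  if abs(c) > tolerance)


# Hypothetical delivered volumes (Mbit/s) before the attack and with the rule in place.
before = {"https": 800.0, "dns": 50.0, "ssh": 10.0}
after = {"https": 780.0, "dns": 20.0, "ssh": 10.0}
print(needs_review(before, after))
```

Here the large fall in delivered DNS traffic would be flagged, which might mean that legitimate responses are being caught by the rule; the small dip in HTTPS stays within tolerance.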

By Andrew Cormack
