My first reaction to Mehmet Surmeli’s FIRST Conference presentation on Incident Response in the Cloud (video) was “here we go again”. So much seemed awfully familiar from my early days of on-premises incident investigations more than twenty years ago: incomplete logs, tools not designed for security, opaque corners of the target infrastructure, even the dreaded “didn’t we tell you that…?” call from the victim organisation.
But the response and lessons learned were different, and more positive. Maybe the next cloud incident can be different…
It turns out that cloud platforms do have logging facilities. They are often turned off by default, but enabling them usually takes just a couple of clicks. Bear in mind, however, that logs kept inside a cloud instance or container may be lost when instances are added or removed as the load scales up or down. Instead, it’s better to use the cloud service to build your own (virtual) logging infrastructure, gathering logs from transient virtual machines into a persistent central store where you can use cloud facilities to process and explore them. Twenty years ago we knew we ought to have separate infrastructure for gathering, storing and processing logs; cloud systems might actually make that feasible for most organisations to implement.
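As a rough illustration of "get the logs off the transient machine", here is a minimal Python sketch using only the standard `logging` module. `CentralLogStore`, `ForwardingHandler`, `make_instance_logger` and the instance IDs are all hypothetical. The in-memory list stands in for whatever persistent storage service your cloud provider offers, and a real forwarder would also need batching, retries and authentication.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class CentralLogStore:
    """Hypothetical stand-in for persistent central storage that outlives any VM."""
    records: list[dict] = field(default_factory=list)


class ForwardingHandler(logging.Handler):
    """Copy every record off the transient instance as soon as it is emitted."""

    def __init__(self, store: CentralLogStore, instance_id: str) -> None:
        super().__init__()
        self.store = store
        self.instance_id = instance_id

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.store.records.append({
                "instance": self.instance_id,  # which VM produced the record
                "time": record.created,        # epoch seconds from the LogRecord
                "level": record.levelname,
                "message": self.format(record),
            })
        except Exception:
            self.handleError(record)


def make_instance_logger(store: CentralLogStore, instance_id: str) -> logging.Logger:
    """Return a logger for one instance that forwards to the central store."""
    logger = logging.getLogger(f"app.{instance_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # this demo keeps records out of the root logger
    if not logger.handlers:   # avoid attaching duplicate handlers on repeat calls
        logger.addHandler(ForwardingHandler(store, instance_id))
    return logger


store = CentralLogStore()
for vm in ("vm-1", "vm-2"):
    make_instance_logger(store, vm).info("login from %s", "203.0.113.7")

# vm-1 and vm-2 can now be scaled away; their records survive in the store.
for rec in store.records:
    print(rec["instance"], rec["level"], rec["message"])
```

The point of the design is that each record leaves the instance at the moment it is written, so nothing depends on the virtual machine still existing when the investigation starts.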
Keeping incident response within the cloud fits the technical and economic models, too: it avoids limits or costs on exporting large volumes of data and instead uses cloud facilities for their intended purpose of analysing large datasets. As with local incident response, things will be much easier if you prepare tools in advance and use separate accounts and access controls to move data to secure places where intruders can’t follow. As with compromised physical machines, don’t investigate on a system that the bad guy can access. Once you’ve established an incident response toolkit on each (major) platform your organisation uses, you can quickly bring new activities within its scope and add new tools as they prove useful. With a working incident response infrastructure and toolkit in place, consider how you might use cloud tools for real-time monitoring: it should be possible to investigate what intruders are doing as they do it.
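One way to think about the access controls for a separate responder account is a deny-by-default allow-list that grants only read-style actions. The Python sketch below is purely illustrative: the `service:Verb` action names, `RESPONDER_POLICY`, `is_allowed` and `excessive_grants` are hypothetical, and every cloud provider has its own policy language and action names.

```python
# Hypothetical "service:Verb" action names; every cloud provider has its own
# naming scheme, so treat these strings as placeholders only.
READ_ONLY_VERBS = ("Describe", "Get", "List", "Read")

# Allow-list for a hypothetical incident-responder account. Anything not
# listed is denied, so the account can observe but not change production.
RESPONDER_POLICY = frozenset({
    "compute:DescribeInstances",
    "logs:ReadEvents",
    "storage:GetObject",
    "storage:ListObjects",
})


def is_allowed(policy: frozenset[str], action: str) -> bool:
    """Deny by default: an action is permitted only if it is explicitly listed."""
    return action in policy


def excessive_grants(policy: frozenset[str]) -> list[str]:
    """Return listed actions whose verb is not read-only (more than monitoring needs)."""
    flagged = []
    for action in sorted(policy):
        _service, _, verb = action.partition(":")
        if not verb.startswith(READ_ONLY_VERBS):
            flagged.append(action)
    return flagged


print(is_allowed(RESPONDER_POLICY, "logs:ReadEvents"))             # True
print(is_allowed(RESPONDER_POLICY, "compute:TerminateInstances"))  # False
print(excessive_grants(RESPONDER_POLICY | {"compute:TerminateInstances"}))
# ['compute:TerminateInstances']
```

The verb-prefix check is deliberately crude; in practice you would review the provider's own documentation for which actions are genuinely read-only.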
Some key principles:
- Get logs out of their default locations: cloud dashboards and tools are not designed for incident response;
- Default logging is not enough: use the cloud to build the logging infrastructure you need;
- Tag and map your assets: don’t make the incident response team reverse engineer what your cloud deployment is supposed to look like;
- Establish incident responder accounts, with sufficient privileges to monitor production systems, but no more.
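The "tag and map your assets" principle can be checked mechanically. The Python sketch below assumes a hypothetical inventory of assets with key/value tags (in practice you would export this from your provider's asset inventory). The tag keys in `REQUIRED_TAGS`, the `Asset` class and both functions are illustrative assumptions, not any provider's API.

```python
from dataclasses import dataclass

# Hypothetical tag keys; use whatever scheme your organisation agrees on.
REQUIRED_TAGS = ("owner", "environment", "role")


@dataclass
class Asset:
    asset_id: str
    tags: dict[str, str]


def missing_tags(assets: list[Asset], required: tuple[str, ...] = REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each asset ID to the required tags it lacks or leaves empty."""
    report: dict[str, list[str]] = {}
    for asset in assets:
        gaps = [key for key in required if not asset.tags.get(key)]
        if gaps:
            report[asset.asset_id] = gaps
    return report


def asset_map(assets: list[Asset]) -> dict[tuple[str, str], list[str]]:
    """Group asset IDs by (environment, role) so responders can see the intended layout."""
    grouped: dict[tuple[str, str], list[str]] = {}
    for asset in assets:
        key = (asset.tags.get("environment") or "untagged",
               asset.tags.get("role") or "untagged")
        grouped.setdefault(key, []).append(asset.asset_id)
    return grouped


inventory = [
    Asset("vm-web-1", {"owner": "web-team", "environment": "prod", "role": "frontend"}),
    Asset("vm-web-2", {"owner": "web-team", "environment": "prod", "role": "frontend"}),
    Asset("vm-tmp-9", {"environment": "prod"}),
]
print(missing_tags(inventory))  # {'vm-tmp-9': ['owner', 'role']}
print(asset_map(inventory))
# {('prod', 'frontend'): ['vm-web-1', 'vm-web-2'], ('prod', 'untagged'): ['vm-tmp-9']}
```

Running a check like this routinely, rather than during an incident, means any "untagged" bucket is a known gap before responders have to ask what a machine was for.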