One of the biggest challenges organizations face today is the need to keep more and more security data online for quick analysis by their SIEM solutions. I’m not talking about an extra week or two here, but months or even years. Why? Well, it has been fairly well publicized that an attacker or insider can lurk on a network for months, evading detection and waiting for the most opportune time to attack.
So what is needed when this breach or potential breach is discovered? Quick access to historical data to find out what, if anything, happened and when. In other words, firms need the ability to look back and find those needles in the haystack — and it can be one huge haystack — that can reveal what happened. Due to the nature of breaches today and the sophisticated methods employed to execute them, understanding what is normal in a network is becoming increasingly important as a tool to detect breaches in the first place; ongoing analysis of historical data has a key role to play in that.
Getting the Most Out of Your SIEM
What does this mean for your security intelligence platform? It means that it must not only collect, normalize, categorize and correlate millions of events per second, but also keep that data online for fast access, potentially for years. For a lot of organizations, that means hundreds of terabytes (TBs), if not petabytes (PBs), of data. Traditional SIEM solutions don’t quite do this, or if they can, it is very expensive; nor will they do it the way you really need them to, because they were likely not designed to operate this way.
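To put that scale in perspective, here is a rough back-of-the-envelope calculation. The ingest rate and the average record size below are illustrative assumptions rather than product figures, but they show how quickly a busy network lands you in petabyte territory:

```python
# Back-of-the-envelope storage estimate. The ingest rate and the 500-byte
# average record size are illustrative assumptions, not QRadar figures.
EVENTS_PER_SECOND = 1_000_000    # sustained ingest rate
AVG_EVENT_BYTES = 500            # normalized event plus indexing overhead
SECONDS_PER_DAY = 86_400

daily_bytes = EVENTS_PER_SECOND * AVG_EVENT_BYTES * SECONDS_PER_DAY
yearly_bytes = daily_bytes * 365

print(f"~{daily_bytes / 1e12:.1f} TB per day")     # ~43.2 TB/day
print(f"~{yearly_bytes / 1e15:.2f} PB per year")   # ~15.77 PB/year
```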
One option, and one that is quite “in vogue” at the minute, is to utilize a complementary Hadoop-style system, keeping a few weeks’ worth of data in the SIEM and then exporting it into the Hadoop cluster for longer-term storage, analysis and reporting. On the face of it, Hadoop, especially open-source options, can appear pretty competitive from an overall cost perspective, and you can surely deploy Hadoop nodes on commodity hardware, so it’ll be cheap! Furthermore, any half-decent SIEM will have a nice Hadoop-friendly export mechanism, making this integration simple.
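For the sake of argument, here is a minimal sketch of what such an export pipeline often looks like: batch up normalized events as compressed JSON Lines and push them into the cluster with the standard `hdfs dfs -put` command. The field names, paths and batching scheme are all hypothetical, not a feature of any specific SIEM:

```python
"""Hypothetical SIEM-to-Hadoop export: write batches of enriched events to
compressed JSON Lines files, then copy them into HDFS with the standard
`hdfs dfs -put` CLI. Field names and paths are illustrative only."""
import gzip
import json
import subprocess
import time

HDFS_DIR = "/security/siem_archive"   # assumed HDFS landing directory

def export_batch(events, batch_id):
    """Write one batch of events locally, then copy it to HDFS."""
    local_path = f"/tmp/siem_batch_{batch_id}.jsonl.gz"
    with gzip.open(local_path, "wt") as out:
        for event in events:
            out.write(json.dumps(event) + "\n")
    # Relies on a Hadoop client being installed and configured on the box.
    subprocess.run(["hdfs", "dfs", "-put", local_path, HDFS_DIR], check=True)

# Example of the kind of context a SIEM attaches to an event; the
# normalization, asset, user and risk-score fields here are made up.
sample_event = {
    "timestamp": int(time.time()),
    "category": "Authentication Failure",   # normalized category
    "src_ip": "10.1.2.3",
    "asset": {"hostname": "db-01", "owner": "finance"},
    "user": "jdoe",
    "threat_intel_match": False,
    "risk_score": 7.2,
}
export_batch([sample_event], batch_id=1)
```

Even this toy version hints at how much enrichment has to ride along with every single event.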
Sounds like a great solution, doesn’t it? Well, there are a few gotchas.
- The Hadoop deployment, the network infrastructure and the SIEM may all need to sustain transfer rates of millions of events per second without impacting normal operations (see the back-of-the-envelope bandwidth sketch after this list). This can be a real challenge, especially if your network is spread across several countries or even continents, connected by easily flooded WAN links; security data really is big.
- A lot of the context from the SIEM can be lost. A SIEM will categorize, normalize, tag and link events with paths through the network, vulnerabilities, discovered assets, users and all their properties. It will automatically utilize threat intelligence data as it performs risk and impact scoring. It will provide right-click, auto-complete, drill-down and pivoting that is specifically designed to make detecting and analyzing a security event as quick and easy as possible by removing arduous, repetitive tasks (a good one will, anyway).
- Users will want to drill down into this historical data seamlessly as part of their workflow and, conversely, users looking at historical data may want to pivot back to real-time data, so you will be looking at some level of UI integration and all the headaches that can entail.
- It is yet another security system, adding more administration, integration and maintenance costs. I think most organizations have enough of these and their associated costs, overheads and risks already.
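To make the first point concrete, here is the same style of back-of-the-envelope estimate applied to the network. The rate and event size are the same illustrative assumptions as in the earlier sketch, and compression and batching will reduce the raw figure, but the order of magnitude is the point:

```python
# Rough link-capacity check for forwarding SIEM events to a Hadoop cluster.
# Both figures are illustrative assumptions, not measurements or product specs.
EVENTS_PER_SECOND = 1_000_000
AVG_EVENT_BYTES = 500            # normalized event plus transport overhead

bits_per_second = EVENTS_PER_SECOND * AVG_EVENT_BYTES * 8
print(f"~{bits_per_second / 1e9:.1f} Gbps sustained")   # ~4.0 Gbps
```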
When an organization is large enough to justify all the additional costs associated with running a Hadoop deployment — and has the skills in-house to develop it — it very well might be a viable solution for some specific use cases. In reality, however, what most users need is a security intelligence platform that will continue to provide them the actionable insight they need, even as their data volumes grow.
What about All That Data?
But what about all that data? Don’t organizations need Hadoop to handle TBs and PBs of data? Let’s see.
So you have a good SIEM handling hundreds of thousands or even millions of events per second (potentially distributed over a wide geography); doing all the things that you want it to do in real time; and providing several weeks’ or months’ worth of short-term storage and data access at your fingertips. But say you wanted to keep that data online for a year, or maybe even two. Wouldn’t it be great if all you had to do was add new, very cost-effective “nodes” to your SIEM deployment? You want the nodes to just store data and provide the ability to query it. Wouldn’t it be just fantastic if, when these nodes are added to your deployment, the data is automatically distributed across them? Wouldn’t it be great if you could add these nodes in virtual, software (on commodity hardware) or appliance form whenever and wherever you needed them? And wouldn’t it be ideal if this was all completely transparent to users?
That is exactly what we in the IBM QRadar product team thought, so we put our heads (and keyboards) together and created the QRadar Data Node. Want to keep your data for longer or just add more querying horsepower? Just add QRadar Data Nodes to your deployment! Adding them is simple: it doesn’t require any reconfiguration, your log sources stay the same, your correlation rules don’t change, your reports and integrations are unaltered and the UI doesn’t change. Data is automatically spread across nodes based on intelligent scattering policies. Each Data Node can store and crunch through 100 TB of uncompressed data and up to a whopping 800 TB of compressed data. And the best bit: you can simply keep adding them to your system whenever you need to.
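For readers who like to see the idea in code, here is a toy sketch of the general pattern of spreading events across storage nodes. To be clear, this is not the actual QRadar scattering policy or API, just a simple hash-based illustration of why adding a node transparently adds storage and query capacity:

```python
"""Toy illustration of spreading events across storage nodes. This is NOT
the actual QRadar Data Node scattering policy, just a hash-based sketch of
why adding a node transparently adds query and storage capacity."""
import hashlib

class NodePool:
    def __init__(self, node_names):
        self.nodes = {name: [] for name in node_names}

    def add_node(self, name):
        # Scale out: new capacity comes online and new data starts landing on it.
        self.nodes[name] = []

    def route(self, event_id):
        # Deterministically pick a node for each event ID.
        names = sorted(self.nodes)
        digest = int(hashlib.sha256(event_id.encode()).hexdigest(), 16)
        target = names[digest % len(names)]
        self.nodes[target].append(event_id)
        return target

pool = NodePool(["datanode1", "datanode2"])
for i in range(6):
    pool.route(f"event-{i}")

pool.add_node("datanode3")            # "just add nodes"
for i in range(6, 12):
    pool.route(f"event-{i}")

print({name: len(events) for name, events in pool.nodes.items()})
```

As a rough rule of thumb, using the earlier 500-byte assumption, an ingest rate of 100,000 events per second works out to roughly 4.3 TB a day, so each additional 100 TB node would buy on the order of three more weeks of online, uncompressed retention.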
And what is the impact on your users?
Nothing — apart from the fact that they will now have fast access to more historical data than ever before through the same intuitive UI, with all the same context, linkage and intelligent navigation. You will have saved yourself another security system, kept your costs down, lowered your security risks and made your security more effective. We are all very excited about the latest innovative addition to our QRadar family, and we are sure our users will be, too.
VP, Product Management, IBM Security