June 3, 2014 By Chris Meenan 4 min read

One of the biggest challenges organizations face today is the need to keep more and more security data online for quick analysis by their SIEM solutions. I’m not talking about an extra week or two here, but months or even years. Why? Well, it has been fairly well publicized that an attacker or insider can lurk on a network for months, evading detection and waiting for the most opportune time to attack.

So what is needed when this breach or potential breach is discovered? Quick access to historical data to find out what, if anything, happened and when. In other words, firms need the ability to look back and find those needles in the haystack — and it can be one huge haystack — that can reveal what happened. Due to the nature of breaches today and the sophisticated methods employed to execute them, understanding what is normal in a network is becoming increasingly important as a tool to detect breaches in the first place; ongoing analysis of historical data has a key role to play in that.

Getting the Most Out of Your SIEM

What does this mean for your security intelligence platform? It means that it must not only collect, normalize, categorize and correlate millions of events per second, but it must also have the ability to keep that data online for fast access, potentially for years; this means hundreds of terabytes (TBs) — if not petabytes (PBs) — of data for a lot of organizations. SIEM solutions traditionally don’t quite do this — or if they can, it is very expensive. Nor will they do it the way you really need them to because they were likely not designed to operate in this way.
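To put that scale claim in perspective, a quick back-of-the-envelope calculation helps. The roughly 500-byte average size of a normalized event below is an illustrative assumption, not a figure from any particular product:

```python
# Rough sanity check on the scale claim. The ~500 bytes per normalized
# event is an illustrative assumption, not a vendor-quoted figure.
TB = 10**12

def daily_terabytes(events_per_sec, avg_event_bytes=500):
    """Uncompressed bytes generated per day, expressed in TB."""
    return events_per_sec * 86_400 * avg_event_bytes / TB

# One million events per second, kept for a year:
one_year_tb = daily_terabytes(1_000_000) * 365  # ~43 TB/day -> ~15.8 PB/year
```

At that assumed event size, a million events per second really does add up to petabytes within a year, which is why long retention strains traditionally designed SIEM storage.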

One option, and one that is quite “in vogue” at the moment, is to utilize a complementary Hadoop-style system, keeping a few weeks’ worth of data in the SIEM and then exporting it into the Hadoop cluster for longer-term storage, analysis and reporting. On the face of it, Hadoop, especially open-source options, can appear pretty competitive from an overall cost perspective, and you can surely deploy Hadoop nodes on commodity hardware, so it’ll be cheap! Furthermore, any half-decent SIEM will have a nice Hadoop-friendly export mechanism, making this integration simple.
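As an illustration of what that export side might involve, here is a minimal, hypothetical sketch that batches normalized events into newline-delimited JSON sized for Hadoop ingestion (HDFS favors a small number of large files over many small ones). The event fields and the 128 MB target are assumptions, not any vendor's actual export format:

```python
import json

# Hypothetical sketch: batch normalized SIEM events into newline-delimited
# JSON (NDJSON) strings sized for efficient HDFS ingestion.
BATCH_BYTES = 128 * 1024 * 1024  # target roughly one HDFS block per file

def batch_events(events, batch_bytes=BATCH_BYTES):
    """Group event dicts into batches whose serialized size approaches
    batch_bytes, yielding each batch as a single NDJSON string."""
    lines, size = [], 0
    for event in events:
        line = json.dumps(event, sort_keys=True)
        lines.append(line)
        size += len(line) + 1  # +1 for the trailing newline
        if size >= batch_bytes:
            yield "\n".join(lines)
            lines, size = [], 0
    if lines:
        yield "\n".join(lines)

# Example with two tiny (made-up) events; both fit in one batch.
sample = [{"src": "10.0.0.1", "category": "auth-failure"},
          {"src": "10.0.0.2", "category": "auth-success"}]
batches = list(batch_events(sample))
```

Even this toy version hints at the operational cost: sustaining millions of events per second through such a pipeline, across WAN links, is exactly where the gotchas below start to bite.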

Sounds like a great solution, doesn’t it? Well, there are a few gotchas.

  • The Hadoop deployment, network infrastructure and the SIEM may all need to transfer data at millions of events per second without impacting normal operations. This can be a real challenge, especially if your network is spread across several countries or even continents, connected by easily flooded WAN links — security data really is big.
  • A lot of the context from the SIEM can be lost. A SIEM will categorize, normalize, tag and link events with paths through the network, vulnerabilities, discovered assets, users and all their properties. It will automatically utilize threat intelligence data as it performs risk and impact scoring. It will provide right-click, auto-complete, drill-down and pivoting features specifically designed to make detecting and analyzing a security event as quick and easy as possible by removing arduous, repetitive tasks — a good one will, anyway.
  • Users will want to drill down into this historic data seamlessly as part of their workflow and, conversely, users looking at historic data may also want to look at real-time data, so you will be looking at some level of UI integration and all the headaches that can entail.
  • It is yet another security system, adding more administration, integration and maintenance costs. I think most organizations have enough of these and their associated costs, overheads and risks already.

When an organization is large enough to justify all the additional costs associated with running a Hadoop deployment — and has the skills in-house to develop it — it very well might be a viable solution for some specific use cases. In reality, however, what most users need is a security intelligence platform that will continue to provide them the actionable insight they need, even as their data volumes grow.

What about All That Data?

But what about all that data? Don’t organizations need Hadoop to handle TBs and PBs of data? Let’s see.

So you have a good SIEM handling hundreds of thousands or even millions of events per second (potentially distributed over a wide geography); doing all the things that you want it to do in real time; and providing several weeks’ or months’ worth of short-term storage and data access at your fingertips. But say you wanted to keep that data online for a year, or maybe even two. Wouldn’t it be great if all you had to do was add new, very cost-effective “nodes” to your SIEM deployment? You want the nodes to just store data and provide the ability to query it. Wouldn’t it be just fantastic if, when these nodes are added to your deployment, that data is automatically distributed across them? Wouldn’t it be great if you could add these nodes in virtual, software (on commodity hardware) or appliance form whenever and wherever you needed them? And wouldn’t it be ideal if this was all completely transparent to users?
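For illustration, one well-known way distributed stores spread data automatically across a growing set of nodes is consistent hashing. This sketch is a generic example of the idea — not a description of any specific product's placement policy — and shows that adding a node moves only a fraction of existing data:

```python
import bisect
import hashlib

# Generic consistent-hash ring (illustrative only): each node owns many
# virtual points on a ring, so adding a node takes an even share of data
# while leaving most existing placements untouched.
class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["node-1", "node-2"])
keys = [f"event-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-3")
moved = sum(before[k] != ring.node_for(k) for k in keys)  # roughly a third
```

The design point is that growth is incremental: the new node drains its share from the others rather than forcing a wholesale reshuffle, which is what makes "just add a node" transparent to users.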

That is exactly what we in the IBM QRadar product team thought, so we put our heads — and keyboards — together and created the QRadar Data Node. Want to keep your data for longer or just add more querying horsepower? Just add QRadar Data Nodes to your deployment! Addition is simple: it doesn’t require any reconfiguration, your log sources stay the same, your correlation rules don’t change, your reports and integrations are unaltered and the UI doesn’t change. Data is automatically spread across nodes based on intelligent scattering policies. Each Data Node can store and crunch through 100 TB of uncompressed data and up to a whopping 800 TB of compressed data. And the best bit: You can simply keep adding them to your system whenever you need to.
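As a rough capacity calculation: the 800 TB effective (post-compression) per-node figure comes from the paragraph above, while the event rate and the 500-byte average event size are illustrative assumptions:

```python
# Back-of-the-envelope sketch: how many Data Nodes a retention goal implies.
# The 800 TB/node effective capacity is from the article; the event rate and
# ~500-byte average event size are illustrative assumptions.
TB = 10**12

def nodes_needed(events_per_sec, retention_days,
                 avg_event_bytes=500, node_capacity_tb=800):
    """Nodes required to hold retention_days of events, treating
    node_capacity_tb as effective (post-compression) capacity."""
    total_bytes = events_per_sec * 86_400 * retention_days * avg_event_bytes
    capacity = node_capacity_tb * TB
    return -(-total_bytes // capacity)  # ceiling division

# e.g. 200,000 events/sec kept online for two years -> 8 nodes
n = nodes_needed(200_000, 730)
```

Under these assumptions, even a two-year retention target at a substantial event rate comes down to a single-digit number of nodes, which is the kind of incremental scaling the paragraph above describes.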

And what is the impact on your users?

Nothing — apart from the fact that they will now have fast access to more historical data than ever before through the same intuitive UI, with all the same context, linkage and intelligent navigation. You will have saved yourself another security system, kept your costs down, lowered your security risks and made your security more effective. We are all very excited about the latest innovative addition to our QRadar family, and we are sure our users will be, too.
