In my continuing series of keynote recaps, I’d like to discuss two talks that center on using big data analytics to offset the uncertainty of advanced cyber attacks. The first is the keynote from the fourth annual Hack in the Box (HITB) security conference in Amsterdam by Eddie Schwartz (@eddieschwartz), who also coauthored the related paper Big Data Fuels Intelligence-Driven Security. The second is an RSA 2014 talk, Securing the Big Data Ecosystem by Davi Ottenheimer (@daviottenheimer).
The Case for Big Data
Schwartz begins his keynote by referencing the movie and book Moneyball, using it to illustrate that fighting advanced threats means looking at nontraditional metrics and data sets. He raises an interesting point: traditionally, we have spent a lot of time and money on perimeter defense, but there is a lot of opportunity to collect data and react once the attacker has breached the initial defenses and is busy with lateral movement, escalation or exfiltration. We need to think about what sort of queries we can make to identify these activities.
One of the challenges in identifying threats is that it is not about spotting an obvious-looking problem; advanced threats are hard to identify and blend well into the background. Big data analytics can help us with that. Using big data, we can start to see what is deviating from the norm. These deviations can manifest themselves in a number of ways, including across domains, IP origins, checkout times, website browsing patterns, pipe names (for something at an OS level) and more.
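To make "deviating from the norm" concrete, here is a minimal sketch that is not taken from either talk. It assumes a hypothetical metric (the number of distinct internal hosts each workstation contacts per day), a baseline of past values, and a simple z-score cutoff; the function name, host names and threshold are all illustrative.

```python
from statistics import mean, stdev

def find_outliers(baseline, observations, threshold=3.0):
    """Return the keys in observations whose value deviates from the
    baseline mean by more than `threshold` sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for key, value in observations.items():
        if sigma == 0:
            # A flat baseline: any difference at all is a deviation.
            is_outlier = value != mu
        else:
            is_outlier = abs(value - mu) / sigma > threshold
        if is_outlier:
            flagged.append(key)
    return sorted(flagged)

# Hypothetical data: distinct internal hosts contacted per day.
baseline = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20]
today = {"ws-01": 21, "ws-02": 19, "ws-03": 87}
suspects = find_outliers(baseline, today)
```

Here only `ws-03` ends up in `suspects`, which is the kind of quiet lateral-movement signal a perimeter control would not see. A real deployment would likely need per-asset baselines and a more robust statistic than the mean, since attackers and noisy hosts both skew it.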
As we move into advanced threat analysis using big data, we are not only trying to distinguish good from bad, but also critical from noncritical: what needs to be acted upon now versus what can wait. If we associate a dollar value with risk, what will we lose if we don’t act immediately?
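One way to picture the "act now versus wait" question is to rank alerts by expected loss. The sketch below is an illustration under stated assumptions, not a method from either talk: the `Alert` type, the likelihood and impact estimates, and the dollar threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    likelihood: float   # assumed probability (0 to 1) that the alert is a real incident
    impact_usd: float   # assumed loss in dollars if it is real and not acted upon

    @property
    def expected_loss(self) -> float:
        return self.likelihood * self.impact_usd

def triage(alerts, act_now_threshold_usd):
    """Split alerts into (act_now, can_wait), each ordered by expected loss,
    highest first."""
    ranked = sorted(alerts, key=lambda a: a.expected_loss, reverse=True)
    act_now = [a for a in ranked if a.expected_loss >= act_now_threshold_usd]
    can_wait = [a for a in ranked if a.expected_loss < act_now_threshold_usd]
    return act_now, can_wait

alerts = [
    Alert("odd checkout time", 0.5, 1_000),
    Alert("unusual pipe name on file server", 0.1, 100_000),
    Alert("new browser user agent", 0.9, 100),
]
act_now, can_wait = triage(alerts, act_now_threshold_usd=400)
```

With these made-up numbers, the low-probability but high-impact file server alert is handled first, and the frequent but cheap one waits.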
Challenges in Using Big Data
Both talks mention several challenges with using big data to predict or identify security threats. They include the following:
- Acquisition of skilled staff to deploy, analyze and grow the big data environment, or partnering with those who have it
- Comprehensive data collection and visibility
- Normalization of data coming from different sources, platforms, geographies, etc.
- Premining
- Use of new parsers to analyze the new types of data being collected
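The normalization item in the list above can be sketched as follows. This is a minimal example assuming two hypothetical sources: a firewall log that records epoch seconds and a web proxy log that records local time with a UTC offset. The field names and the target schema are invented for illustration.

```python
from datetime import datetime, timezone

def normalize_firewall(record):
    """Hypothetical firewall record: {'epoch': int seconds, 'src': str}."""
    ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "source_ip": record["src"],
        "origin": "firewall",
    }

def normalize_proxy(record):
    """Hypothetical proxy record: {'time': ISO 8601 with offset, 'client_ip': str}."""
    local = datetime.fromisoformat(record["time"])
    return {
        "timestamp": local.astimezone(timezone.utc).isoformat(),
        "source_ip": record["client_ip"].strip(),
        "origin": "proxy",
    }

# The same moment reported by two systems in two regions and formats.
fw = normalize_firewall({"epoch": 1401364800, "src": "10.0.0.7"})
px = normalize_proxy({"time": "2014-05-29T14:00:00+02:00", "client_ip": " 10.0.0.7 "})
```

After normalization both events carry the same UTC timestamp and the same `source_ip` value, so they can be correlated in a single query regardless of where they were collected.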
Another challenge that Ottenheimer identifies is the use of controls in big data environments and securing big data itself. This needs a whole new approach as we work on continuous collection of massive data sets, sharing them within and outside organizations, and across international boundaries. Not only do we have to care about availability of data, but also about securing the data itself.
What About the Traditional Checklist Model?
Schwartz mentions that although the traditional checklist model (white/black, good/bad) gives us certainty, we can’t ignore the uncertainty. We can use the collection, processing and querying of big data to deal with that uncertainty. At the same time, this does not mean that we go to the other extreme and throw out the traditional model completely. Ottenheimer gives the analogy of fighter pilots using checklists during takeoff. Checklists have a lot of value, but when an emergency arises or the pilots are engaged in a fight, more intelligent methods are needed. Intelligence-driven security is the model that allows you to look at a huge collection of data in innovative and intelligent ways to identify misbehaving assets as soon as possible.
Conclusion
One of Schwartz’s points is that organizations should start moving from big data knowledge to big data deployment. He also talks about outsourcing and partnering for some of these roles, and says that companies should start asking partners and vendors how they are adopting nontraditional analytics.
X-Force Security Researcher, IBM Security