Why Turning Data Into Security Intelligence Is So Hard
I was hanging out in a local graveyard a few years ago, doing math on the ages of the people buried there, when it suddenly occurred to me why turning massive volumes of data into security intelligence is so hard. As a longtime, early supporter of security information management, I was frustrated to see the initial promise of advanced security analytics and intelligence become, in the mid-aughts, log aggregation with a bit of correlation sprinkled on top. This is not to say that aggregating those logs and correlating some events wasn’t a huge improvement over what we were doing before, just that it wasn’t anywhere near the vision of what true security intelligence could do for us.
The dates on the gravestones gave me a key insight into why. I live in a small town in New England, and the graveyard has people in it dating back almost to the town’s incorporation in 1760. The data that gave me the “aha” moment was that a lot of people in that graveyard had lived well into their 80s. Given all the media headlines about how our life expectancies are increasing so quickly that we all need to plan for retirement into our 90s or 100s, I was a little surprised by how long the past residents of Amherst, New Hampshire, lived.
Another key data point was the number of people who, sadly, died very young as children or young adults. What I didn’t see was a lot of middle-aged people. So, the “aha” was, “Maybe we’re not living that much longer. Maybe we’re just a lot better at not dying younger.”
Armed with that thought, I went online and did some digging. I found research from the Social Security Administration, an organization with a critical business need to understand life expectancy, that backed up my casual observations. While it’s true that advances in health care, nutrition and automobile safety have improved the average person’s chances of living to 90, there is still a “use by” date on the human body. The longevity gains are smaller than some headlines may lead us to believe because those headlines are based on oversimplified math and projection assumptions that don’t tell the whole story.
In very simple terms, if you have 10 people and three of them die before the age of five, two die before 30, one dies in middle age and the other four live into their 80s, you get a pretty low average life expectancy. Here’s an example:
1 + 3 + 4 + 24 + 27 + 54 + 84 + 85 + 87 + 88 = 457
457 ÷ 10 = 45.7 Years Average Life Expectancy
Factor in all the modern advances that help us live past childhood diseases and midlife heart attacks but keep the upper end of the long-lifers the same and you come up with a higher average, like this:
4 + 27 + 79 + 80 + 80 + 81 + 84 + 85 + 87 + 88 = 695
695 ÷ 10 = 69.5 Years Average Life Expectancy
Holy smokes! Imagine the headline: “People Expected to Live 24 Years Longer!” However, the oldest person in the second example still died at 88, the same as in the first. The upper end of the human lifespan didn’t move at all; fewer people simply died young.
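The arithmetic above is easy to check with a short Python sketch. The ages are copied from the two examples; the `summarize` helper and the extra statistics it reports (median, oldest age, number of people reaching 80) are illustrative additions, not figures from the Social Security Administration research.

```python
from statistics import mean, median

# Ages at death, copied from the two worked examples in the text.
before = [1, 3, 4, 24, 27, 54, 84, 85, 87, 88]
after = [4, 27, 79, 80, 80, 81, 84, 85, 87, 88]


def summarize(ages):
    """Return a few summary statistics for a list of ages at death."""
    return {
        "mean": round(mean(ages), 1),      # the "headline" number
        "median": median(ages),            # less sensitive to early deaths
        "oldest": max(ages),               # upper end of the lifespan
        "reached_80": sum(1 for a in ages if a >= 80),
    }


for label, ages in (("before", before), ("after", after)):
    print(label, summarize(ages))
```

The mean rises from 45.7 to 69.5, yet the oldest age at death is 88 in both lists. A single average hides that distinction, which is exactly what the headline misses.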
What Does This Have to Do With Data Security Intelligence?
Quite a bit, actually, because it highlights how easy it is to misinterpret data or draw inaccurate conclusions from it. Heading out of the graveyard and back to information technology (IT) security and analytics, it’s not hard to find analogies. Take a company with a Security Information and Event Management (SIEM) system or vulnerability management (VM) scanner that alerts on a system with port 445 open for Server Message Block (SMB). SMB is used for sharing resources such as files and printers, and it is ripe for exploitation. Does this mean the company should shut the port down?
To answer that question, the company needs to put that information in an enriched intelligence context. That’s the hard part: bringing the right data together to be able to make informed risk decisions. Is the system in question in a protected zone or enclave? Is access limited to only a few approved services, or do users have a legitimate business need to share resources on that system? Does the system have a server-based intrusion prevention agent on it that blocks execution of malicious code?
Answers about these additional controls will help determine the actual likelihood of an exploit and its impact on the business. However, getting those answers requires more than the single data point of the open port, along with the ability to put all the data in a business context. If all the controls listed above are in place and the business really needs to share resources on that system, the port shouldn’t be shut down.
If the system were an Internet-facing Web application with no need to have SMB open, the port should be shut down. Either way, the business and IT departments need to know the deeper details to make the right choice.
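The reasoning above can be sketched as a small Python function that enriches a bare “port 445 is open” alert with business and controls context. This is only a toy illustration under stated assumptions: `HostContext`, its field names and `recommend_smb_action` are hypothetical and do not correspond to any real SIEM or scanner API, and a real decision would weigh far more inputs.

```python
from dataclasses import dataclass


@dataclass
class HostContext:
    """Hypothetical context a team might attach to an open-port alert."""
    internet_facing: bool       # e.g. a public Web application
    in_protected_zone: bool     # system sits in a protected zone or enclave
    access_restricted: bool     # access limited to approved services
    has_intrusion_agent: bool   # agent that blocks malicious code execution
    business_needs_smb: bool    # legitimate need to share resources


def recommend_smb_action(ctx: HostContext) -> str:
    """Map the two cases described in the text to a recommendation.

    Anything the text does not cover is sent to a human analyst.
    """
    all_controls = (
        ctx.in_protected_zone
        and ctx.access_restricted
        and ctx.has_intrusion_agent
    )
    if ctx.internet_facing and not ctx.business_needs_smb:
        return "close port"
    if all_controls and ctx.business_needs_smb:
        return "keep open"
    return "needs analyst review"


file_server = HostContext(False, True, True, True, True)
web_app = HostContext(True, False, False, False, False)
print(recommend_smb_action(file_server))
print(recommend_smb_action(web_app))
```

The same alert produces opposite recommendations for the two hosts, and every combination the article doesn’t address falls through to an analyst rather than to an automatic answer.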
The example above is very simple, and the decisions and data sets we’re dealing with in large organizations are much more complex, but the concept is the same. The more context and information one can enrich the data set with, the better the intelligence will be. Right now, many companies are still in the “headline” phase of using data, but as we get better at collecting and parsing it, I think we can get to a much more intelligent security analytics model.
Though living to 100 may never be a reality for most of us, there’s no denying that people in the U.S. today have a much greater chance of reaching 80 than they did in 1760. Security intelligence can help our networks and businesses in the same way: by identifying the critical paths to exploit so we can block them, and by enriching alert and reporting data with business and controls context. The key will be having business-aware analysts who can determine what to look for, which data to parse and how to use tools to sift through and analyze that data. We need to start looking past the security and SIEM “headlines” and dive deeper into the root causes and dependencies.