I have been fascinated by data analytics for all my professional life, from my early days of using Linux command-line tools like grep, cut, sort and uniq to make sense of log files and identify the chain of events that harmed my web server, to using simple Excel and pivot tables to do pretty much the same with data of all types. Now, we have much fancier tools like data lakes and data warehouses with powerful query languages, machine learning and statistical analytics tools built into program interfaces, but the basic idea remains the same: to draw valuable insights and inform decision-making.
Through talking to both data analysts and security analysts specifically, I came to realize how similar the two jobs are. For the former, the objective can be rather open-ended — identifying anomalies and presenting statistics in a way that helps humans make sense of large quantities of information. For the latter, the scope is simply narrower with the goal of identifying and predicting threats to security.
Here are some steps that any data analytics initiative, security-focused or otherwise, should progress through in order to create value for the organization.
Identify Target Data
One interesting question that often comes up when discussing security analytics is where security-relevant data actually lives. After many years investigating security events, I am certain that it is the IT operational data — specifically, all the system logs and the indicators of enterprisewide data flows — that is of the greatest concern to security and risk analysts.
To ensure that security is addressing the full scope of the operational reality of the enterprise, any future-ready approach to security analytics must merge, analyze and correlate all relevant, insight-rich data sources.
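As a rough illustration of what correlating data sources can mean in practice, the sketch below pairs each security event with IT operational events seen on the same host within a short time window. The field names (host, time, event), the five-minute window and the sample records are all assumptions made for this example, not a prescribed schema.

```python
from datetime import datetime, timedelta


def correlate(security_events, ops_events, window=timedelta(minutes=5)):
    """Pair each security event with ops events on the same host within the window."""
    matches = []
    for sec in security_events:
        related = [
            ops
            for ops in ops_events
            if ops["host"] == sec["host"]
            and abs(ops["time"] - sec["time"]) <= window
        ]
        matches.append({"security": sec, "related_ops": related})
    return matches


# Hypothetical sample data: one alert and three operational log entries.
security_events = [
    {"host": "web01", "time": datetime(2024, 1, 1, 10, 2), "event": "privilege escalation alert"},
]
ops_events = [
    {"host": "web01", "time": datetime(2024, 1, 1, 10, 0), "event": "config change deployed"},
    {"host": "db01", "time": datetime(2024, 1, 1, 10, 1), "event": "backup started"},
    {"host": "web01", "time": datetime(2024, 1, 1, 12, 0), "event": "service restart"},
]

results = correlate(security_events, ops_events)
for match in results:
    print(match["security"]["event"], "->", [ops["event"] for ops in match["related_ops"]])
```

Only the configuration change on the same host, two minutes before the alert, is attached to it. At enterprise scale a join like this would more likely run as a query in the data platform than as a nested loop.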
Develop Robust Data Architecture
It is also important to discuss the question of storage. Having spent a lot of time pondering the issue, I am more and more convinced that there is little to no value in having a “pure” security data lake, one where only security events are stored. The logic behind my thinking is that there are hidden indicators in all sorts of data that can be valuable, and that a robust analytics approach must account for as many of them as possible. What this means is that building a data warehouse that contains data from IT operations, security events and business data is the most beneficial way forward for creating value with security-focused data analytics.
However, the prerequisite to value-creation with data analytics is a sound, outcome-driven architecture, the key considerations of which are outlined below:
Data Ingestion
Aggregating data from every part of the business is a key foundational exercise that will pave the way for the success of any data analytics strategy. Today’s enterprise generates terabytes of data every day with hundreds of thousands of events occurring each second. Data volumes of this magnitude demand a very robust strategy for onboarding and normalizing source data to make it as usable as possible.
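To make normalization a little more concrete, here is a minimal sketch that maps two differently shaped sources onto one common schema. Both input formats (a space-separated log line with an ISO timestamp, and a firewall record with epoch seconds) and all field names are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def normalize_syslog(line: str) -> dict:
    """Parse a hypothetical 'ISO-timestamp host message' line into the common schema."""
    timestamp, host, message = line.split(" ", 2)
    return {
        "time": datetime.fromisoformat(timestamp).astimezone(timezone.utc),
        "host": host.lower(),
        "source": "syslog",
        "message": message,
    }


def normalize_firewall(record: dict) -> dict:
    """Map a hypothetical firewall record (epoch seconds) into the common schema."""
    return {
        "time": datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        "host": record["device"].lower(),
        "source": "firewall",
        "message": f'{record["action"]} {record["src"]} -> {record["dst"]}',
    }


events = [
    normalize_syslog("2024-01-01T10:00:05+00:00 WEB01 sshd: failed login for root"),
    normalize_firewall(
        {"ts": 1704103200, "device": "FW-EDGE", "action": "deny",
         "src": "203.0.113.7", "dst": "10.0.0.5"}
    ),
]

# Once every event shares one schema, sources can be sorted and queried together.
events.sort(key=lambda e: e["time"])
for e in events:
    print(e["time"].isoformat(), e["host"], e["source"], e["message"])
```

With timestamps converted to UTC and host names lowercased, events from different systems can be ordered on a single timeline, which is the precondition for the later analytics steps.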
Data Pipelining
With stakeholders spread all across the enterprise, it is important to make sure that data flows to the right platforms and devices and is visible to the right people in a timely manner. Privileged access management (PAM) is another key upfront consideration in the design of a data pipeline.
Location of Analytics
To bring value to the business, any data analytics initiative needs to be well-structured. Be sure to consider the merits of in-stream data analytics, in which a data package is analyzed where it is created, versus the traditional method in which data moves to a central analytics platform to be processed. Both have advantages and disadvantages: for example, the traditional method can incur high egress costs when moving data to a central platform.
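One way to picture the in-stream option is a small aggregation step that runs where the events are produced and forwards only summaries. The sketch below is an assumption-laden illustration: the event shape (ts in seconds, type) and the 60-second window are invented for the example.

```python
from collections import Counter


def summarize_in_stream(events, window_seconds=60):
    """Aggregate raw events at the source into per-window counts per event type."""
    summary = Counter()
    for event in events:
        # Align each timestamp to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        summary[(window_start, event["type"])] += 1
    return [
        {"window_start": start, "type": event_type, "count": count}
        for (start, event_type), count in sorted(summary.items())
    ]


# Hypothetical raw events produced at the edge.
raw_events = [
    {"ts": 0, "type": "login_failed"},
    {"ts": 12, "type": "login_failed"},
    {"ts": 45, "type": "login_ok"},
    {"ts": 61, "type": "login_failed"},
]

summaries = summarize_in_stream(raw_events)
print(len(raw_events), "raw events reduced to", len(summaries), "summary records")
```

The trade-off is visible even at this scale: fewer records leave the source, but the central platform no longer sees individual events, so any analysis that needs raw detail has to happen before aggregation.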
Together with a team of talented individuals, I have been investigating and designing solutions around data analytics for quite some time now, and I know from experience that none of the challenges above are insurmountable. It is always possible to design a bespoke architecture that meets enterprise requirements.
With priority placed on understanding business objectives, you can work to align the technicalities of a detailed solution architecture around these three pillars — data ingestion, data pipelining and data analytics — to deliver a solution that is outcome-focused and anchored in value-creation for the enterprise.
Perform Data Analytics
The final and most crucial step on the journey is executing the analytics. For the purpose of simplification, we can split data analytics into two categories:
Data mining is the process of statistical analytics and knowledge discovery: working with the data that is available and making sense of it. Data mining can be used for anomaly detection, as it allows teams to establish a baseline, meaning an understanding of the usual events, so that they can more easily identify outliers in the dataset.
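The baseline idea can be sketched with a simple z-score check: compute the mean and standard deviation of a period considered normal, then flag new observations that fall far outside it. This is only one of many possible techniques; the threshold of three standard deviations and the hourly login counts are assumptions for the example.

```python
from statistics import mean, stdev


def find_outliers(baseline, observed, threshold=3.0):
    """Return (index, value) pairs in `observed` that deviate strongly from the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the "usual" period
    if sigma == 0:
        return []  # a flat baseline gives no scale to measure deviation against
    return [
        (index, value)
        for index, value in enumerate(observed)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical hourly login counts: a quiet baseline week, then new observations.
baseline_logins_per_hour = [40, 42, 38, 41, 39, 43, 40, 37]
observed_logins_per_hour = [41, 39, 120, 44]

outliers = find_outliers(baseline_logins_per_hour, observed_logins_per_hour)
print(outliers)
```

Here the baseline has a mean of 40 and a sample standard deviation of 2, so the value 120 is flagged while 44 (two standard deviations away) is not. Real event data is rarely this well behaved, and seasonality or skew would call for a more careful baseline.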
Machine learning builds on the statistical models from data mining and combines them with algorithms that automate tasks without requiring explicit instructions. Enterprises should apply such tools with care, as machine learning has some unique pitfalls and vulnerabilities, although the massive potential outweighs the risks for any organization committed to strong analytics.
The two categories of data analytics described above also apply to areas of the enterprise other than security: marketing, sales, operations and more can all benefit from their effective application. Similarly, remember that a robust security analytics solution should not examine only security data, but a variety of data types that may contain indicators of threats.
Find a Footing in Strong Architecture
While advanced tools are available and accessible to any department of any organization that wishes to leverage them, the crucial difference between data analytics programs that deliver value and those that don’t is how efficient and business-fit the underlying architecture is. The fundamentals of data ingestion, data storage and data pipelining are the foundations of success in data analytics, whether the goal is to identify security threats, sales leads or operational efficiencies. When we get the basics right, the possibilities for the future are endless.