I have been fascinated by data analytics for all my professional life, from my early days of using Linux command-line tools like grep, cut, sort and uniq to make sense of log files and reconstruct the chain of events that harmed my web server, to using Excel pivot tables to do much the same with data of all types. Now we have much fancier tools, such as data lakes and data warehouses with powerful query languages, and machine learning and statistical analytics built into program interfaces, but the basic idea remains the same: to draw valuable insights and inform decision-making.

Through talking to both data analysts and security analysts, I came to realize how similar the two jobs are. For the former, the objective can be rather open-ended: identifying anomalies and presenting statistics in a way that helps humans make sense of large quantities of information. For the latter, the scope is simply narrower, with the goal of identifying and predicting threats to security.

Here are some steps that any data analytics initiative, security-focused or otherwise, should progress through in order to create value for the organization.

Identify Target Data

One interesting question that often comes up when discussing security analytics is where security-relevant data actually lives. After many years investigating security events, I am certain that it is the IT operational data — specifically, all the system logs and the indicators of enterprisewide data flows — that is of the greatest concern to security and risk analysts.

To ensure that security is addressing the full scope of the operational reality of the enterprise, any future-ready approach to security analytics must merge, analyze and correlate all relevant, insight-rich data sources.
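As a rough illustration of what correlating data sources can mean in practice, the sketch below pairs each security event with operational log entries from the same host within a short time window. The records, field names (`host`, `time`, `alert`, `message`) and the five-minute window are all assumptions made for this example, not a description of any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; the field names are illustrative assumptions.
security_events = [
    {"host": "web01", "time": datetime(2023, 1, 5, 10, 0, 30), "alert": "failed_login_burst"},
    {"host": "db02", "time": datetime(2023, 1, 5, 11, 15, 0), "alert": "port_scan"},
]
ops_logs = [
    {"host": "web01", "time": datetime(2023, 1, 5, 10, 1, 10), "message": "config changed"},
    {"host": "web01", "time": datetime(2023, 1, 5, 14, 0, 0), "message": "service restarted"},
    {"host": "db02", "time": datetime(2023, 1, 5, 11, 14, 20), "message": "cpu spike"},
]


def correlate(events, logs, window=timedelta(minutes=5)):
    """Pair each security event with operational logs from the same host within the window."""
    # Index operational logs by host so each event only scans its own host's entries.
    by_host = {}
    for log in logs:
        by_host.setdefault(log["host"], []).append(log)

    results = []
    for event in events:
        related = [
            log
            for log in by_host.get(event["host"], [])
            if abs(log["time"] - event["time"]) <= window
        ]
        results.append({"event": event, "related": related})
    return results


correlated = correlate(security_events, ops_logs)
for item in correlated:
    messages = [log["message"] for log in item["related"]]
    print(item["event"]["host"], item["event"]["alert"], messages)
```

A real deployment would run this kind of join inside the data platform's query engine rather than in application code, but the idea is the same: the operational context around an alert is often what makes it interpretable.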

Develop Robust Data Architecture

It is also important to discuss the question of storage. Having spent a lot of time pondering the issue, I am more and more convinced that there is little to no value in having a “pure” security data lake, one where only security events are stored. The logic behind my thinking is that there are hidden indicators in all sorts of data that can be valuable, and that a robust analytics approach must account for as many of them as possible. What this means is that building a data warehouse that contains data from IT operations, security events and business data is the most beneficial way forward for creating value with security-focused data analytics.

However, the prerequisite to value-creation with data analytics is a sound, outcome-driven architecture, the key considerations of which are outlined below:

Data Ingestion

Aggregating data from every part of the business is a key foundational exercise that will pave the way for the success of any data analytics strategy. Today’s enterprise generates terabytes of data every day with hundreds of thousands of events occurring each second. Data volumes of this magnitude demand a very robust strategy for onboarding and normalizing source data to make it as usable as possible.
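To make the idea of normalizing source data more concrete, here is a minimal sketch that maps two differently shaped inputs onto one common schema. The simplified web access log pattern, the JSON firewall record and the target fields (`source`, `timestamp`, `src_ip`, `action`, `outcome`) are assumptions chosen for illustration; real sources will need their own parsers.

```python
import json
import re
from datetime import datetime, timezone

# Simplified pattern for a web access log line (an assumption, not a full parser).
ACCESS_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def normalize_access_line(line):
    """Convert one access log line into the common schema, or None if it does not parse."""
    match = ACCESS_RE.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "source": "web",
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
        "src_ip": match["ip"],
        "action": f'{match["method"]} {match["path"]}',
        "outcome": match["status"],
    }


def normalize_firewall_json(raw):
    """Convert one hypothetical JSON firewall record into the common schema."""
    record = json.loads(raw)
    ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
    return {
        "source": "firewall",
        "timestamp": ts.isoformat(),
        "src_ip": record["src"],
        "action": f'connect {record["dst"]}:{record["port"]}',
        "outcome": record["verdict"],
    }


web_line = '203.0.113.7 - - [05/Jan/2023:10:00:30 +0000] "GET /login HTTP/1.1" 401'
fw_line = '{"epoch": 1672912800, "src": "203.0.113.7", "dst": "10.0.0.5", "port": 22, "verdict": "deny"}'

web_event = normalize_access_line(web_line)
fw_event = normalize_firewall_json(fw_line)
print(web_event)
print(fw_event)
```

Once both records share the same keys and use UTC timestamps, they can be stored, queried and correlated together regardless of where they came from.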

Data Pipelining

With stakeholders spread all across the enterprise, it is important to make sure that data flows to the right platforms and devices and is visible to the right people in a timely manner. Privileged access management (PAM) is another key upfront consideration in the design of a data pipeline.

Location of Analytics

To bring value to the business, any data analytics initiative needs to be well-structured. Be sure to consider the merits of in-stream data analytics, in which data is analyzed where it is created, versus the traditional method, in which data moves to a central analytics platform to be processed. Both have advantages and disadvantages. For example, moving data to a central platform can incur high egress costs.
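The trade-off can be sketched with a toy comparison: summarizing events where they are produced and shipping only the summary, versus shipping every raw event. The event shape and the choice of summary (counts per host and status) are assumptions for illustration only.

```python
import json
from collections import Counter

# Hypothetical raw events produced at an edge location; every tenth request fails.
raw_events = [
    {"host": "web01", "status": 200 if i % 10 else 500, "path": f"/item/{i}"}
    for i in range(1000)
]


def summarize_in_stream(events):
    """Aggregate events at the source into counts per (host, status)."""
    counts = Counter((e["host"], e["status"]) for e in events)
    return [
        {"host": host, "status": status, "count": count}
        for (host, status), count in sorted(counts.items())
    ]


summary = summarize_in_stream(raw_events)

# Compare the serialized size of what would have to leave the source in each case.
raw_bytes = len(json.dumps(raw_events).encode("utf-8"))
summary_bytes = len(json.dumps(summary).encode("utf-8"))
print(summary)
print(f"raw: {raw_bytes} bytes, summary: {summary_bytes} bytes")
```

The summary is far smaller to move, but it discards per-event detail such as the request path, so questions nobody anticipated can no longer be answered centrally. That loss of flexibility is the main cost to weigh against the savings.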

Together with a team of talented individuals, I have been investigating and designing data analytics solutions for quite some time now, and I know from experience that none of the challenges above are insurmountable. It is always possible to design a bespoke architecture that meets enterprise requirements.

With priority placed on understanding business objectives, you can work to align the technicalities of a detailed solution architecture around these three pillars — data ingestion, data pipelining and data analytics — to deliver a solution that is outcome-focused and anchored in value-creation for the enterprise.

Perform Data Analytics

The final and most crucial step on the journey is executing the analytics. For the purpose of simplification, we can split data analytics into two categories:

Data Mining

This is the process of statistical analytics and knowledge discovery by working with the data that is available and making sense of it. Data mining can be used for anomaly detection, as it allows teams to establish a baseline — an understanding of the usual events — so that they can more easily identify outliers in the dataset.
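A minimal sketch of the baseline idea, assuming the data is a series of hourly failed-login counts: the baseline is the mean and standard deviation of a historical period, and any new observation more than three standard deviations from the mean is flagged. The sample numbers and the threshold of three are assumptions; real baselines usually also need to account for seasonality and trend.

```python
from statistics import mean, stdev

# Hypothetical hourly failed-login counts from a period considered normal.
baseline_counts = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# New observations to check against the baseline.
new_counts = {"09:00": 14, "10:00": 17, "11:00": 58}


def build_baseline(values):
    """Return the center and spread of the historical values."""
    return mean(values), stdev(values)


def find_outliers(observations, center, spread, threshold=3.0):
    """Return observations further than threshold * spread from the center."""
    return {
        key: value
        for key, value in observations.items()
        if abs(value - center) > threshold * spread
    }


center, spread = build_baseline(baseline_counts)
outliers = find_outliers(new_counts, center, spread)
print(f"baseline mean={center}, stdev={spread:.2f}")
print("outliers:", outliers)
```

Computing the baseline from a separate historical period, rather than from the data being checked, keeps a large outlier from inflating the spread and hiding itself.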

Machine Learning

Machine learning builds on the statistical models used in data mining and combines them with algorithms that learn from data, so that tasks can be automated without explicit instructions for each case. Enterprises should apply such tools with care, as machine learning has some unique pitfalls and vulnerabilities, although the massive potential outweighs the risks for any organization committed to strong analytics.
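As a toy illustration of learning a decision rule from labeled data instead of writing it by hand, the sketch below implements a nearest-centroid classifier in plain Python. The two features (failed logins per hour, megabytes transferred out), the labels and the training examples are invented for this example; production work would normally use a tested library, scaled features and far more data.

```python
from math import dist

# Hypothetical labeled examples: (failed logins per hour, MB transferred out).
training = [
    ((2, 5), "benign"), ((1, 8), "benign"), ((3, 6), "benign"),
    ((40, 120), "suspicious"), ((55, 90), "suspicious"), ((35, 150), "suspicious"),
]


def fit_centroids(examples):
    """Learn one centroid (per-feature mean) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(value / counts[label] for value in total)
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training)
print(predict(centroids, (4, 7)))
print(predict(centroids, (48, 110)))
```

No threshold was written by hand here; the boundary between the two classes follows from the examples. That is also where the pitfalls come from: mislabeled or unrepresentative training data shifts the centroids and, with them, every later prediction.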

The two categories of data analytics described above still apply to areas of the enterprise other than security — marketing, sales, operations and more can all benefit from their effective application. Similarly, remember that a robust security analytics solution should not examine only security data, but a variety of data types that may contain indicators of threats.

Find a Footing in Strong Architecture

While advanced tools are available and accessible to any department of any organization that wishes to leverage them, the crucial difference between data analytics programs that deliver value and those that don’t is how efficient and business-fit the underlying architecture is. The fundamentals of data ingestion, data storage and data pipelining are the foundations of success in data analytics, whether the goal is to identify security threats, sales leads or operational efficiencies. When we get the basics right, the possibilities for the future are endless.
