I have been fascinated by data analytics for my entire professional life, from my early days of using Linux command-line tools like grep, cut, sort and uniq to make sense of log files and identify the chain of events that harmed my web server, to using simple Excel pivot tables to do much the same with data of all types. Today we have much fancier tools, such as data lakes and data warehouses with powerful query languages, and machine learning and statistical analytics built into program interfaces, but the basic idea remains the same: to draw valuable insights and inform decision-making.

Through talking to both data analysts and security analysts specifically, I came to realize how similar the two jobs are. For the former, the objective can be rather open-ended — identifying anomalies and presenting statistics in a way that helps humans make sense of large quantities of information. For the latter, the scope is simply narrower with the goal of identifying and predicting threats to security.

Here are some steps that any data analytics initiative, security-focused or otherwise, should progress through in order to create value for the organization.

Identify Target Data

One interesting question that often comes up when discussing security analytics is where security-relevant data actually lives. After many years investigating security events, I am certain that it is the IT operational data — specifically, all the system logs and the indicators of enterprisewide data flows — that is of the greatest concern to security and risk analysts.

To ensure that security is addressing the full scope of the operational reality of the enterprise, any future-ready approach to security analytics must merge, analyze and correlate all relevant, insight-rich data sources.
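As a rough illustration of what correlating data sources can mean in practice, the sketch below pairs records from two hypothetical feeds, an authentication log and a network flow log, when they share a host and fall within a short time window. The field names, the sample records and the 60-second window are all assumptions made for this example, not a real schema.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; field names are assumptions, not a real schema.
auth_events = [
    {"ts": datetime(2023, 1, 5, 10, 0, 0), "host": "web01", "event": "failed_login"},
    {"ts": datetime(2023, 1, 5, 11, 30, 0), "host": "db02", "event": "failed_login"},
]
flow_events = [
    {"ts": datetime(2023, 1, 5, 10, 0, 40), "host": "web01", "bytes_out": 9_500_000},
    {"ts": datetime(2023, 1, 5, 14, 0, 0), "host": "db02", "bytes_out": 1_200},
]


def correlate(auth, flows, window=timedelta(seconds=60)):
    """Pair each auth event with flow records from the same host within `window`."""
    pairs = []
    for a in auth:
        for f in flows:
            if a["host"] == f["host"] and abs(f["ts"] - a["ts"]) <= window:
                pairs.append((a, f))
    return pairs


matches = correlate(auth_events, flow_events)
for a, f in matches:
    print(a["host"], a["event"], "followed by", f["bytes_out"], "bytes outbound")
```

Neither record is alarming alone; the pairing is what makes the failed login followed by a large outbound transfer worth a closer look. A nested loop is fine for a sketch, but at enterprise volumes this kind of join would normally run inside the data platform itself.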

Develop Robust Data Architecture

It is also important to discuss the question of storage. Having spent a lot of time pondering the issue, I am more and more convinced that there is little to no value in having a “pure” security data lake, one where only security events are stored. The logic behind my thinking is that there are hidden indicators in all sorts of data that can be valuable, and that a robust analytics approach must account for as many of them as possible. What this means is that building a data warehouse that contains data from IT operations, security events and business data is the most beneficial way forward for creating value with security-focused data analytics.

However, the prerequisite to value-creation with data analytics is a sound, outcome-driven architecture, the key considerations of which are outlined below:

Data Ingestion

Aggregating data from every part of the business is a key foundational exercise that will pave the way for the success of any data analytics strategy. Today’s enterprise generates terabytes of data every day with hundreds of thousands of events occurring each second. Data volumes of this magnitude demand a very robust strategy for onboarding and normalizing source data to make it as usable as possible.
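To make the idea of normalizing source data more concrete, here is a minimal sketch that maps two differently shaped inputs onto one common record layout. Both input formats (a simplified "timestamp host message" log line and a dictionary with epoch seconds) and every field name are hypothetical, chosen only to show the pattern.

```python
from datetime import datetime, timezone


def normalize_syslog(line):
    """Parse a simplified 'ISO-timestamp host message' line (assumed format)."""
    ts, host, message = line.split(" ", 2)
    return {
        "ts": datetime.fromisoformat(ts),
        "host": host,
        "message": message,
        "source": "syslog",
    }


def normalize_app_event(event):
    """Map a hypothetical application event with epoch seconds to the same layout."""
    return {
        "ts": datetime.fromtimestamp(event["epoch"], tz=timezone.utc),
        "host": event["server"],
        "message": event["msg"],
        "source": "app",
    }


records = [
    normalize_syslog("2023-01-05T10:00:00+00:00 web01 sshd: failed password for root"),
    normalize_app_event({"epoch": 1672912800, "server": "web01", "msg": "checkout error"}),
]
```

Once every source lands in the same shape, with timestamps in a single timezone-aware representation, downstream queries and correlations no longer need to know where a record came from.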

Data Pipelining

With stakeholders spread all across the enterprise, it is important to make sure that data flows to the right platforms and devices and is visible to the right people in a timely manner. Privileged access management (PAM) is another key upfront consideration in the design of a data pipeline.

Location of Analytics

To bring value to the business, any data analytics initiative needs to be well-structured. Be sure to consider the merits of in-stream data analytics, analyzing a data package where it is created, versus the traditional method in which data moves to a central analytics platform to be processed. Both have advantages and disadvantages — for example, the high egress cost of moving data to a central platform.
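The trade-off between the two approaches can be sketched with a toy example. Here, raw events are summarized where they are produced, so that only the aggregate rows would need to travel to a central platform. The event fields and the choice of per-host status counts are assumptions for illustration only.

```python
from collections import Counter


def summarize_at_source(events):
    """Reduce raw events to per-(host, status) counts before shipping them."""
    return Counter((e["host"], e["status"]) for e in events)


# Hypothetical raw events generated at the edge.
raw_events = [
    {"host": "web01", "status": 200},
    {"host": "web01", "status": 200},
    {"host": "web01", "status": 500},
    {"host": "web02", "status": 200},
]

summary = summarize_at_source(raw_events)
print(f"raw events: {len(raw_events)}, aggregate rows to ship: {len(summary)}")
```

The saving grows with volume, but so does the cost on the other side of the ledger: once events are reduced to counts, the central platform can no longer answer questions that need the individual records.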

Accompanied by a team of talented individuals, I have been investigating and designing solutions around data analytics for quite some time now, and I know from experience that none of the challenges above are insurmountable. It is always possible to design a bespoke architecture that meets enterprise requirements.

With priority placed on understanding business objectives, you can work to align the technicalities of a detailed solution architecture around these three pillars (data ingestion, data pipelining and the location of analytics) to deliver a solution that is outcome-focused and anchored in value-creation for the enterprise.

Perform Data Analytics

The final and most crucial step on the journey is executing the analytics. For the purpose of simplification, we can split data analytics into two categories:

Data Mining

This is the process of statistical analytics and knowledge discovery by working with the data that is available and making sense of it. Data mining can be used for anomaly detection, as it allows teams to establish a baseline — an understanding of the usual events — so that they can more easily identify outliers in the dataset.
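A simple way to picture baseline-driven anomaly detection is a z-score check: learn the mean and standard deviation from a period of normal activity, then flag new observations that sit far from that mean. The sketch below assumes hourly login counts as the metric and a threshold of three standard deviations; both are illustrative choices rather than recommendations.

```python
from statistics import mean, stdev


def find_outliers(baseline, observed, threshold=3.0):
    """Return observed values more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []  # a flat baseline gives no scale to measure deviation against
    return [x for x in observed if abs(x - mu) / sigma > threshold]


# Hypothetical logins per hour during a normal period, then today's readings.
baseline_logins_per_hour = [20, 22, 19, 21, 23, 20, 18, 22]
today = [21, 19, 95, 22]

outliers = find_outliers(baseline_logins_per_hour, today)
print(outliers)
```

Keeping the baseline separate from the data being checked matters: if the spike were included when computing the mean and standard deviation, it would inflate both and make itself harder to detect.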

Machine Learning

Machine learning builds on the statistical models from data mining, using algorithms that learn patterns from the data so that tasks can be automated without explicit, hand-written instructions for every case. Enterprises should apply such tools with care, as machine learning has some unique pitfalls and vulnerabilities, although for any organization committed to strong analytics the massive potential outweighs the risks.

The two categories of data analytics described above still apply to areas of the enterprise other than security — marketing, sales, operations and more can all benefit from their effective application. Similarly, remember that a robust security analytics solution should not examine only security data, but a variety of data types that may contain indicators of threats.

Find a Footing in Strong Architecture

While advanced tools are available and accessible to any department of any organization that wishes to leverage them, the crucial difference between data analytics programs that deliver value and those that don’t is how efficient and business-fit the underlying architecture is. The fundamentals of data ingestion, data storage and data pipelining are the foundations of success in data analytics, whether the goal is to identify security threats, sales leads or operational efficiencies. When we get the basics right, the possibilities for the future are endless.
