I have been fascinated by data analytics for all my professional life, from my early days of using Linux command-line tools like grep, cut, sort and uniq to make sense of log files and identify the chain of events that harmed my web server, to using simple Excel pivot tables to do pretty much the same with data of all types. Now, we have much fancier tools like data lakes and data warehouses with powerful query languages, and machine learning and statistical analytics tools built into program interfaces, but the basic idea remains the same: to draw valuable insights and inform decision-making.

Through talking to both data analysts and security analysts specifically, I came to realize how similar the two jobs are. For the former, the objective can be rather open-ended — identifying anomalies and presenting statistics in a way that helps humans make sense of large quantities of information. For the latter, the scope is simply narrower with the goal of identifying and predicting threats to security.

Here are some steps that any data analytics initiative, security-focused or otherwise, should progress through in order to create value for the organization.

Identify Target Data

One interesting question that often comes up when discussing security analytics is where security-relevant data actually lives. After many years investigating security events, I am certain that it is the IT operational data — specifically, all the system logs and the indicators of enterprisewide data flows — that is of the greatest concern to security and risk analysts.

To ensure that security is addressing the full scope of the operational reality of the enterprise, any future-ready approach to security analytics must merge, analyze and correlate all relevant, insight-rich data sources.
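As a rough illustration of what correlating sources can mean in practice, the sketch below pairs authentication events with network flow records from the same host that occur within a few minutes of each other. The record layouts, field names and the five-minute window are all assumptions made for the example, not a prescribed schema.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; field names are assumptions for illustration.
auth_events = [
    {"ts": datetime(2023, 1, 1, 10, 0, 5), "host": "web01", "event": "failed_login"},
    {"ts": datetime(2023, 1, 1, 11, 30, 0), "host": "db01", "event": "failed_login"},
]
flow_events = [
    {"ts": datetime(2023, 1, 1, 10, 0, 40), "host": "web01", "bytes_out": 5_000_000},
    {"ts": datetime(2023, 1, 1, 14, 0, 0), "host": "db01", "bytes_out": 1_200},
]


def correlate(auth, flows, window=timedelta(minutes=5)):
    """Pair each auth event with flow records from the same host within `window`."""
    # Index flows by host so each auth event only scans relevant records.
    flows_by_host = {}
    for flow in flows:
        flows_by_host.setdefault(flow["host"], []).append(flow)

    pairs = []
    for a in auth:
        for f in flows_by_host.get(a["host"], []):
            if abs(f["ts"] - a["ts"]) <= window:
                pairs.append((a, f))
    return pairs


correlated = correlate(auth_events, flow_events)
for a, f in correlated:
    print(a["host"], a["event"], "near flow of", f["bytes_out"], "bytes")
```

Neither record is especially telling on its own; it is the combination of a failed login and a large outbound transfer on the same host that may merit a closer look.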

Develop Robust Data Architecture

It is also important to discuss the question of storage. Having spent a lot of time pondering the issue, I am more and more convinced that there is little to no value in having a “pure” security data lake, one where only security events are stored. The logic behind my thinking is that there are hidden indicators in all sorts of data that can be valuable, and that a robust analytics approach must account for as many of them as possible. What this means is that building a data warehouse that contains data from IT operations, security events and business data is the most beneficial way forward for creating value with security-focused data analytics.

However, the prerequisite to value-creation with data analytics is a sound, outcome-driven architecture, the key considerations of which are outlined below:

Data Ingestion

Aggregating data from every part of the business is a key foundational exercise that will pave the way for the success of any data analytics strategy. Today’s enterprise generates terabytes of data every day with hundreds of thousands of events occurring each second. Data volumes of this magnitude demand a very robust strategy for onboarding and normalizing source data to make it as usable as possible.
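To make the idea of normalization concrete, here is a minimal sketch that maps two differently shaped inputs onto one common record layout. Both input formats (a simplified "timestamp host message" log line and a JSON application event) and the target field names are assumptions chosen for the example; real sources will need their own parsers.

```python
import json
from datetime import datetime, timezone


def normalize_syslog(line):
    """Parse a simplified 'ISO-timestamp host message' line (assumed format)."""
    ts, host, message = line.split(" ", 2)
    return {
        "ts": datetime.fromisoformat(ts).astimezone(timezone.utc),
        "host": host,
        "source": "syslog",
        "message": message,
    }


def normalize_app_event(raw):
    """Parse a JSON event carrying epoch seconds (assumed field names)."""
    record = json.loads(raw)
    return {
        "ts": datetime.fromtimestamp(record["time"], tz=timezone.utc),
        "host": record["hostname"],
        "source": "app",
        "message": record["msg"],
    }


events = [
    normalize_syslog("2023-01-01T10:00:05+00:00 web01 sshd: Failed password for root"),
    normalize_app_event('{"time": 1672567205, "hostname": "web01", "msg": "checkout error"}'),
]
for event in events:
    print(event["ts"].isoformat(), event["host"], event["source"], event["message"])
```

Once every source shares the same keys and a single time zone, downstream queries and correlations no longer need to know where a record came from.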

Data Pipelining

With stakeholders spread all across the enterprise, it is important to make sure that data flows to the right platforms and devices and is visible to the right people in a timely manner. Privileged access management (PAM) is another key upfront consideration in the design of a data pipeline.

Location of Analytics

To bring value to the business, any data analytics initiative needs to be well-structured. Be sure to consider the merits of in-stream data analytics, analyzing a data package where it is created, versus the traditional method in which data moves to a central analytics platform to be processed. Both have advantages and disadvantages — for example, the high egress cost of moving data to a central platform.
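One way to picture the in-stream option is a small aggregation step that runs where the data is produced and forwards only summaries. The sketch below is an assumption-laden illustration: the event fields, the one-minute bucket and the function name are all hypothetical.

```python
from collections import Counter


def summarize_at_source(events):
    """Reduce raw events to per-(minute, event type) counts before shipping."""
    counts = Counter()
    for e in events:
        minute = e["ts"] // 60 * 60  # epoch seconds truncated to the minute
        counts[(minute, e["type"])] += 1
    return [
        {"minute": m, "type": t, "count": c}
        for (m, t), c in sorted(counts.items())
    ]


# Hypothetical raw events as they might be seen on the originating system.
raw_events = [
    {"ts": 1672567205, "type": "failed_login"},
    {"ts": 1672567210, "type": "failed_login"},
    {"ts": 1672567230, "type": "failed_login"},
    {"ts": 1672567290, "type": "password_reset"},
]

summaries = summarize_at_source(raw_events)
print(len(raw_events), "raw events reduced to", len(summaries), "summary records")
```

The trade-off is visible even at this scale: fewer records leave the source, but the central platform no longer sees the individual events unless they are retained locally.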

Accompanied by a team of talented individuals, I have been investigating and designing solutions around data analytics for quite some time now, and I know from experience that none of the challenges above are insurmountable. It is always possible to design a bespoke architecture that meets enterprise requirements.

With priority placed on understanding business objectives, you can work to align the technicalities of a detailed solution architecture around these three pillars — data ingestion, data pipelining and data analytics — to deliver a solution that is outcome-focused and anchored in value-creation for the enterprise.

Perform Data Analytics

The final and most crucial step on the journey is executing the analytics. For the purpose of simplification, we can split data analytics into two categories:

Data Mining

This is the process of statistical analytics and knowledge discovery by working with the data that is available and making sense of it. Data mining can be used for anomaly detection, as it allows teams to establish a baseline — an understanding of the usual events — so that they can more easily identify outliers in the dataset.
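A very simple version of baseline-and-outlier detection can be sketched with a z-score: compute the mean and standard deviation of historical values, then flag new observations that sit far from that baseline. The sample numbers and the threshold of three standard deviations are assumptions for illustration; real data may call for more robust statistics.

```python
from statistics import mean, stdev


def find_outliers(baseline, observations, threshold=3.0):
    """Return observations more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: anything different counts as an outlier.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical hourly login counts observed during a normal period.
baseline_logins_per_hour = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
# New observations to compare against the baseline.
today = [13, 14, 95, 12]

outliers = find_outliers(baseline_logins_per_hour, today)
print("Outliers:", outliers)
```

Here the hour with 95 logins stands out against a baseline that hovers around 13 or 14, which is exactly the kind of deviation an analyst would want surfaced.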

Machine Learning

Machine learning builds on the statistical models used in data mining: algorithms are trained on existing data so that they can carry out tasks such as classification or prediction without being given explicit instructions for every case. Enterprises should apply such tools with care, as there are some unique pitfalls and vulnerabilities to machine learning, although the massive potential outweighs the risks for any organization committed to strong analytics.

The two categories of data analytics described above still apply to areas of the enterprise other than security — marketing, sales, operations and more can all benefit from their effective application. Similarly, remember that a robust security analytics solution should not examine only security data, but a variety of data types that may contain indicators of threats.

Find a Footing in Strong Architecture

While advanced tools are available and accessible to any department of any organization that wishes to leverage them, the crucial difference between data analytics programs that deliver value and those that don’t is how efficient and business-fit the underlying architecture is. The fundamentals of data ingestion, data storage and data pipelining are the foundations of success in data analytics, whether the goal is to identify security threats, sales leads or operational efficiencies. When we get the basics right, the possibilities for the future are endless.
