TMI: The Problem of Too Much Information in Data Security

I am going to describe innovative technology based on machine learning, called “outlier detection,” in the context of data security. But first, let’s review why it’s so important for data breach detection to evolve beyond traditional, policy-based monitoring and auditing solutions.

The highly publicized data breaches of the past few years have organizations scrambling to build up their IT security infrastructures. This means establishing a defense-in-depth strategy that protects the organization from the outside in: from the perimeter through the internal network, hosts and applications to the final and, arguably, most important layer, the data itself. Each of these security layers may generate alerts, write log records or in some other way vie for the attention of security analysts.

The infamous 2013 data breach at Target revealed that security alarms raised by its monitoring software were often ignored or at least deemed unworthy of further investigation. This should not come as a surprise. Security analysts are bombarded with false positives, and without any indication of relative risk, there is no way to prioritize the analysis.

Assuming cybercriminals get through these outer layers of detection, having a layer of protection around the data itself can certainly help. White-list data security policies can alert organizations to unauthorized connections and users, and can even block those connections. However, blocking can be risky: A misconfigured blocking policy could prevent customers from completing a Web purchase or disable a chief financial officer’s access to mandated financial reporting data. But if we fall back on alerting alone, raising an alert every time a database user forgets a password, for example, we simply compound the problem of alert overload.

Even more problematic from a detection perspective is the worst-case scenario in which administration credentials are stolen, allowing data theft to occur under the cloak of normalcy. Similarly, SQL injection attacks that operate under application user privileges may also be able to access sensitive data without immediately triggering alarms.

As more database activity is generated from cloud applications, big data analysis and the Internet of Things, it’s clear that relying solely on alerting and audit log analysis will not scale. Organizations need to augment existing data security mechanisms with capabilities that give data security analysts more context and intelligence based on anomalies.

Help Is on the Way: Use Data-Mining Techniques on Data Access for Outlier Detection

Through a collaboration with IBM Research, Guardium’s data activity-monitoring platform can now take advantage of data-mining techniques to extend traditional database monitoring with increased intelligence, helping security analysts understand risk based on relative changes in behavior. For example, if Joe the DBA is observed accessing a particular table many more times than he has in the past, it could be that he is slowly downloading small amounts of data over time. If an application generates more SQL errors than it has in the past, a SQL injection attack may be under way.
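To make the idea of "relative change in behavior" concrete, here is a minimal sketch that compares a user's current activity count against that same user's history with a z-score. This is only an illustration of the general principle, not Guardium's actual algorithm; the function name `volume_zscore` and the sample counts are hypothetical.

```python
import math
from statistics import mean, stdev

def volume_zscore(history, current):
    """Distance of `current` from the mean of `history`, in standard deviations.

    Positive values mean more activity than usual for this user.
    """
    if len(history) < 2:
        return 0.0  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is infinitely unusual.
        if current == mu:
            return 0.0
        return math.copysign(math.inf, current - mu)
    return (current - mu) / sigma

# Hypothetical daily counts of Joe's reads against one table
joe_history = [4, 6, 5, 7, 5, 6, 4]
print(volume_zscore(joe_history, 6))   # an ordinary day, score below 1
print(volume_zscore(joe_history, 60))  # far outside Joe's own baseline
```

The same comparison applies to the SQL error example: replace the table-access counts with per-application error counts. The point is that 60 reads is not suspicious in itself; it is suspicious for Joe.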

The machine learning algorithms are adaptive: They dynamically learn the normal patterns of a user’s activities and analyze new activities as they accumulate.
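One simple way to picture "adaptive" is a baseline that is updated with every new observation, so that older behavior gradually counts for less. The sketch below uses an exponentially weighted mean and variance. It is an assumption chosen for illustration, not a description of the product's algorithms, and the class name `AdaptiveBaseline` and the smoothing factor `alpha` are hypothetical.

```python
class AdaptiveBaseline:
    """Exponentially weighted running mean and variance for one user's metric."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # higher alpha forgets old behavior faster
        self.mean = None
        self.var = 0.0

    def score(self, x):
        """How unusual x is relative to the baseline learned so far."""
        if self.mean is None or self.var == 0.0:
            return 0.0  # nothing learned yet
        return (x - self.mean) / self.var ** 0.5

    def update(self, x):
        """Fold a new observation into the baseline."""
        if self.mean is None:
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

baseline = AdaptiveBaseline()
for count in [10, 12, 11, 9, 10, 11]:  # hypothetical hourly query counts
    baseline.update(count)

print(baseline.score(11))   # close to the learned pattern
print(baseline.score(100))  # far from the learned pattern
```

Scoring before updating matters here: each new activity is judged against what was normal up to that moment, and only then becomes part of the baseline. If the higher level persists and is accepted, the baseline drifts toward it and the score falls.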


Being able to visualize the outliers on a dashboard helps analysts see where to focus their efforts. As shown in this technical article, drilling down on an outlier indicator (yellow or red) provides the reason for the anomaly, such as unusually high volume or a rare event. From there, analysts can drill into the detailed audit events in that specific time frame with context-aware forensic capabilities. They can also specify which events to ignore, thus reducing false positives.
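The triage flow described above (a severity color, a reason, and an analyst-maintained ignore list) can be sketched in a few lines. Everything here is hypothetical: the `Outlier` record, the `triage` function and the numeric thresholds are illustrative assumptions, not the product's data model or settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outlier:
    user: str
    reason: str   # e.g. "high_volume" or "rare_event"
    score: float  # anomaly score; higher means more unusual

def triage(outliers, ignored, yellow=3.0, red=6.0):
    """Drop analyst-ignored (user, reason) pairs and label the rest by severity.

    The yellow/red thresholds are arbitrary values chosen for this example.
    """
    labeled = []
    for o in outliers:
        if (o.user, o.reason) in ignored:
            continue  # analyst has marked this as expected behavior
        if o.score >= red:
            labeled.append(("red", o))
        elif o.score >= yellow:
            labeled.append(("yellow", o))
    # Most anomalous first, so analysts see the riskiest items at the top
    labeled.sort(key=lambda pair: pair[1].score, reverse=True)
    return labeled

found = [
    Outlier("batch_etl", "high_volume", 8.2),
    Outlier("joe_dba", "high_volume", 4.1),
    Outlier("app_user", "rare_event", 7.5),
    Outlier("intern", "high_volume", 1.2),
]
ignored = {("batch_etl", "high_volume")}  # known nightly load job

for color, o in triage(found, ignored):
    print(color, o.user, o.reason)
```

Keying the ignore list on a (user, reason) pair rather than on the user alone is a deliberate choice in this sketch: the nightly load job stays quiet for high volume, but a rare event from the same account would still surface.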

The outliers caught in the net of this mining technology can help analysts detect and react much more quickly to some hard-to-detect data breach scenarios, such as the following:

  • A disgruntled database administrator decides to extract an entire contact list into a comma-separated values (CSV) file to take with him. The algorithm can detect that the volume is exceptional because this user/role usually does administration work and shouldn’t access operational data. Even if the database administrator writes a program that extracts the records one by one, the frequency of activity would be flagged as well.
  • A developer adds buggy code to database stored procedures that will “blow up” after he is gone to prove how much he was needed. Again, the algorithm will detect this abnormal behavior because someone changed a stored procedure that is normally locked down and untouched. The higher number of errors caused by the buggy procedure will also show up as an anomalous event.
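The second scenario hinges on a rare event rather than on volume: an action that almost never appears in the history. A minimal sketch of that idea, assuming audit events have already been reduced to (user, action, object) tuples, could look like this. The function `rare_events`, the tuple layout and the sample data are hypothetical and are not taken from the product.

```python
from collections import Counter

def rare_events(history, new_events, max_seen=0):
    """Return new events that occurred at most `max_seen` times in the history."""
    seen = Counter(history)
    return [event for event in new_events if seen[event] <= max_seen]

# Hypothetical past audit events as (user, action, object) tuples
history = [
    ("dev_amy", "EXECUTE", "proc_billing"),
    ("dev_amy", "EXECUTE", "proc_billing"),
    ("dev_amy", "SELECT", "orders"),
    ("app_user", "EXECUTE", "proc_billing"),
]

today = [
    ("dev_amy", "EXECUTE", "proc_billing"),          # routine
    ("dev_amy", "ALTER PROCEDURE", "proc_billing"),  # never seen before
]

print(rare_events(history, today))
```

Raising `max_seen` widens the net from "never seen" to "seldom seen," at the cost of more events for analysts to review.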

These are hypothetical use cases, but let’s take a look at some things that were uncovered in a real company environment.

Case Study Results: Divorce and Data Privacy in a Large Telecom

IBM worked with a large European telecom to develop and refine its algorithms, using real but masked audit data for its analysis. IBM discovered that one side of a pending divorce case was obtaining highly sensitive information about the other side, including when that person initiated a specific conversation, who was talking with him or her on the phone and where he or she was during the conversation.

It turned out that a “friend” inside the telecom was accessing insider information for one party to strengthen that side of the case. A combination of sensitive table auditing and outlier detection algorithms enabled the company to identify the offending user. Auditing alone would have made such detection more difficult, since this user’s access was considered normal for that person’s job.

This company also discovered a vulnerability in its customer relationship management (CRM) system. A code change caused the wireless antenna number to appear in the CRM system views, exposing sensitive location data to the help desk. It was through outliers that the company identified that sensitive data was being exposed to a database user who did not normally access such information.

The Battle Over the Crown Jewels Continues to Escalate

It’s time to face the facts. Cybercriminals have stepped up their game, becoming even craftier and taking advantage of the fact that many organizations are understaffed and underprotected. We all have to step up our game to help prevent attacks and limit the damage they can inflict. Data mining can help organizations focus and prioritize resources to protect their crown jewels.

As it works with more customers, IBM continues to make advances in the algorithms and the deployment capabilities of outlier detection, including the ability to scale across many audit data collection nodes. There will be more options for improving the outlier detection algorithm, such as by marking known attacks to raise their anomaly scores. IBM is also working to provide the ability to correlate more outlier data with other existing information that is known about users and to provide more visualization techniques to ease such investigations.

