TMI: The Problem of Too Much Information in Data Security

I am going to describe innovative technology based on machine learning, called “outlier detection,” in the context of data security. But first, let’s review why it’s so important for data breach detection to evolve beyond traditional, policy-based monitoring and auditing solutions.

The highly publicized data breaches of the past few years have organizations scrambling to build up their IT security infrastructures. This means they must establish an in-depth defense strategy to protect the organization from the outside in, from the perimeter through the internal network, hosts, applications and — the final and, arguably, most important layer — the data itself. Each of these security layers may generate alerts, write log records or in some way vie for the attention of security analysts.

The infamous 2013 data breach at Target revealed that security alarms raised by its monitoring software were often ignored or at least deemed unworthy of further investigation. This should not come as a surprise. Security analysts are bombarded with false positives, and without any indication of relative risk, there is no way to prioritize the analysis.

Assuming cybercriminals get through these outer layers of detection, having a layer of protection around the data itself can certainly help. White-list data security policies can alert organizations to unauthorized connections and users, and can even block those connections. However, blocking can be risky: A misconfigured blocking policy could prevent customers from completing a Web purchase or disable a chief financial officer’s access to mandated financial reporting data. Yet if we fall back on alerting alone and raise an alert every time a database user forgets a password, for example, we simply compound the problem of alert overload.

Even more problematic from a detection perspective is the worst-case scenario in which administration credentials are stolen, allowing data theft to occur under the cloak of normalcy. Similarly, SQL injection attacks that operate under application user privileges may also be able to access sensitive data without immediately triggering alarms.

As more database activity is generated from cloud applications, big data analysis and the Internet of Things, it’s clear that relying solely on alerting and audit log analysis will not scale. Organizations need to augment existing data security mechanisms with capabilities that give data security analysts more context and intelligence based on anomalies.

Help Is on the Way: Use Data-Mining Techniques on Data Access for Outlier Detection

In collaboration with IBM Research, Guardium’s data activity-monitoring platform can now take advantage of data-mining techniques to extend traditional database monitoring with increased intelligence to help security analysts understand risk based on relative changes in behavior. For example, if Joe the DBA is observed accessing a particular table many more times than he has in the past, it could be that he is slowly downloading small amounts of data over time. If an application generates more SQL errors than it has in the past, a SQL injection attack may be under way.
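To make the idea of "relative change in behavior" concrete, here is a minimal sketch that scores a new activity count against a user's own history using a simple z-score. This is an illustration only, not the algorithm Guardium uses; the function name `outlier_score` and the sample counts for Joe are hypothetical.

```python
from statistics import mean, stdev

def outlier_score(history, current):
    """Return how many standard deviations `current` lies from the historical mean."""
    if len(history) < 2:
        return 0.0  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is maximally unusual.
        return 0.0 if current == mu else float("inf")
    return (current - mu) / sigma

# Hypothetical hourly counts of Joe's reads against one table.
joe_history = [12, 9, 11, 10, 13, 8, 12, 10]

score_normal = outlier_score(joe_history, 11)  # an ordinary hour
score_spike = outlier_score(joe_history, 60)   # far more reads than usual

print(f"normal hour: {score_normal:.2f}, spike hour: {score_spike:.2f}")
```

The same scoring could be applied to an application's SQL error count per hour: a sudden jump relative to its own past would stand out in the same way.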

The machine learning algorithms are adaptive: They dynamically learn the normal patterns of a user’s activities and analyze new activities as they accumulate.
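One common way to build a baseline that adapts as activity accumulates is an exponentially weighted moving average of the mean and variance. The sketch below assumes that approach purely for illustration; the source does not say which technique the product uses, and the class name `AdaptiveBaseline` and the smoothing factor `alpha` are hypothetical.

```python
class AdaptiveBaseline:
    """Track a per-user activity baseline that drifts with recent behavior."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # weight given to the newest observation (assumed value)
        self.mean = None     # no baseline until the first observation arrives
        self.var = 0.0

    def update(self, value):
        """Score `value` against the current baseline, then fold it into the baseline."""
        if self.mean is None:
            self.mean = float(value)
            return 0.0
        diff = value - self.mean
        std = self.var ** 0.5
        score = abs(diff) / std if std > 0 else 0.0
        # Exponentially weighted updates of the mean and variance.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return score

# Hypothetical activity counts: eight ordinary periods, then a burst.
baseline = AdaptiveBaseline(alpha=0.1)
scores = [baseline.update(v) for v in [10, 12, 9, 11, 10, 12, 9, 11, 80]]
print([round(s, 2) for s in scores])
```

Because each observation is scored before it is absorbed, a burst is judged against the behavior learned so far, while gradual shifts in a user's normal workload are slowly folded into the baseline.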


Being able to visualize the outliers on a dashboard helps analysts easily see where to focus their efforts. Drilling down on an outlier indicator (yellow or red) provides the reason for the anomaly, such as unusually high volume or a rare event. You can automatically drill into detailed audit events in that specific time frame with context-aware forensic capabilities. Analysts can also specify which events to ignore, thus reducing false positives.
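The triage flow described above (severity indicator, reason, analyst-defined ignore list) can be sketched as follows. The thresholds, field names and the `triage` function are assumptions made for illustration and do not reflect the product's actual scoring or configuration.

```python
def triage(outliers, ignored, yellow=3.0, red=6.0):
    """Assign a yellow/red level to each outlier, skipping analyst-ignored ones."""
    flagged = []
    for outlier in outliers:
        # Analysts can suppress known-benign (user, reason) combinations.
        if (outlier["user"], outlier["reason"]) in ignored:
            continue
        if outlier["score"] >= red:
            flagged.append({**outlier, "level": "red"})
        elif outlier["score"] >= yellow:
            flagged.append({**outlier, "level": "yellow"})
    return flagged

# Hypothetical scored outliers for one time frame.
outliers = [
    {"user": "dba_joe", "reason": "high volume", "score": 7.2},
    {"user": "batch_etl", "reason": "high volume", "score": 9.0},
    {"user": "app_user", "reason": "rare event", "score": 3.5},
    {"user": "help_desk", "reason": "rare event", "score": 1.1},
]
ignored = {("batch_etl", "high volume")}  # e.g. a known nightly load

flagged = triage(outliers, ignored)
for item in flagged:
    print(item["level"], item["user"], item["reason"])
```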

The outliers caught in the net of this mining technology can help analysts detect and react much more quickly to some hard-to-detect data breach scenarios, such as the following:

  • A disgruntled database administrator decides to extract an entire contact list into a comma-separated values (CSV) file to take with him. The algorithm can detect that the volume is exceptional because this user/role usually does administration work and shouldn’t access operational data. Even if the database administrator writes a program that extracts the records one by one, the frequency of activity would also be flagged.
  • A developer adds buggy code into database stored procedures that will “blow up” after he is gone to prove how much he was needed. Again, the algorithm will detect this abnormal behavior because someone changed a stored procedure that is normally locked down and untouched. The higher number of errors caused by the buggy procedure will also show up as an anomalous event.
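Both scenarios involve activity that has rarely or never been seen before for that user. As a minimal sketch of rare-event detection, the snippet below flags recent (user, verb, object) combinations that are absent from a baseline period. The event tuples, user names and the `find_rare_events` function are hypothetical and stand in for whatever audit record format is actually collected.

```python
from collections import Counter

def find_rare_events(baseline, recent, max_seen=0):
    """Return recent events seen at most `max_seen` times during the baseline period."""
    seen = Counter(baseline)
    return [event for event in recent if seen[event] <= max_seen]

# Hypothetical audit events: (user, verb, object).
baseline = [
    ("dba_joe", "GRANT", "roles"),
    ("dba_joe", "ALTER", "indexes"),
    ("app_user", "SELECT", "contacts"),
    ("app_user", "SELECT", "contacts"),
]
recent = [
    ("dba_joe", "ALTER", "indexes"),                 # routine administration work
    ("dba_joe", "SELECT", "contacts"),               # administrator reading operational data
    ("dev_ann", "ALTER PROCEDURE", "billing_proc"),  # change to a normally untouched procedure
]

rare = find_rare_events(baseline, recent)
for event in rare:
    print("rare event:", event)
```

In practice a rare-event signal like this would be combined with volume and frequency scoring, since the one-by-one extraction in the first scenario shows up in frequency rather than novelty.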

These are hypothetical use cases, but let’s take a look at some things that were uncovered in a real company environment:

Case Study Results: Divorce and Data Privacy in a Large Telecom

IBM worked with a large European telecom to develop and refine its algorithms, using real but masked audit data for the analysis. IBM discovered that one side of a pending divorce case was obtaining highly sensitive information about the other side, including when that person initiated a specific conversation, who was talking with him or her on the phone and where he or she was during the conversation.

It turned out that a “friend” inside the telecom was accessing insider information for one party to strengthen that side of the case. A combination of sensitive table auditing and outlier detection algorithms enabled the company to identify the offending users. Auditing alone makes such detection more difficult since the access by this user was considered normal for that person’s job.

This company also discovered a vulnerability in its customer relationship management (CRM) system. A code change caused the wireless antenna number to be exposed in its CRM system views, exposing sensitive location data to the help desk. It was through outliers that the company identified that sensitive data was exposed to a database user who was not normally accessing such information.

The Battle Over the Crown Jewels Continues to Escalate

It’s time to face the facts. Cybercriminals have stepped up their game, becoming even craftier and taking advantage of the fact that many organizations are understaffed and underprotected. We all have to step up our game to help prevent attacks and limit the damage they can inflict. Data mining can help organizations focus and prioritize resources to protect their crown jewels.

As it works with more customers, IBM continues to make advances in the algorithms and the deployment capabilities of outlier detection, including the ability to scale across many audit data collection nodes. There will be more options for improving the outlier detection algorithm, such as by marking known attacks to raise their anomaly scores. IBM is also working to provide the ability to correlate more outlier data with other existing information that is known about users and to provide more visualization techniques to ease such investigations.

