TMI: The Problem of Too Much Information in Data Security

I am going to describe innovative technology based on machine learning, called “outlier detection,” in the context of data security. But first, let’s review why it’s so important for data breach detection to evolve beyond traditional, policy-based monitoring and auditing solutions.

The highly publicized data breaches of the past few years have organizations scrambling to build up their IT security infrastructures. This means they must establish an in-depth defense strategy to protect the organization from the outside in, from the perimeter through the internal network, hosts, applications and — the final and, arguably, most important layer — the data itself. Each of these security layers may generate alerts, write log records or in some way vie for the attention of security analysts.

The infamous 2013 data breach at Target revealed that security alarms raised by its monitoring software were often ignored or at least deemed unworthy of further investigation. This should not come as a surprise. Security analysts are bombarded with false positives, and without any indication of relative risk, there is no way to prioritize the analysis.

Assuming cybercriminals get through these outer layers of detection, having a layer of protection around the data itself can certainly help. White-list data security policies can alert organizations to unauthorized connections and users, and can even block those connections. However, blocking can be risky: A misconfigured blocking policy could prevent customers from completing a Web purchase or disable a chief financial officer’s access to mandated financial reporting data. Falling back on alerts alone is no better: If an alert fires every time database users forget their passwords, for example, we have simply compounded the problem of alert overload.

Even more problematic from a detection perspective is the worst-case scenario in which administration credentials are stolen, allowing data theft to occur under the cloak of normalcy. Similarly, SQL injection attacks that operate under application user privileges may also be able to access sensitive data without immediately triggering alarms.

As more database activity is generated from cloud applications, big data analysis and the Internet of Things, it’s clear that relying on alerting and audit log analysis alone will not scale. Organizations need to augment existing data security mechanisms with capabilities that provide data security analysts with more context and intelligence based on anomalies.

Help Is on the Way: Use Data-Mining Techniques on Data Access for Outlier Detection

In collaboration with IBM Research, Guardium’s data activity-monitoring platform can now take advantage of data-mining techniques to extend traditional database monitoring with increased intelligence to help security analysts understand risk based on relative changes in behavior. For example, if Joe the DBA is observed accessing a particular table many more times than he has in the past, it could be that he is slowly downloading small amounts of data over time. If an application generates more SQL errors than it has in the past, a SQL injection attack may be under way.
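
The idea of judging risk by relative change in behavior can be illustrated with a very small sketch. This is not Guardium’s algorithm, which the article does not describe in detail; it is a plain z-score comparison of today’s activity against a user’s own history. The function name, the sample counts and the threshold of 3.0 are all illustrative assumptions.

```python
from statistics import mean, stdev

def volume_zscore(history, current):
    """Return how many standard deviations `current` lies from the
    mean of `history` (positive means above the mean)."""
    if len(history) < 2:
        return 0.0  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any deviation is maximally unusual.
        if current == mu:
            return 0.0
        return float("inf") if current > mu else float("-inf")
    return (current - mu) / sigma

# Hypothetical daily counts of one DBA's reads against one table.
joe_history = [12, 15, 11, 14, 13, 12, 16]
joe_today = 90

score = volume_zscore(joe_history, joe_today)
is_outlier = score > 3.0  # illustrative threshold, not a product default
print(f"z-score={score:.1f}, outlier={is_outlier}")
```

The same comparison applies to the SQL error example: replace the access counts with an application’s daily error counts and the logic is unchanged.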

The machine learning algorithms are adaptive: They dynamically learn the normal patterns of a user’s activities and analyze new activities as they accumulate.
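
One simple way to picture a baseline that adapts as activity accumulates is an online estimate of mean and variance. The sketch below uses Welford’s algorithm, chosen here purely for illustration; the article does not say which algorithms the product uses. The class name `AdaptiveBaseline` and the sample counts are hypothetical.

```python
import math

class AdaptiveBaseline:
    """Running mean and variance of one user's activity (Welford's algorithm)."""

    def __init__(self):
        self.n = 0       # observations seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def score(self, x):
        """Distance of x from the learned mean, in standard deviations."""
        if self.n < 2:
            return 0.0  # still learning, nothing to compare against
        variance = self.m2 / (self.n - 1)
        if variance == 0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / math.sqrt(variance)

    def update(self, x):
        """Fold a new observation into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def observe(self, x):
        """Score x against the current baseline, then learn from it."""
        s = self.score(x)
        self.update(x)
        return s

# Hypothetical hourly query counts for one user; the last value is a spike.
baseline = AdaptiveBaseline()
scores = [baseline.observe(c) for c in [20, 22, 19, 21, 20, 23, 200]]
print([round(s, 1) for s in scores])
```

Scoring before updating keeps a new observation from diluting the baseline it is being compared against. A production system would also need to decide whether confirmed outliers should be learned at all.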


Being able to visualize the outliers on a dashboard helps analysts easily see where they can focus their efforts. As detailed in the accompanying technical article, drilling down on an outlier indicator (yellow or red) provides the reason for the anomaly, such as unusually high volume or a rare event. You can automatically drill into detailed audit events in that specific time frame with context-aware forensic capabilities. Analysts can also specify which events to ignore, thus reducing false positives.
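
The triage flow described here (a severity color, a reason and an analyst-maintained ignore list) can be sketched in a few lines. Everything in this example is an assumption made for illustration: the `Outlier` record, the reason labels, the yellow and red thresholds, and the shape of the ignore list are not taken from the product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outlier:
    user: str
    reason: str   # e.g. "high_volume" or "rare_event"
    score: float  # anomaly score from some upstream detector

def triage(outliers, ignored, yellow=3.0, red=6.0):
    """Drop ignored or low-scoring findings, color the rest,
    and return them with the highest score first."""
    flagged = []
    for o in outliers:
        if (o.user, o.reason) in ignored or o.score < yellow:
            continue
        severity = "red" if o.score >= red else "yellow"
        flagged.append((severity, o))
    flagged.sort(key=lambda pair: pair[1].score, reverse=True)
    return flagged

# Hypothetical findings from one monitoring interval.
findings = [
    Outlier("joe_dba", "high_volume", 8.2),
    Outlier("batch_job", "high_volume", 7.5),
    Outlier("crm_app", "rare_event", 4.1),
    Outlier("alice", "high_volume", 1.2),
]
# The analyst has marked the nightly batch job's volume spikes as expected.
ignored = {("batch_job", "high_volume")}

dashboard = triage(findings, ignored)
for severity, o in dashboard:
    print(severity, o.user, o.reason, o.score)
```

Keeping the ignore list keyed on both user and reason means a suppressed volume spike does not also hide a rare event from the same account.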

The outliers caught in the net of this mining technology can help analysts detect and react much more quickly to some hard-to-detect data breach scenarios, such as the following:

  • A disgruntled database administrator decides to extract an entire contact list into a comma-separated values (CSV) file to take with him. The algorithm can detect that the volume is exceptional because this user/role usually does administration work and shouldn’t access operational data. Even if the database administrator writes a program that extracts the records one by one, the frequency of activity would be flagged as well.
  • A developer adds buggy code to database stored procedures that will “blow up” after he is gone to prove how much he was needed. Again, the algorithm will detect this abnormal behavior because someone changed a stored procedure that is normally locked down and untouched. Additionally, the higher number of errors caused by the buggy procedure will show up as an anomalous event.
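
The second scenario depends on spotting a rare event rather than a high volume. A minimal sketch of that idea, assuming events are reduced to (user, action) pairs, is to score each new event by how improbable it is given past activity. The surprisal formula with add-one smoothing, the function name and the sample history are illustrative assumptions, not the product’s method.

```python
import math
from collections import Counter

def rarity_scores(history, new_events):
    """Score each new event by its surprisal, -log2(p), where p is the
    add-one smoothed frequency of that event in the history."""
    counts = Counter(history)
    total = len(history)
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen events
    scores = {}
    for event in new_events:
        p = (counts[event] + 1) / (total + vocab)
        scores[event] = -math.log2(p)
    return scores

# Hypothetical audit history of (user, action) pairs.
history = (
    [("dba_joe", "GRANT")] * 40
    + [("dba_joe", "BACKUP")] * 59
    + [("dev_sam", "SELECT")] * 100
)
routine = ("dba_joe", "BACKUP")
never_seen = ("dev_sam", "ALTER PROCEDURE")

scores = rarity_scores(history, [routine, never_seen])
for event, s in scores.items():
    print(event, round(s, 2))
```

A stored procedure change by an account that has never made one scores far higher than a routine backup, even though it is a single event with no unusual volume.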

These are hypothetical use cases, but let’s take a look at some things that were uncovered in a real company environment:

Case Study Results: Divorce and Data Privacy in a Large Telecom

IBM worked with a large European telecom to develop and refine its algorithms, using real but masked audit data for its analysis. IBM discovered that one side of a pending divorce case was obtaining highly sensitive information about the other side, including when that person initiated a specific conversation, who was talking with him or her on the phone and where he or she was during the conversation.

It turned out that a “friend” inside the telecom was accessing insider information for one party to strengthen that side of the case. A combination of sensitive table auditing and outlier detection algorithms enabled the company to identify the offending users. Auditing alone would have made such detection more difficult, since the access by this user was considered normal for that person’s job.

This company also discovered a vulnerability in its customer relationship management (CRM) system. A code change caused the wireless antenna number to appear in its CRM system views, exposing sensitive location data to the help desk. It was through outliers that the company identified that sensitive data was reaching a database user who did not normally access such information.

The Battle Over the Crown Jewels Continues to Escalate

It’s time to face the facts. Cybercriminals have stepped up their game, becoming craftier and taking advantage of the fact that many organizations are understaffed and underprotected. We all have to step up our game to help prevent attacks and limit the damage they can inflict. Data mining can help organizations focus and prioritize resources to protect their crown jewels.

As it works with more customers, IBM continues to make advances in the algorithms and the deployment capabilities of outlier detection, including the ability to scale across many audit data collection nodes. There will be more options for improving the outlier detection algorithm, such as by marking known attacks to raise their anomaly scores. IBM is also working to provide the ability to correlate more outlier data with other existing information that is known about users and to provide more visualization techniques to ease such investigations.

