July 11, 2016 | By John Haldeman

Databases are busy, budgets are tight and time is limited. As such, if you are just starting to deploy database activity monitoring (DAM) systems in your organization or have been monitoring databases for a while, you may have been told that it’s impractical to log everything. If this is the case, then what do you filter out?

It is a bad idea to attempt to log everything because everything logged has a cost. The costs are sometimes obvious: It requires additional infrastructure (disk space, CPU resources and so on) to evaluate and write logging records in your environment. Sometimes the costs are more difficult to measure but are still very real; for example, handling more data means more administrative and analysis time from the people managing your DAM systems.

In attempting to log everything, everyone from the DAM system administrators to the security data reviewers may lose focus, exposing you to risk as they try to manage the environment and the volume of data instead of making use of the data provided by the system. In other words, logging everything is not only expensive, but it’s bad for security when you take into account that there are limited resources available to those responding to security events.

It should be noted that this is true even when using the analytic tools available in database activity monitoring systems. Databases are very busy and erratic; reducing thousands of events by a factor of 10 or even 20 still leaves you with hundreds to thousands of outliers to process each day. Filtering allows your systems to run more smoothly and makes outlier detection more effective.

Determining How to Filter Your Database Activity

So you have to filter out some data. How you define the filtering criteria comes down to two major factors:

  1. White-listing versus blacklisting; and
  2. Specificity.

The white list/blacklist decision relates to how you define the lists of things you want to track. In a white list, you explicitly define the things you do not wish to see, and everything else is logged. In a blacklist, you define the things you do want to see, implicitly filtering out everything else.

Say you are primarily concerned about administrative database users. In a white list, you would define a list of all accounts that are not administrative users so they can be filtered out. In a blacklist, you would instead define a list of all administrative database users to include. Quite often white lists are populated based on observed activity, such as an audit history or the ongoing activity in the environment. Blacklists are often populated based on information received either from someone else or from another system.
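To make the distinction concrete, here is a minimal sketch in Java of how a collector might apply the two list types to the administrative-user example. The class, account and method names are hypothetical and not taken from any particular DAM product.

```java
import java.util.Set;

// Minimal sketch of white-list versus blacklist filtering for database users.
// All names here are hypothetical; real DAM products expose this as policy
// configuration rather than code.
public class UserFilter {

    // White list: accounts we explicitly do NOT want to see (filtered out).
    private static final Set<String> NON_ADMIN_WHITELIST =
            Set.of("app_user", "report_user", "etl_batch");

    // Blacklist: accounts we explicitly DO want to see (everything else is dropped).
    private static final Set<String> ADMIN_BLACKLIST =
            Set.of("sys", "system", "dba_jsmith");

    // White-list mode: log everything except the known, uninteresting accounts.
    public static boolean shouldLogWhitelist(String dbUser) {
        return !NON_ADMIN_WHITELIST.contains(dbUser.toLowerCase());
    }

    // Blacklist mode: log only the accounts we already know are administrative.
    public static boolean shouldLogBlacklist(String dbUser) {
        return ADMIN_BLACKLIST.contains(dbUser.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(shouldLogWhitelist("dba_jsmith")); // true: unknown account, so logged
        System.out.println(shouldLogBlacklist("dba_jsmith")); // true: known admin
        System.out.println(shouldLogBlacklist("new_admin"));  // false: missed until the list is updated
    }
}
```

The last check illustrates the blacklist’s weak point: an administrative account you did not know about is silently filtered out, whereas in white-list mode it would have shown up in the logs.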

Whether you use a blacklist or white list depends on two things:

  1. How complete your knowledge of the required filtering criteria is; and
  2. How often the items in the list change and what mechanisms are available to make you aware of those changes.

Back to our example: If you have no idea what the list of administrative database users is, then a white list would be more appropriate. In reviewing the captured data, you would build up knowledge of which users are administrative and which are not, and you could update the white list accordingly.

If you instead have access to a complete list of administrative users (hopefully from a source other than the administrative users you are trying to track), you can leverage that. A common place to retrieve this list might be a corporate directory or the database catalog itself.
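As a sketch of the catalog approach, the JDBC snippet below pulls the accounts granted the DBA role from an Oracle database. The connection details are placeholders, DBA_ROLE_PRIVS is Oracle-specific, and other vendors expose equivalent information through differently named catalog views.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Sketch: build a blacklist of administrative users from the Oracle catalog.
// DBA_ROLE_PRIVS lists which accounts hold which roles; other vendors have
// equivalent (but differently named) catalog views.
public class AdminUserLoader {
    public static List<String> loadOracleDbaUsers(String url, String user, String pass)
            throws Exception {
        List<String> admins = new ArrayList<>();
        String sql = "SELECT grantee FROM dba_role_privs WHERE granted_role = 'DBA'";
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                admins.add(rs.getString("grantee"));
            }
        }
        return admins;
    }
}
```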

Be wary: Even the data collected from systems such as a corporate directory is probably based on assumptions. If the assumptions are faulty or the data quality is poor, you might not capture everything in your mandate.

Decisions Surrounding a White List

It’s easy to see that white lists have an inherent appeal: It quickly becomes apparent when your assumptions are incorrect or your knowledge is incomplete. The disadvantage of white lists is that they require constant updating and revision. In practice, we tend to see a combination of white lists and blacklists. A list of administrative users might be a white list (i.e., an exclusion of all nonadministrative users), while a list of DML commands, which change the data in the database, is a blacklist.

The next decision comes down to specificity. Whether you are dealing with a white list or a blacklist, the level of detail with which you specify the filtering makes a big difference in how well your DAM processes and systems run. For example, say you have three lists of sensitive tables for three different databases. You can either apply a master list of sensitive tables to all three databases or split them into three different lists and apply them individually to each database. Hopefully the trade-off is clear: If you apply a separate list to each database, you have three definitions to maintain and review. If you apply the filter globally, you have only one definition, but you risk false positives, since a table that is sensitive in one database may be harmless in another yet still be flagged everywhere.
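A rough sketch of the two options, using made-up database and table names, might look like this:

```java
import java.util.Map;
import java.util.Set;

// Sketch of the specificity trade-off for sensitive-table filters.
// Database and table names are made up for illustration.
public class SensitiveTablePolicy {

    // Option 1: one global list applied to every database. Simple to maintain,
    // but SALARY being sensitive in HRDB causes false positives in TESTDB too.
    static final Set<String> GLOBAL_SENSITIVE_TABLES =
            Set.of("SALARY", "CARDHOLDER", "PATIENT_RECORD");

    // Option 2: a separate list per database. More precise, but three
    // definitions to review and keep current instead of one.
    static final Map<String, Set<String>> PER_DATABASE_SENSITIVE_TABLES = Map.of(
            "HRDB",       Set.of("SALARY"),
            "PAYMENTSDB", Set.of("CARDHOLDER"),
            "CLINICDB",   Set.of("PATIENT_RECORD"));

    static boolean isSensitiveGlobal(String table) {
        return GLOBAL_SENSITIVE_TABLES.contains(table);
    }

    static boolean isSensitivePerDb(String database, String table) {
        return PER_DATABASE_SENSITIVE_TABLES
                .getOrDefault(database, Set.of())
                .contains(table);
    }
}
```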

Another place this trade-off commonly appears is in defining direct access connections, meaning connections to the database that occur outside of an application or batch process. The more specific you are about the context of the connection, the less chance of a false negative, but it results in a larger list that needs to be maintained.

Database activity monitors have features that let you make session-parameter white lists extremely detailed if you need to, but keep the specificity trade-off in mind before going that route.
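As an illustration of what a highly specific entry might look like, the sketch below whitelists a known batch connection only when every session parameter matches; loosening any one field (allowing any client IP, say) shortens the list you maintain but widens what gets filtered out. All of the field names and values here are hypothetical.

```java
// Sketch of a highly specific session-parameter white-list entry (Java 16+ record).
// All field names and values are hypothetical examples.
public record TrustedConnection(
        String clientIp,
        String dbUser,
        String osUser,
        String sourceProgram) {

    // The connection is filtered out of the audit stream only when every
    // parameter matches; a partial match is still logged.
    public boolean matches(String ip, String user, String os, String program) {
        return clientIp.equals(ip)
                && dbUser.equals(user)
                && osUser.equals(os)
                && sourceProgram.equals(program);
    }

    public static void main(String[] args) {
        TrustedConnection nightlyEtl =
                new TrustedConnection("10.1.2.30", "etl_batch", "svc_etl", "sqlldr");
        // The same account and program from an unexpected host is NOT filtered out.
        System.out.println(nightlyEtl.matches("10.9.9.9", "etl_batch", "svc_etl", "sqlldr"));
    }
}
```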

Applying Frameworks to Your White List

The last thing to mention is the mechanics involved in monitoring. It’s important to know how the database audit records are accumulated so that you understand what information is being presented to you when building white lists and blacklists.

The latest and most popular database auditing frameworks primarily use the client/server communication channel as their source of audit information. They tap into the database communication streams, parse the data and log it. Given that, it’s good to think about what the systems can and cannot verify. Certain data can be spoofed by advanced attackers or by people attempting to hide what they are doing, while other data is more difficult or downright impossible to spoof.

Let’s take a look at some client-side session parameters as an example:

  • Client IP;
  • Database user;
  • Operating system user; and
  • Source program.

You can be reasonably confident in client IPs because spoofing them by changing the packet headers results in the returned data being sent to the wrong IP. In connection-oriented protocols that rely on handshakes, such as TCP, the risk of spoofing is very low.

The database user is also reliable because that user is verified by a password. The monitoring system can see both the authentication request and the response indicating whether it succeeded or failed.

Operating system user is meant to reflect the current user of the machine being used to access the database. Authentication may have taken place on a remote machine, but the monitoring system has no knowledge of that, so it trusts that what the client reports is accurate. It’s easy to imagine a scenario where a custom database client is created to misreport the operating system user. In fact, some clients allow you to set it to whatever you like without building your own client; the standard Oracle JDBC client allows you to do this by modifying the JDBC connection properties, for instance.
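For instance, with the Oracle thin driver a client can choose the OS user and program name it reports simply by setting connection properties. The sketch below uses the v$session.* property keys supported by that driver; the URL, credentials and reported values are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Sketch: an Oracle thin JDBC client choosing what OS user and program name
// to report. The v$session.* property keys are Oracle-specific; the URL and
// credentials are placeholders.
public class SpoofedSessionInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");
        props.setProperty("password", "secret");
        props.setProperty("v$session.osuser", "someone_else");  // reported OS user
        props.setProperty("v$session.program", "sqlplus.exe");  // reported source program

        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", props)) {
            // The monitoring system sees "someone_else" and "sqlplus.exe",
            // even though neither necessarily reflects the real client.
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```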

The same logic applies to source programs. It is a good idea to combine source program and OS user with more reliable client identifiers such as database username and/or client IP. This is not to say that source program and OS user don’t provide useful information — they are often extremely helpful — but it’s just important to keep their limitations in mind.

This information was intended to provide a foundational set of knowledge for defining how you want to monitor your databases. For more information, register for the July 21 Tech Talk, “Worrying About Your Whitelists – Tips and Tricks for Deciding What to Trust.”
