Today I presented at USENIX WOOT ‘13 a new vulnerability that we had found in BIND, the most popular DNS server. Exploiting this vulnerability allows to reduce the amount of effort required for an off-path (blind) DNS cache poisoning attack.

The whitepaper is now publicly available, together with the presentation and ISC’s (the organization behind BIND) notification.

In this blog post I will describe the vulnerability in a less formal fashion.

What is Off-path DNS Cache Poisoning?

An off-path DNS cache poisoning is an attack at which we assert that the adversary does not see the traffic between the DNS resolver and some name server (NS), but manages to poison the cache of the DNS resolver nevertheless. In order to do so, the attacker usually induces a request between the DNS resolver and the NS, and then races against the genuine DNS answer. DNS protects against this type of attack by having some random data inside each DNS query. The random data consists of 3 elements:

  1. The DNS TXID (16 bits)
  2. The UDP Source port (~16 bits)
  3. The destination IP address.

The destination IP address can be random if the target zone has multiple authoritative name servers. We will see next that it indeed has some random properties.  The resolver accepts the answer if and only if the DNS answer contains matching data, otherwise the DNS answer is simply dropped. This means that the attacker must first win the race against the genuine answer and then guess the random data correctly.

Motivation

Since the TXID and UDP Source port randomness are well-studied, we decided to focus on the randomness of the name server (NS) selection. We wanted to see how much entropy can be added if a resolver simply chooses the target NS randomly out of the authoritative name servers. In order to do so we gathered some statistics. We downloaded the root zone file  and created the following CDF graph:

From the graph we can see that almost half of the TLDs have more than 8 NSs. This means a truly random NS selector can bring some entropy but obviously not as much as by randomizing the DNS TXID and UDP source port. However, derandomizing such algorithm can make existing attacks run faster or be more efficient. In addition it can make an off-path attack become on-path (Man-in-the-Middle), if the attacker is on a path between the resolver and some authoritative NS, but not on a path between the resolver and other authoritative NSs.

BIND’s NS selection: The Smoothed Round-Trip Time (SRTT) algorithm

The goal of the resolver when facing multiple authoritative NSs is to choose the fastest one, by Round-Trip Time (RTT). Since the RTT changes frequently, BIND maintains a moving average for each NS IP. This value is first initialized with a very low value between 1 and 32 microseconds. Each time there is a valid DNS response available, the SRTT is updated with a weighted sum which takes into account the old SRTT value and the newly available RTT. The weights are 0.7 for the old SRTT and 0.3 for the RTT. In order to avoid starvation, any non-queried NS out of authoritative NSs thatt can be queried, is multiplied with a decay factory of 0.98. In error situations, such as a network timeout, the SRTT is punished with 200ms with an upper bound of 1s.

The authoritative name servers are simply queried in the order of their SRTT values (NS with lowest SRTT is queried first).

All of the SRTT values are stored inside a cache which is accessed by the NS IP.

Attacks on the NS selection algorithm and our contribution

Now that we understand the SRTT algorithm, we can safely claim that If the attacker can somehow influence the SRTT algorithm, he can derandomize the NS selection. He can achieve this in two different approaches. The first one is to increase the SRTT of all name servers but one. The other strategy would be to lower the SRTT of the target NS.

There are two previous works which try to tackle the NS selection. The first one which adheres to the first approach is [Herzberg & Shulman, 2012]. It abuses IP fragmentation in order to cause the error operation of the SRTT algorithm.  The second, [Petr, 2009], adheres to the second strategy. It uses fast spoofed responses in order to exploit the update operation of the SRTT algorithm.

Both attacks are probabilistic (i.e. there is some guess work behind them so they don’t always work). Our attack is deterministic  (i.e. it always succeeds), and in addition the NS which we lower its SRTT does not even see it.

The Attack

In this post I will describe a simple variant of the attack, the more sophisticated ones (but not more effective) can be found on the whitepaper.

Our attack abuses non-open resolvers, i.e. NSs that do not answer on queries that they are not authoritative of. Most of the resolvers around the Internet are non-open. The attacker does not need to control them, but simply needs to know their IP address. We will see how the attacker can take leverage of them in order lower the SRTT value of an arbitrary NS to an arbitrary value on some target resolver. The attacker also hosts a malicious NS. We assume that this malicious NS is the authority of some domain, i.e. malicious.foo. Let’s also assume that we want to lower the SRTT value of one of the authoritative NSs of ibm.com. It’s important to mention that our attack does not depend on the amount of authoritative NSs the target zone has, but for the sake of simplicity, in this example, the zone has two nameservers: NS1 and NS2. We will lower the SRTT of NS2 on the target resolver, so it will be queried before NS1.

The attack begins by first querying the resolver for some domain the malicious NS is authoritative of, e.g. 666.malicious.foo. The resolver will eventually approach the malicious NS, which replies with a DNS delegation that contains:

  1. A large list of non-open resolvers
  2. The NS that we try to lower the SRTT of, NS2

We assume that NS2 was already on the SRTT cache of the resolver and was contacted prior to the attack, so its SRTT is based on some real RTT value. Since the non-open resolvers were not on the cache, they will be added with very low values between 1 and 32 microseconds according to the SRTT algorithm. This means that they will be queried before NS2 by the NS selection. The resolver will query each of them and they will refuse since they are non-open resolvers. After each query the SRTT value of NS2 will be decayed, i.e. multiplied by 0.98. The resolver will time out after 30 seconds and we can force this timeout if the amount of non-open resolvers is sufficiently large. This means that the resolver will not even get to querying NS2 and at the end of the attack the SRTT of NS2 will be 0.98^n of its original value, where n is the amount of non-open resolvers which were queried before the timeout.

Fix recommendation

The root cause of this vulnerability is that the resolver maintains a shared cache of SRTT values. The cache is shared in the sense that a malicious NS can affect SRTT values of NSs which are not related to it. We suggest to split the cache. For example, it can be split according to the currently queried zone.

More from Software Vulnerabilities

FYSA – Critical RCE Flaw in GNU-Linux Systems

2 min read - Summary The first of a series of blog posts has been published detailing a vulnerability in the Common Unix Printing System (CUPS), which purportedly allows attackers to gain remote access to UNIX-based systems. The vulnerability, which affects various UNIX-based operating systems, can be exploited by sending a specially crafted HTTP request to the CUPS service. Threat Topography Threat Type: Remote code execution vulnerability in CUPS service Industries Impacted: UNIX-based systems across various industries, including but not limited to, finance, healthcare,…

X-Force discovers new vulnerabilities in smart treadmill

7 min read - This research was made possible thanks to contributions from Joshua Merrill. Smart gym equipment is seeing rapid growth in the fitness industry, enabling users to follow customized workouts, stream entertainment on the built-in display, and conveniently track their progress. With the multitude of features available on these internet-connected machines, a group of researchers at IBM X-Force Red considered whether user data was secure and, more importantly, whether there was any risk to the physical safety of users. One of the most…

X-Force releases detection & response framework for managed file transfer software

5 min read - How AI can help defenders scale detection guidance for enterprise software tools If we look back at mass exploitation events that shook the security industry like Log4j, Atlassian, and Microsoft Exchange when these solutions were actively being exploited by attackers, the exploits may have been associated with a different CVE, but the detection and response guidance being released by the various security vendors had many similarities (e.g., Log4shell vs. Log4j2 vs. MOVEit vs. Spring4Shell vs. Microsoft Exchange vs. ProxyShell vs.…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today