Internet Noise
Honeypots are an easy and popular way to get statistics on the “Internet noise.” Getting more knowledge on Internet noise gives you more insight into what is out there and is one of the sources that helps in getting better security analytics. I was curious what kind of traffic a honeypot name server receives in a public cloud; my research follows.
Honeypot Name Server Setup
The machine was a default Ubuntu 14.04.1 LTS install running in Amazon's EC2 cloud, in the Frankfurt region. It had one IPv4 address configured (I did not look at IPv6 data). Neither the IP address nor the Domain Name System (DNS) service was published on any public channel, so any incoming request can be considered suspicious. I have not looked into the effects of cloud providers reusing IP addresses.
The server also ran a number of other honeypots: dionaea, Glastopf, Conpot, SNMP, NTP and Kippo. Kippo is an SSH honeypot. The setup and the data gathering (via ELK) are available in a repository on GitHub.
I used one of the most popular name servers, BIND, with a fairly default setup. Recursion, IPv6 and forwarders were disabled, logging was enabled, a custom server version number was set and the server had one zone file. A zone file describes which mappings between IP addresses and host or domain names are available in a zone. This zone file contains a single wildcard A record that points back to the honeypot itself. So, anyone querying the honeypot name server will get a reply with the IP address of the honeypot server (“a.b.c.d”).
BIND Configuration
options {
    directory "/var/cache/bind";
    dnssec-validation auto;
    recursion no;
    allow-transfer { none; };
    auth-nxdomain no;    # conform to RFC1035
    // listen-on-v6 { any; };
    statistics-file "/var/log/named/named_stats.txt";
    memstatistics-file "/var/log/named/named_mem_stats.txt";
    version "9.9.1-P2";
};

logging {
    channel query_log {
        file "/var/log/named/query.log";
        severity info;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category queries { query_log; };
};

$TTL 10
@ IN SOA localhost. root.localhost. (
        1  ; Serial
        10 ; Refresh
        10 ; Retry
        10 ; Expire
        10 ) ; Negative Cache TTL
;

 IN NS localhost
* IN A a.b.c.d
Statistics
The first set of statistics represents the findings of the raw data from the log file.
Time Frame
If we map the queries by date, we can observe a spike between Jan. 15 and Jan. 20 and then a number of spikes toward the end of January and the beginning of February.
Further investigation of these spikes shows that they are caused by a single IP address belonging to Ruhr-Universität Bochum in Germany, which has a website explaining the Amplification DDoS Tracker Project. The project uses the obtained scan data to warn network owners of possible problems. In total, two IP addresses were seen doing these scans: one from Ruhr-Universität Bochum and one from Universität des Saarlandes, also in Germany.
We’ll observe later that a lot of the queries were systems scanning for open DNS resolvers.
Where Did the Queries Come From?
I extracted the IP addresses from the logs and then used Team Cymru's IP to autonomous system number (ASN) mapping service to get the country and ASN data. The autonomous system (AS) tells us to which network the IP address belongs. The script can also be found on GitHub.
The majority of the scans came from Germany, the U.S. and China. The autonomous systems of DFN (Germany) and Chinanet-Hunan account for the bulk of the queries. More than half of the queries originated from Europe (RIPE). These results are not surprising considering the large set of requests coming from the Ruhr-Universität Bochum.
What Were They Looking For?
Most of the request types were DNS queries for an A record. More than 18 percent of the queries were for ANY records. The TXT requests were mostly intended to retrieve the DNS server version.
The queries asked for records of Google, Shadowserver or the version of the BIND name server. It is not much of a surprise that the most popular TLDs are .com and .org. More interesting in this data is the presence of the .ru and .cn TLDs.
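A breakdown like this can be produced with a few lines of Python. The following is a minimal sketch, not the script used for this research. It assumes the BIND query log format produced by the logging configuration shown earlier, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Matches the query part of a BIND query log line, for example
# "... query: www.google.com IN A + (x.x.x.x)".
QUERY_RE = re.compile(r"query: (\S+) (\S+) (\S+) ")


def summarize(lines):
    """Return (record type counts, TLD counts) for the given log lines."""
    types, tlds = Counter(), Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if not match:
            continue  # not a query log line
        name, qclass, qtype = match.groups()
        types[qtype] += 1
        # Only class IN names have a real TLD (VERSION.BIND is class CH).
        if qclass == "IN":
            # The TLD is the last label; lower-case it because scanners
            # sometimes send mixed-case names.
            tlds[name.rstrip(".").rsplit(".", 1)[-1].lower()] += 1
    return types, tlds


sample = [
    "05-Feb-2015 18:35:21.888 queries: info: client x.x.x.x#56334 (VERSION.BIND): query: VERSION.BIND CH TXT + (x.x.x.x)",
    "06-Feb-2015 01:19:13.674 queries: info: client x.x.x.x#39664 (www.google.it): query: www.google.it IN A + (x.x.x.x)",
    "06-Feb-2015 08:45:17.256 queries: info: client x.x.x.x#42795 (com): query: com IN ANY +E (x.x.x.x)",
]

types, tlds = summarize(sample)
total = sum(types.values())
for qtype, count in types.most_common():
    print(f"{qtype}: {count / total:.0%}")
print(dict(tlds))
```

In practice you would pass an open file handle for the query log instead of the three sample lines.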
Scans for Open Resolvers
As mentioned above, a lot of the requests came from scans for open resolvers. As a matter of fact, around 56 percent of the queries came from organizations that are scanning for open resolvers.
That is a lot, but they are fairly easy to spot in the logs:
01-Feb-2015 04:57:49.352 queries: info: client x.x.x.x#34341 (dnsscan.shadowserver.org): query: dnsscan.shadowserver.org IN A + (x.x.x.x)
02-Feb-2015 19:15:44.507 queries: info: client x.x.x.x#41248 (www.goOGLe.co.in): query: www.goOGLe.co.in IN A + (x.x.x.x)
07-Jan-2015 06:36:04.149 queries: info: client x.x.x.x#33481 (7f14f6df.openresolvertest.net): query: 7f14f6df.openresolvertest.net IN A + (x.x.x.x)
11-Jan-2015 14:54:03.692 queries: info: client x.x.x.x#43656 (openresolver.com): query: openresolver.com IN A +E (x.x.x.x)
01-Feb-2015 06:42:54.797 queries: info: client x.x.x.x#46018 (7f14f6df.openresolverproject.org): query: 7f14f6df.openresolverproject.org IN A + (x.x.x.x)
08-Feb-2015 04:12:45.562 queries: info: client x.x.x.x#28207 (9h2y.96bf5d36.wc.syssec-research.mmci.uni-saarland.de): query: 9h2y.96bf5d36.wc.syssec-research.mmci.uni-saarland.de IN A + (x.x.x.x)
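Spotting these entries can be automated by matching the queried name against a list of known scan domains. The sketch below is one possible approach, not the method used for the statistics in this article. The suffix list is taken only from the log excerpt above and is certainly incomplete, and the helper name `is_resolver_scan` is hypothetical. Probes that use a well-known name, such as the mixed-case www.goOGLe.co.in query, will not be caught this way and need another heuristic.

```python
import re

# Scan domains seen in the log excerpt; extend this tuple as needed.
SCAN_SUFFIXES = (
    "dnsscan.shadowserver.org",
    "openresolvertest.net",
    "openresolver.com",
    "openresolverproject.org",
    "syssec-research.mmci.uni-saarland.de",
)

# Captures the queried name from "... query: <name> IN A + (...)".
QNAME_RE = re.compile(r"query: (\S+) ")


def is_resolver_scan(line):
    """Return True if the log line queries a known open resolver scan domain."""
    match = QNAME_RE.search(line)
    if not match:
        return False
    name = match.group(1).rstrip(".").lower()
    # Match the domain itself or any name below it.
    return any(name == s or name.endswith("." + s) for s in SCAN_SUFFIXES)


scan = ("07-Jan-2015 06:36:04.149 queries: info: client x.x.x.x#33481 "
        "(7f14f6df.openresolvertest.net): query: 7f14f6df.openresolvertest.net IN A + (x.x.x.x)")
other = ("06-Feb-2015 16:49:14.384 queries: info: client x.x.x.x#51102 "
         "(www.google.com): query: www.google.com IN A + (x.x.x.x)")

print(is_resolver_scan(scan))
print(is_resolver_scan(other))
```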
The high number of probes for open resolvers can be annoying and a real log polluter. If you do not filter these requests before they enter your log monitoring system, it will be hard to spot abuse. If you want to keep an eye on your DNS infrastructure, you will have to deal with them: apply proper filters to remove the Internet noise before processing the logs, block the requests, or ask the scanners nicely to stop scanning you. Doing so will also greatly improve the results you get from your log-monitoring or SIEM solution.
There might also be legal issues with this scanning. Most of the queries for open resolvers are done with good intent, but it’s not difficult to imagine that someone who’s not familiar with these organizations considers these scans to be malicious. Andrew Cormack covers some of the legal issues in his paper “Scanning for Vulnerabilities: Is It Lawful?”.
Results Without Scans for Open Resolvers
The results below are the logged queries without the open resolver scans:
There’s no really noticeable spike over time. Most of the queries originate from the U.S. and China. The queries are equally spread among the RIRs RIPE, ARIN and APNIC.
The request types in this category are queries for either A records or ANY records. Most queries were for the Google domain.
Having the data sets without the open resolver hosts also reveals the behavior of two particular hosts: one from China (AS 63835) and one from Russia (AS 2848).
The Chinese host regularly queries for the BIND server version and then looks up the A record of www.google.it and www.google.com:
05-Feb-2015 18:35:21.888 queries: info: client x.x.x.x#56334 (VERSION.BIND): query: VERSION.BIND CH TXT + (x.x.x.x)
06-Feb-2015 01:19:13.674 queries: info: client x.x.x.x#39664 (www.google.it): query: www.google.it IN A + (x.x.x.x)
06-Feb-2015 16:49:14.384 queries: info: client x.x.x.x#51102 (www.google.com): query: www.google.com IN A + (x.x.x.x)
07-Feb-2015 01:57:22.995 queries: info: client x.x.x.x#45938 (VERSION.BIND): query: VERSION.BIND CH TXT + (x.x.x.x)
07-Feb-2015 14:35:58.562 queries: info: client x.x.x.x#41664 (www.google.it): query: www.google.it IN A + (x.x.x.x)
07-Feb-2015 23:00:43.537 queries: info: client x.x.x.x#49252 (www.google.com): query: www.google.com IN A + (x.x.x.x)
08-Feb-2015 13:27:10.678 queries: info: client x.x.x.x#34047 (VERSION.BIND): query: VERSION.BIND CH TXT + (x.x.x.x)
The Russian host only sends regular ANY queries for com:
06-Feb-2015 08:45:17.256 queries: info: client x.x.x.x#42795 (com): query: com IN ANY +E (x.x.x.x)
08-Feb-2015 15:44:01.787 queries: info: client x.x.x.x#33207 (com): query: com IN ANY +E (x.x.x.x)
What’s Wrong With an ‘ANY’ Request?
About half of the requests were “ANY” requests.
10-Feb-2015 07:48:38.565 queries: info: client x.x.x.x#32767 (isc.org): query: isc.org IN ANY +ED (x.x.x.x)
Typically, this is an attempt at a DNS amplification attack with spoofed queries. All of these queries had the recursion desired flag (+) set, indicating that they came from a client or from a server that forwards queries. A very small number of hosts also set the DNSSEC OK flag (D) to request DNSSEC data in the reply.
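The flags sit in the field right after the record type in each log line, for example +ED in the line above. The following sketch decodes that field. It assumes the flag letters BIND 9 documents for its query log (+ or - for recursion desired, S, E, T, D and C); newer BIND releases log additional markers that this table does not cover. The function name `decode_flags` is made up for illustration.

```python
# Flag letters as documented for BIND 9 query logging.
FLAG_MEANINGS = {
    "S": "signed query (TSIG)",
    "E": "EDNS in use",
    "T": "sent over TCP",
    "D": "DNSSEC OK (DO) set",
    "C": "checking disabled (CD) set",
}


def decode_flags(flags):
    """Translate a query log flag field such as '+ED' into readable text."""
    decoded = []
    if flags.startswith("+"):
        decoded.append("recursion desired")
    elif flags.startswith("-"):
        decoded.append("recursion not desired")
    for letter in flags[1:]:
        decoded.append(FLAG_MEANINGS.get(letter, f"unknown flag {letter!r}"))
    return decoded


print(decode_flags("+ED"))
print(decode_flags("-"))
```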
Besides the Google and ISC domains, we also observe some more “exotic” domains that were used to at least test for DNS amplification attack possibilities. All of these domains had already been observed in attacks.
DNS Amplification Attacks Observer has an overview of the domains used in attacks, and it provides an iptables ruleset to blacklist these domains.
So, why would anyone use these requests? There’s no particular reason for sending these requests in normal Internet behavior. It becomes more obvious if you look at the replies you get for some of the queried domains. You can test this yourself with:
dig -t ANY @8.8.8.8 mydomain
Some of the replies exceeded sizes of 6,000 bytes.
;; MSG SIZE rcvd: 6800
;; MSG SIZE rcvd: 6584
For regular use (zone transfers are one exception), DNS uses UDP on port 53. DNS packets are almost always relatively small: classic DNS over UDP limits a message to 512 bytes, not counting the IP and UDP headers. Note that DNSSEC often requires larger packets. DNS can handle larger packets while sticking to UDP by using EDNS0. There is still a size limit with EDNS0, and a server may simply not support EDNS0 at all. When a reply does not fit, it is truncated and the client falls back to TCP for the request. The output of dig shows this with:
;; Truncated, retrying in TCP mode.
In summary: You have a small request over a simple protocol that requires few resources turned into a large reply on a more complex protocol that requires substantially more resources.
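To put a number on this, the sketch below computes the size of a DNS query message from the wire format (12-byte header, length-prefixed labels, 2 bytes each for type and class, plus an 11-byte OPT record when EDNS0 is used without options) and divides the reply sizes reported above by it. Using isc.org as the queried name is an assumption, because the reply sizes above are not tied to a specific domain. The result ignores IP and UDP headers and any truncation, so treat it as a rough upper bound on the payload ratio.

```python
def dns_query_size(name, edns=True):
    """Size in bytes of a DNS query message carrying one question."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # Each label is prefixed by a length byte; the name ends with a zero byte.
    qname = sum(1 + len(label) for label in labels) + 1
    size = 12 + qname + 2 + 2  # header + QNAME + QTYPE + QCLASS
    if edns:
        size += 11  # minimal OPT pseudo-record, no EDNS options
    return size


request = dns_query_size("isc.org")  # assumed query name
for reply in (6800, 6584):  # MSG SIZE values reported by dig
    print(f"{reply} byte reply / {request} byte query = {reply / request:.0f}x")
```

With these assumptions, a 36-byte query yields a reply that is well over 100 times larger.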
One of the differences between UDP and TCP is that with UDP, there is no handshake. You send the request and you forget about it; it is a “stateless” protocol. With TCP, there is a handshake that requires substantially more computing resources. Because UDP has no handshake, it is also fairly easy to spoof the source IP address of a request.
Combining these two makes a great attack vector for conducting an amplification attack against specific targets.
The folks at OpenDNS have described how these DNS amplification attacks work. The same patterns have been observed by CloudFlare.
Protection against these types of attacks requires a multilayered approach. DNS administrators should ensure that their recursive name servers are not open resolvers by limiting the list of clients that can do a recursive query. For authoritative name servers, you should implement response rate limiting (RRL). Network administrators can help by only allowing packets with source addresses from their own known network prefixes to leave their network (implementing BCP 38). This helps fight all UDP-based DDoS amplification attacks (DNS, SNMP, NTP, etc.).
Conclusion
A random DNS server quickly receives a lot of queries for open resolver tests. This Internet noise will pollute the logs, which makes it harder for you to detect attacks on your DNS infrastructure.
A honeypot DNS server is one of the tools that allows you to capture this Internet noise.
With the use of a few scripts, you can easily filter out the top scanners from the honeypot dataset. You can then use this white list to filter them from your real DNS logs and get more value out of your log-monitoring or SIEM solution. Some kind of manual verification of the white list is still necessary, though, to prevent genuine attackers from ending up on your white list.
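As an illustration of such a script, the following sketch builds a candidate white list from the busiest clients in the honeypot query log and then drops their entries from another log. It is not the script used for this research. The function names, the `min_queries` threshold and the documentation-range IP addresses are all assumptions made for the example.

```python
import re
from collections import Counter

# Captures the client address from "... client 192.0.2.10#34341 (...)".
CLIENT_RE = re.compile(r"client (\S+)#\d+ ")


def client_of(line):
    """Return the client address of a BIND query log line, or None."""
    match = CLIENT_RE.search(line)
    return match.group(1) if match else None


def top_scanners(honeypot_lines, min_queries=100):
    """Clients with at least min_queries honeypot queries: white list candidates."""
    counts = Counter(c for c in map(client_of, honeypot_lines) if c)
    return {client for client, n in counts.items() if n >= min_queries}


def filter_log(lines, whitelist):
    """Yield only the log lines that do not come from a white-listed client."""
    for line in lines:
        if client_of(line) not in whitelist:
            yield line


def make_line(ip, name):
    # Helper that builds a log line in the format used throughout this article.
    return (f"01-Feb-2015 04:57:49.352 queries: info: client {ip}#34341 "
            f"({name}): query: {name} IN A + (x.x.x.x)")


honeypot = [make_line("192.0.2.10", "dnsscan.shadowserver.org")] * 3
honeypot.append(make_line("198.51.100.7", "www.google.com"))
production = [make_line("192.0.2.10", "dnsscan.shadowserver.org"),
              make_line("203.0.113.5", "www.example.org")]

candidates = top_scanners(honeypot, min_queries=2)
print(candidates)  # review this set manually before using it
for line in filter_log(production, candidates):
    print(line)
```

A high query count alone does not make a client harmless, which is why the candidate set still needs the manual check described above.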