March 16, 2015 By Jaikumar Vijayan


Cybercriminals apparently have a tendency to use the same (or at least similar) lexical styles when establishing domains for phishing and advanced persistent threat (APT) attacks, making it possible for security researchers to identify sites using natural language processing (NLP) techniques.

That’s according to OpenDNS Security Labs, which is prototyping a tool dubbed NLPRank to see if it can identify potentially malicious websites and phishing domains more quickly. Based on tests so far, the natural language processing tool could prove to be a “robust” method for defending against APTs, claimed OpenDNS security researcher Jeremiah O’Connor in a blog post.

Security researchers at OpenDNS recently analyzed DNS data associated with attacks carried out by the cybercrime group behind the Carbanak malware, which is believed to have stolen hundreds of millions of dollars from banks around the world in a sophisticated, multiyear APT campaign.

APT Campaigns

To penetrate banks and various other financial institutions, these cybercriminals would typically target employees through phishing emails laced with malware, which, when installed on a system, would allow them to take complete control of the compromised computer. At that point, they would move laterally across the network to other more critical systems, gain access to administrative accounts, control ATMs and siphon out huge sums of money.

When comparing the malicious domains and spoofing techniques used in the Carbanak campaign with those used in other APTs like the Darkhotel cyber espionage campaign, OpenDNS observed they were constructed in a similar lexical fashion. “One of the spoofing techniques often leveraged is the impersonation of a legitimate software or tech company in an email claiming a required software update,” O’Connor said.

Domains used in the Darkhotel campaign, for example, included adobeupdates.com, adobeplugs.net, adoberegister.flashserv.net and microsoft-xpupdate.com. Meanwhile, the Carbanak APT used domains such as update-java.net and adobe-update.net. Other instances of domain names sharing a similar lexical structure included gmailboxes.com, microsoft-update-info.com and firefoxupdata.com.

Lexical Similarities

In reviewing the attack data, OpenDNS discovered multiple cases of suspicious websites advertising fake Java updates, sharing the same infrastructure and exhibiting similar attack patterns, O’Connor said. Researchers found that APT groups tend to spoof legitimate domains and use spear phishing tactics to disguise their criminal campaigns.

Because of the lexical similarities among the domains used in these criminal campaigns, it is possible to use NLP techniques to identify potentially malicious typo-squatting and targeted phishing domains, O’Connor said. NLP is a branch of computing that uses specialized software to extract meaning from written language. Its tools are widely used to read and interpret free-text documents across many applications and fields.
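To make the idea concrete, here is a minimal Python sketch of one possible lexical check, assuming a small hand-made lexicon: it splits a domain label into known words and flags names that pair a brand with a lure word such as "update". The BRANDS and LURE_WORDS sets and the segment and brand_lure_tokens functions are hypothetical illustrations, not part of NLPRank, whose internals the blog post does not detail.

```python
from functools import lru_cache

# Hypothetical lexicons for illustration only; a real system would need far
# larger brand and word lists.
BRANDS = {"adobe", "microsoft", "java", "gmail", "firefox"}
LURE_WORDS = {"update", "updates", "plug", "plugs", "register", "boxes", "info", "xp"}
VOCAB = BRANDS | LURE_WORDS


def segment(label: str):
    """Split a label into known words via dynamic programming; None if impossible."""

    @lru_cache(maxsize=None)
    def best(i: int):
        if i == len(label):
            return ()
        # Try the longest candidate word first.
        for j in range(len(label), i, -1):
            word = label[i:j]
            if word in VOCAB:
                rest = best(j)
                if rest is not None:
                    return (word,) + rest
        return None

    result = best(0)
    return list(result) if result is not None else None


def brand_lure_tokens(domain: str):
    """Return the tokens if the first label pairs a brand with a lure word, else None."""
    label = domain.lower().split(".")[0].replace("-", "")
    tokens = segment(label)
    if tokens and any(t in BRANDS for t in tokens) and any(t in LURE_WORDS for t in tokens):
        return tokens
    return None


for d in ["adobeupdates.com", "microsoft-xpupdate.com", "update-java.net", "adobe.com"]:
    print(d, brand_lure_tokens(d))
```

A misspelled lure word, as in firefoxupdata.com, falls outside this lexicon and is not flagged; catching near-misses like that calls for a fuzzier string comparison.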

Natural Language Processing via Minimum-Edit Distance

According to O’Connor, OpenDNS’ NLPRank system uses NLP, HTML tag analysis and a method known as minimum-edit distance to see if it can distinguish between legitimate and malicious domains on the Internet.
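The post does not say which HTML features NLPRank inspects. As a hedged illustration only, the sketch below uses Python's standard html.parser module to count a few tags that phishing heuristics commonly look at, such as forms and password inputs. The chosen features and the TagCounter and tag_features names are assumptions, not OpenDNS's method.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags and password input fields in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.password_fields = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input ... /> also arrive here via the
        # default handle_startendtag implementation.
        self.counts[tag] += 1
        attr_map = dict(attrs)
        if tag == "input" and (attr_map.get("type") or "").lower() == "password":
            self.password_fields += 1


def tag_features(html: str) -> dict:
    """Return a small, hypothetical feature set derived from HTML tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "forms": parser.counts["form"],
        "password_inputs": parser.password_fields,
        "scripts": parser.counts["script"],
        "iframes": parser.counts["iframe"],
    }


page = ('<html><body><form action="/login"><input type="text" name="u">'
        '<input type="password" name="p"/></form><script></script></body></html>')
print(tag_features(page))
```

Features like these would only be one signal among several; on their own they also match plenty of legitimate login pages.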

The minimum-edit distance method measures how many single-character insertions, deletions or substitutions it takes to turn one string into another, so a typo-squatting domain that differs from a legitimate one by only a character or two yields a small distance. The technique is also used in applications such as spell-checking and speech translation, and it offers a way to define and differentiate the language used by malicious domains from the language used by legitimate ones, O’Connor said.
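A minimal Python sketch of minimum-edit (Levenshtein) distance, using the standard dynamic-programming recurrence, follows. The WATCHLIST of brand labels, the near_misses helper and the max_distance=2 threshold are hypothetical choices for illustration; the article does not say what thresholds or reference lists NLPRank uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical watch list of legitimate brand labels.
WATCHLIST = ["wellsfargo", "facebook", "dropbox"]


def near_misses(label: str, watchlist, max_distance: int = 2):
    """Return (target, distance) pairs that are close to, but not equal to, label."""
    hits = []
    for target in watchlist:
        d = edit_distance(label, target)
        if 0 < d <= max_distance:
            hits.append((target, d))
    return sorted(hits, key=lambda hit: hit[1])


print(edit_distance("firefoxupdata", "firefoxupdate"))
print(near_misses("faceb00k", WATCHLIST))
```

Exact matches (distance 0) are excluded on purpose, since the legitimate label itself should not be flagged.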

Another process OpenDNS uses in conjunction with NLP to identify malicious domains is autonomous system number (ASN) mapping. Malicious domains are usually hosted on IP networks that are not associated with the brand they’re attempting to spoof. For example, if a domain offering an Adobe update maps to an IP network that does not belong to Adobe, there is a good chance the domain is malicious. OpenDNS has built an ASN map of all legitimate domains on the Internet along with their appropriate ASNs, O’Connor said.
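The ASN check can be sketched in a few lines of Python, assuming a precomputed map from brand keywords to the ASNs each brand legitimately uses. The BRAND_ASNS map and asn_mismatch function are hypothetical, and the ASN values come from the 64496-64511 range reserved for documentation by RFC 5398 rather than from real routing data. Resolving the domain to an IP address and looking up that address's ASN is assumed to happen elsewhere.

```python
# Hypothetical brand -> legitimate ASN map. Values are documentation-only ASNs
# (RFC 5398), not the brands' real networks.
BRAND_ASNS = {
    "adobe": {64496},
    "microsoft": {64497, 64498},
}


def asn_mismatch(domain: str, hosting_asn: int) -> list:
    """Return brands named in the domain whose known ASNs do not include hosting_asn."""
    name = domain.lower()
    return sorted(
        brand
        for brand, asns in BRAND_ASNS.items()
        if brand in name and hosting_asn not in asns
    )


print(asn_mismatch("adobe-update.net", 64511))  # hosted outside Adobe's (hypothetical) ASNs
print(asn_mismatch("adobe-update.net", 64496))  # hosted inside them
```

The substring match on the brand keyword is deliberately crude and would also catch unrelated names that happen to contain the keyword, so this signal is best treated as one input alongside the lexical checks.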

Using these methods, NLPRank has reportedly been able to spot several types of phishing attacks spoofing major companies such as Wells Fargo, Facebook, Dropbox and others.
