July 14, 2017 By David Strom 2 min read

Everyone wants to know who was behind the latest audacious cyberattack. Security professionals have long attempted to identify threat actors through linguistic analysis, but this method is limited when it comes to attribution.

Part of the problem is that cybercriminals purposely build deception mechanisms into their code. “Deception is always a major part of an attack,” according to Network World. “The attackers want to make sure that if the operation is discovered, any evidence that’s unearthed points toward someone else.” This often means deliberately using servers or domain names located elsewhere, or routing communications through paths that have nothing to do with the attackers’ own country or place of origin.

As Fahmida Y. Rashid explained on CSO Online, “Linguistic analysis will very rarely lead to the smoking gun. At the very least, it will uncover a whole set of clues for researchers to track down, and at the best, it will support (or confirm) other pieces of evidence.”

Two Kinds of Linguistic Analysis

There are generally two kinds of linguistic analysis: one that looks at how the source code was written, and another that examines the human-readable text that accompanies it. What’s the difference? The first examines the coding style and determines whether it is similar to other code found in malware samples. The second is more about word choices found in user dialogues, comments within the code, input screens or other displays visible to the end user. Ransomware typically includes a ransom note, for example. Are the same words in these notes consistently misspelled, or do they follow the same typographic conventions?
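To make the ransom-note question concrete, here is a minimal sketch in Python of one way an analyst might look for misspellings shared by two notes. The function names, the sample notes and the tiny reference vocabulary are all hypothetical, invented for illustration; a real comparison would use a full dictionary and far more text.

```python
import re

# Hypothetical reference vocabulary; a real analysis would load a full word list.
VOCABULARY = {
    "your", "files", "are", "encrypted", "pay", "to", "receive", "the",
    "key", "we", "will", "send", "after", "payment", "you", "it",
}

def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def shared_unknown_words(note_a, note_b, vocabulary):
    """Return the words missing from the vocabulary that appear in both notes."""
    unknown_a = {t for t in tokens(note_a) if t not in vocabulary}
    unknown_b = {t for t in tokens(note_b) if t not in vocabulary}
    return unknown_a & unknown_b

# Invented sample notes that share the misspelling "recieve".
note_a = "Your files are encrypted. Pay to recieve the key."
note_b = "We will send the key after payment. You will recieve it."

shared = shared_unknown_words(note_a, note_b, VOCABULARY)
```

A shared misspelling such as this one is only a clue worth following up. Common typos turn up independently in unrelated texts, so an overlap like this supports other evidence rather than replacing it.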

Part of linguistic analysis is understanding how native speakers use their language. If a threat actor regularly omits definite articles, for example, the author is probably not a native English speaker. However, people speak multiple languages and can also use machine translation, both of which can cloud the results.
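The definite-article signal can be approximated with a simple count. The Python sketch below is an illustrative assumption rather than an established attribution method: the function name and sample sentences are hypothetical, and it merely measures how often "the" appears relative to all words in a text.

```python
import re

def definite_article_rate(text):
    """Return the fraction of word tokens in the text that are 'the'."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count("the") / len(words)

# Invented sample sentences for illustration only.
with_articles = "Send the payment to the address"
without_articles = "Send payment to address"

rate_with = definite_article_rate(with_articles)
rate_without = definite_article_rate(without_articles)
```

An unusually low rate across a large body of text may hint at a non-native writer, but short messages, deliberate imitation and machine translation can all skew the number, so it should be treated as one weak signal among many.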

An Inconclusive Method

The challenge with linguistic analysis is that it isn’t conclusive on its own; it needs to be combined with other evidence to point the way toward attribution. In the case of WannaCry, the ransom notes were written in 27 different languages. One analyst concluded that a Chinese-speaking author was behind the original ransom messages, but the finding wasn’t ironclad.

Despite its inconclusiveness, linguistic analysis is a fascinating field of study. It’s also one that can improve as big data models mature, making the future of this security research bright.
