Everyone wants to know who was behind the latest audacious cyberattack. Security professionals have long attempted to identify threat actors through linguistic analysis, but this method is limited when it comes to attribution.

Part of the problem is that cybercriminals purposely build deception mechanisms into their code. “Deception is always a major part of an attack,” according to Network World. “The attackers want to make sure that if the operation is discovered, any evidence that’s unearthed points toward someone else.” This often means deliberately using servers or domain names located elsewhere, or routing communications through paths that have nothing to do with the attackers’ own country or place of origin.

As Fahmida Y. Rashid explained on CSO Online, “Linguistic analysis will very rarely lead to the smoking gun. At the very least, it will uncover a whole set of clues for researchers to track down, and at the best, it will support (or confirm) other pieces of evidence.”

Two Kinds of Linguistic Analysis

There are generally two kinds of linguistic analysis: one that looks at how the source code was written, and another that examines the human-readable text that accompanies it. The first examines coding style and determines whether it resembles other code found in malware samples. The second focuses on word choices in user dialogues, comments within the code, input screens or other displays visible to the end user. Ransomware typically includes a ransom note, for example. Are the same words in these notes consistently misspelled, or do the notes share the same typographic conventions?
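As a rough illustration of the second kind of analysis, the sketch below checks whether two ransom notes share the same misspelled words. It is a minimal example under stated assumptions, not a description of any analyst's actual tooling: the tiny `KNOWN_WORDS` vocabulary, the function names and both sample notes are invented for this example, and a real comparison would rely on a full dictionary and many more samples.

```python
import re

# Hypothetical reference vocabulary, invented for this sketch.
# A real analysis would use a complete dictionary for the language.
KNOWN_WORDS = {
    "your", "files", "have", "been", "encrypted", "send", "payment", "to",
    "recover", "them", "all", "are", "pay", "now", "the", "decrypt",
}


def misspellings(note):
    """Return the lowercase words in a note that are not in KNOWN_WORDS."""
    words = re.findall(r"[a-z']+", note.lower())
    return {word for word in words if word not in KNOWN_WORDS}


def shared_misspellings(note_a, note_b):
    """Return the unknown words that appear in both notes."""
    return misspellings(note_a) & misspellings(note_b)


# Invented sample notes that happen to share one misspelling.
note_a = "Your files have been encrypted. Send payement to recover them."
note_b = "All files are encrypted. Pay now, send payement to decrypt."

print(shared_misspellings(note_a, note_b))  # {'payement'}
```

A shared misspelling is only a clue. Two unrelated authors can make the same mistake, and an attacker can copy another group's wording on purpose.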

Part of linguistic analysis is understanding how native speakers use their language. If a threat actor regularly omits definite articles, for example, this suggests that the author may not be a native English speaker. However, people speak multiple languages and can also use machine translation, both of which can cloud the results.
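The definite-article signal mentioned above can be approximated with a simple count. The following is a minimal sketch, assuming plain English text as input; the function name `article_rate` and both sample sentences are invented for illustration, and the resulting number is not a calibrated measure of anything.

```python
import re


def article_rate(text):
    """Return the fraction of word tokens that are the definite article 'the'."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count("the") / len(words)


# Invented samples: the same sentence with and without definite articles.
sample_a = "Send the payment to the address in the note."
sample_b = "Send payment to address in note."

print(round(article_rate(sample_a), 2))  # 0.33
print(article_rate(sample_b))            # 0.0
```

A low rate on its own proves little. Terse writing drops articles too, and machine translation can add or remove them, so a figure like this only becomes useful when compared across many texts and weighed against other evidence.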

An Inconclusive Method

The challenge with linguistic analysis is that it isn’t enough to be conclusive on its own. It needs to be combined with other evidence to point the way toward attribution. In the case of WannaCry, the ransom notes were written in 27 different languages. One analyst concluded that a Chinese-speaking author was behind the original ransom messages, but the finding wasn’t ironclad.

Despite its inconclusiveness, linguistic analysis is a fascinating field of study. It’s also one that can improve as big data models mature, making the future of this security research bright.
