Everyone wants to know who was behind the latest audacious cyberattack. Security professionals have long attempted to identify threat actors through linguistic analysis, but this method is limited when it comes to attribution.

Part of the problem is that cybercriminals purposely build deception mechanisms into their code. “Deception is always a major part of an attack,” according to Network World. “The attackers want to make sure that if the operation is discovered, any evidence that’s unearthed points toward someone else.” This often means deliberately using servers or domain names located elsewhere, or routing communications through paths that have nothing to do with the attackers’ own country or place of origin.

As Fahmida Y. Rashid explained on CSO Online, “Linguistic analysis will very rarely lead to the smoking gun. At the very least, it will uncover a whole set of clues for researchers to track down, and at the best, it will support (or confirm) other pieces of evidence.”

Two Kinds of Linguistic Analysis

There are generally two kinds of linguistic analysis: one that looks at how the source code was written, and another that examines the natural-language text it contains. What’s the difference? The first examines the coding style and determines whether it is similar to other code found in malware samples. The second is more about word choices found in user dialogues, comments within the code, input screens or other displays visible to the end user. Ransomware typically includes a ransom note, for example. Are the same words in these notes consistently misspelled, or do they follow the same typographic conventions?
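The misspelling question can be made concrete with a small sketch. This is only an illustration, not a method taken from any of the analyses cited here: the tiny word list, the two sample notes and the function names are all invented for the example, and a real comparison would use a full dictionary and far more text.

```python
import re

# Hypothetical mini-dictionary, invented for this example.
# A real analysis would load a full word list instead.
KNOWN_WORDS = {
    "your", "files", "have", "been", "encrypted", "send", "payment",
    "to", "recover", "them", "all", "are", "now", "the", "for", "key",
}

def unknown_tokens(note):
    """Return the set of lowercase words in the note that are not in KNOWN_WORDS."""
    return {w for w in re.findall(r"[a-z]+", note.lower()) if w not in KNOWN_WORDS}

def shared_misspellings(note_a, note_b):
    """Return unrecognized words that appear in both notes."""
    return unknown_tokens(note_a) & unknown_tokens(note_b)

# Made-up sample notes, not taken from real malware.
note_a = "Your files have been encrypted. Send paymant to recover them."
note_b = "All files are now encrypted. Send paymant for the key."

print(shared_misspellings(note_a, note_b))
```

A shared unusual spelling like this is a lead worth following, not proof of common authorship, since notes are easy to copy or imitate.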

Part of linguistic analysis is understanding how native speakers use their language. If a threat actor regularly omits definite articles, for example, this is a good indication that he or she is probably not a native English speaker. However, people speak multiple languages and can also use machine translations, both of which can cloud the results.
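One way to picture the definite-article signal is a simple rate calculation. The sketch below is a rough assumption of how such a check might look, with invented sample sentences and a hypothetical function name; it measures only how often "the" appears and says nothing by itself about who wrote the text.

```python
import re

def definite_article_rate(text):
    """Return occurrences of 'the' per 100 words (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count("the") / len(words)

# Hypothetical sample sentences, invented for illustration only.
fluent = "Send the payment to the address shown in the window."
terse = "Send payment to address shown in window."

print(definite_article_rate(fluent))
print(definite_article_rate(terse))
```

A low rate is at most a hint. Headlines, terse technical writing and machine-translated text can all drop articles for reasons unrelated to the author's native language.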

An Inconclusive Method

The challenge with linguistic analysis is that it isn’t enough to be conclusive on its own; it needs to be combined with other evidence to point the way toward attribution. In the case of WannaCry, the ransom notes were written in 27 different languages. One analyst concluded that a Chinese-speaking author was behind the original ransom messages, but the finding wasn’t ironclad.

Despite its inconclusiveness, linguistic analysis is a fascinating field of study. It’s also one that can improve as big data models mature, making the future of this security research bright.
