Signature-Based Detection With YARA

In a previous post, I talked about how you can use STIX, TAXII and CybOX to share threat intelligence.

One of the key elements for putting cyberthreat information to good use requires that the information is actionable, or at least usable. The shared information has to be accurate, complete and relevant for your environment.

CybOX provides a common structure for representing cyber observables across and among the operational areas of enterprise cybersecurity. CybOX can contain hashes, strings or registry keys. Information provided via the system can be used to check for the presence of malware inside your environment. YARA is one of the alternatives to using CyBOX, but the two are not mutually exclusive.

What Is YARA?

YARA is a tool designed to help malware researchers identify and classify malware samples. It’s been called the pattern-matching Swiss Army knife for security researchers (and everyone else). It is multiplatform and can be used from both its command-line interface or through your own Python scripts.

The tool allows you to conduct signature-based detection of malware, something similar to what antivirus solutions can do for you.

How Do You Use It?

In order to use this method, you need a rule and a file that you want to check. For example, to run it from the command line, you would use:
YARA use
This will output if the given rule matches on the provided file. If you don’t see output, and you have not used the negate option, then this means that no rule has matched.

You can start it with a number of configuration switches; these are the two most important :

  • -n: Print only not-satisfied rules (negate).
  • -r: Recursively search through directories.

Rules

A rule is a set of strings and some form of logic, written in Boolean expressions.

YARA rule

There is support for three different types of strings:

  • Hexadecimal strings, which are useful for defining raw bytes;
  • Text strings;
  • Regular expressions.

The conditions are Boolean expressions that you will recognize from regular programming languages. They can work on any of the given strings but also on special built-in variables, such as the file size, or on external variables that you define outside the rule. There is also support for the use of modules, such as Cuckoo, to extend the features that you can use in the conditions.

You’ll benefit the most from YARA if you provide it a good ruleset. You can either write your own rules or get them from another provider.

Write Your Own Rules

Writing your own rules is not that difficult if you take these guidelines into consideration:

  • The criteria that you use to match must be a necessary part of the behavior of the malware.
  • The criteria should be sufficient enough to distinguish the tested malware family from other malware families.
  • The criteria has to be something that is common across different samples.

Once you have analyzed the malware and extracted useful, recognizable data from it, you can then transform the information into YARA strings and combine them with some form of logic.

Get Rules From External Sources

Because YARA uses signatures similar to antivirus solutions, it would make sense to reuse these signatures as a rule database. With the use of the script clamav_to_yara.py, you can convert the ClamAV signature database to your own ruleset.

Another source for rules is the Github repository YaraRules. This is a ruleset under the GNU-GPLv2 license maintained by a group of IT security researchers. The rules are stored in different categories — rules aimed to detect anti-debug and anti-visualization techniques, malicious documents, packers, etc. — and frequently updated. You can get them by cloning the Github repository.

Some threat intelligence sharing platforms, such as MISP and ThreatConnect, also support YARA. This allows you to build rules based on your own collected threat information.

Automatically Generate Rules

With the use of the Rule Generator from Joe Sandbox, you can create signatures for Windows based on static and dynamic behavior data. Note that the same rules apply as when using free online malware analysis sandboxes. You are uploading your files to an external cloud service, which shouldn’t be done with sensitive files or data containing any form of user credentials.

Additional Use Cases

I mentioned earlier that you can convert the ClamAV database to a usable ruleset. It also goes the other way around. Suppose you have a set of detailed rules. You can configure ClamAV to extend its feature set with your provided rules and support YARA. This allows you to match the rules on compressed or packed files. The VirusTotal private API also has a feature with which you can enter your own rules and have them triggered when a matching sample is uploaded.

Signature-Based Protection Is Not Enough

Only relying on signature-based protection is no longer good enough. Attackers have developed countermeasures that they can use to bypass this method. With the use of various crypting services, packers and polymorphism, they can easily generate malware that’s different enough so that it no longer matches existing signatures. It then takes some time before the new flavor is picked up and a signature is shared.

These drawbacks don’t make signature-based detection obsolete. The security community is strong at sharing new threat indicators, so these types of tools will still prove to be an important asset to your arsenal.

Conclusion

Although signature-based detection with YARA has its limits, it is an easy-to-use and fairly simple way of detecting malware in your environment. It would not be wise to rely on it as the only threat protection measure, but given the straightforward use, missing out on this tool would not be a good idea, either.

Share this Article:
Koen Van Impe

Security Analyst

Koen Van Impe is a security analyst who worked at the Belgian national CSIRT and is now an independent security researcher. He has a twitter feed (@cudeso) and a personal blog (www.vanimpe.eu). Koen is passionate about computer security, incident handling, network analysis, honeypots, Linux, log management and web technologies. He is responsible for the follow-up and coordination of computer security incidents and gives security advice to customers.