Malware infections are among the most frequently encountered threats in computer security. According to the “ENISA Threat Landscape Report 2017,” some antivirus vendors detected more than 4 million malware samples per day and more than 700 million samples in Q1 2017 alone.
These stunning numbers underscore the importance of establishing an incident response plan for malware. However, security teams can’t handle all malware alerts at once. The National Institute of Standards and Technology (NIST)’s “Guide to Malware Incident Prevention and Handling for Desktops and Laptops” outlined steps organizations can take to develop a malware classification scheme to prioritize these incidents.
The analysis phase in an incident response plan involves identifying and understanding the type of malware that was detected. The outcome of this process is then used as the input for the actual malware classification.
The analysis can be done in stages that range from fairly easy with fully automated tools, such as VMRay, to very difficult techniques involving manual code reversing. The results of the analysis should include a set of indicators of compromise (IoCs) and detailed information about the characteristics, propagation methods and behavior of the malware.
To prepare for future incidents, organizations should establish specific playbooks for each malware classification type. This enables security teams to more effectively prioritize incidents. For example, malware that is categorized according to its ability to propagate automatically should take precedence over malware that is merely classified as an unwanted program.
Obviously, to make use of the correct playbook, you must first be able to classify the detected malware correctly. This is where things can get tricky.
Malware Classification in an Ideal World
In an ideal world, a classification scheme would place malware types in an unambiguous classification tree. Unfortunately, real-world malware often has a wide range of nefarious capabilities, protection methods, target distributions and propagation methods. This makes classification harder and very dependent on the goal the security team is trying to achieve. Additionally, families of malware often share numerous similarities but can have minor modifications that cause confusion during the classification process.
Malware identified by automated analysis tools, such as sandboxes, or via static analysis of properties has often already been recognized and named by antivirus firms. You can use this information to start, but classification based solely on malware names has its limits.
Existing Classification Schemes
Since we don’t live in an ideal world, let’s take a closer look at some existing classification schemes and discuss how they can help security teams prioritize malware threats and optimize their incident response processes.
Classification by Images
A paper published by the Computer Science Laboratory described a static technique for classifying malware binaries using images. The malware binary is converted to an image, and a texture-based feature is then computed on that image to characterize the malware. Because packers that rely on fixed-size encryption keys preserve the binary’s texture patterns, this classification approach is resilient to such packing strategies. It also enables security teams to visually characterize and classify malware samples.
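A minimal sketch of the binary-to-image idea, assuming NumPy is available. The image width and the feature choice here are illustrative simplifications: the technique described above uses richer texture features, while this sketch uses a plain byte-value histogram compared by cosine similarity.

```python
import numpy as np

def binary_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Interpret raw bytes as rows of grayscale pixels."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = len(buf) // width
    return buf[: rows * width].reshape(rows, width)

def texture_feature(img: np.ndarray) -> np.ndarray:
    """Crude texture stand-in: normalized byte-value histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)

def similarity(a: bytes, b: bytes) -> float:
    """Cosine similarity between the feature vectors of two binaries."""
    fa = texture_feature(binary_to_image(a))
    fb = texture_feature(binary_to_image(b))
    denom = np.linalg.norm(fa) * np.linalg.norm(fb)
    return float(fa @ fb / denom) if denom else 0.0
```

Two samples from the same family tend to score close to 1.0, while unrelated binaries score noticeably lower, which is the intuition behind clustering on these features.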
Malware clustering provides, among other things, a visual representation of the relationships between malware. These results can greatly improve analysts’ ability to identify similarities between large sets of malware samples and empower them to more quickly recognize samples that are already known or that share similarities with known malware. Ultimately, this frees up security teams to focus on new types of malware.
Analysts can generate quick and meaningful results by using impfuzzy in combination with the Neo4J graph database. Impfuzzy uses fuzzy hashing to calculate hash values of the import API. It is also available as a Volatility plugin.
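Impfuzzy itself hashes a PE file’s import API table with fuzzy hashing and compares the hashes. The stdlib-only sketch below mimics just the comparison step, scoring how similar two import tables are; the scoring helper and the sample import lists are invented for illustration and are not impfuzzy’s actual algorithm.

```python
from difflib import SequenceMatcher

def import_similarity(imports_a: list[str], imports_b: list[str]) -> int:
    """Return a 0-100 similarity score over two sorted import tables."""
    ratio = SequenceMatcher(None, sorted(imports_a), sorted(imports_b)).ratio()
    return round(ratio * 100)

# Hypothetical import tables from two related samples
sample_a = ["kernel32.CreateFileA", "kernel32.WriteFile", "ws2_32.connect"]
sample_b = ["kernel32.CreateFileA", "kernel32.WriteFile", "ws2_32.send"]
```

A high score between an unknown sample and a known one is a quick hint that the two belong to the same cluster and may warrant the same playbook.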
Although it is not specifically designed to represent malware clusters, the VirusTotal Graph tool helps analysts understand the relationship between malware files in a graph representation. It provides visibility into the entire VirusTotal data set as well as an intuitive interface to pivot and navigate over them.
The Antivirus Vendor Naming Convention
Antivirus vendors love to assign quirky names to malware using a signature-based approach. The top-level classification is often done via a basic naming convention. Typically, the malware name prefix designates the targeted platform or the malware capabilities, followed by the malware family name (e.g., “Trojan.Win32”).
Unfortunately, this naming convention is often limited to individual vendors, which makes sharing information more difficult. Additionally, this technique does not always describe the malware’s full range of capabilities.
Kaspersky Lab categorizes malware according to a classification tree. The malware samples are placed in a diagram according to two basic rules:
- Behavior that poses the least threat is shown in the lower area of the diagram.
- Behavior that poses the greatest threat is displayed in the upper part of the diagram.
For example, if an email worm represents a higher risk than an internet relay chat (IRC) worm, the email worm is placed near the top of the diagram and above the IRC worm.
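The placement rule above can be rendered as a toy ordering. The scores below are illustrative assumptions, not Kaspersky’s actual weights; the only relationship taken from the text is that an email worm ranks above an IRC worm.

```python
# Illustrative threat scores: higher score = higher in the diagram
THREAT_LEVEL = {"email-worm": 80, "irc-worm": 60, "adware": 20}

def diagram_order(behaviors: list[str]) -> list[str]:
    """Return behaviors sorted most threatening first (top of the diagram)."""
    return sorted(behaviors, key=lambda b: THREAT_LEVEL.get(b, 0), reverse=True)
```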
Microsoft uses the Computer Antivirus Research Organization (CARO) malware naming scheme according to the following format:
- Type — The behavior of the malware. For example, is it a Trojan, spammer or remote access tool?
- Platform — The targeted platform, programming language or file format.
- Family — A grouping based on common characteristics, including attribution to the same authors.
- Variant — A distinct version of the malware.
- Additional information — Extra details, including how it is used as part of a multicomponent threat. For example, “!lnk” indicates that the threat component is a shortcut.
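The components above can be pulled apart mechanically. The sketch below parses a Microsoft/CARO-style name such as “Trojan:Win32/Emotet.A!lnk”; the regular expression and the sample name are our own illustration, not an official grammar.

```python
import re

# type:platform/family[.variant][!additional-info]
CARO_RE = re.compile(
    r"^(?P<type>[^:]+):(?P<platform>[^/]+)/(?P<family>[^.!]+)"
    r"(?:\.(?P<variant>[^!]+))?(?:!(?P<info>.+))?$"
)

def parse_caro(name: str) -> dict:
    """Split a CARO-style malware name into its components."""
    m = CARO_RE.match(name)
    if not m:
        raise ValueError(f"not a CARO-style name: {name}")
    # Drop the optional components that are absent
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

Being able to parse names like this is what makes the convention useful as input to automation, despite its per-vendor limits.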
CARO is an organization of individuals from across corporate and academic borders dedicated to the research and study of malware. It has been pushing for a naming standard since it was established in 1990.
Malware Attribute Enumeration and Characterization (MAEC) is a community-developed structured language for encoding information about malware based on attributes such as behaviors, artifacts and relationships between malware samples. It can be used for malware characterization that is not based on signatures. MAEC is similar to STIX — if you use STIX or TAXII, MAEC is certainly worth investigating.
The MAEC language is defined by two specification documents:
- The core concepts with high-level use cases and the definition of data types and top-level objects; and
- A vocabularies document with explicit values.
Machine-Parsable Malware Classification
Some threat intelligence sharing platforms, such as the Malware Information Sharing Platform (MISP), support malware classification schemes with machine-parsable tags along with human-readable descriptions. Having both of these classification approaches available at once can help make incident response processes smoother.
The machine-parsable tags allow analysts to easily include automation steps and protection rules that can be pushed to their security solutions. For example, once a sample has been analyzed and all characteristics of a threat event have been added to the platform, the security team can then deploy intrusion detection system (IDS) rules automatically during the containment and eradication phase. In the meantime, the human-readable tags allow analysts to quickly create summary reports. Having the integration immediately available in a threat information sharing platform makes exchanging this information with peers much easier.
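MISP taxonomy tags follow the machine-parsable triple-tag format `namespace:predicate="value"`. The example tag below uses the public malware_classification taxonomy; the parsing helper itself is a sketch of ours, not part of MISP.

```python
import re

# namespace:predicate or namespace:predicate="value"
TAG_RE = re.compile(r'^(?P<namespace>[^:]+):(?P<predicate>[^=]+?)(?:="(?P<value>[^"]*)")?$')

def parse_misp_tag(tag: str) -> dict:
    """Split a machine-parsable taxonomy tag into its three parts."""
    m = TAG_RE.match(tag)
    if not m:
        raise ValueError(f"not a machine tag: {tag}")
    return m.groupdict()
```

A rule engine can key automation off the parsed predicate and value, while the taxonomy’s human-readable descriptions feed the summary reports mentioned above.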
Selecting the Right Malware Classification Approach
There are many different approaches for classifying malware. Choosing the right scheme depends on your specific use case.
If you are interested in spotting relationships and similarities between malware samples, a classification scheme based on image representation and malware clustering techniques is certainly worth investigating. If your goal is to improve your incident response process for dealing with malware outbreaks, the classification should account for the prioritization and urgency criteria.
The capabilities and behavior of the malware will define its impact on your environment. For example, is it designed to steal user credentials, leak sensitive data, allow remote access or sabotage your systems? A higher impact requires a higher priority. You should also consider whether there are protection measures that might hinder or slow down your analysis.
The platform targeted by the malware can also be used as input to determine its priority level. Malware that targets platforms not deployed in your environment, or that uses document formats your systems automatically filter, is more of a nuisance than a real threat.
Propagation methods will also help define the urgency of the incident. Malware that can automatically spread without user interaction requires immediate follow-up. Malware that your security solutions have already recognized and filtered, on the other hand, can be classified as less urgent.
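The criteria above (impact, platform relevance, propagation and existing filtering) can be combined into a simple triage score. This is a hedged sketch: the field names, weights and thresholds are illustrative assumptions, to be tuned to your own environment.

```python
from dataclasses import dataclass

@dataclass
class MalwareEvent:
    impact: int              # 0-3: nuisance .. credential theft/sabotage
    platform_deployed: bool  # is the targeted platform present here?
    self_propagating: bool   # spreads without user interaction
    already_filtered: bool   # blocked by existing security solutions

def triage_score(event: MalwareEvent) -> int:
    """Higher score = more urgent follow-up."""
    if not event.platform_deployed or event.already_filtered:
        return event.impact  # nuisance level only
    score = event.impact * 10
    if event.self_propagating:
        score += 30  # automatic spreading requires immediate follow-up
    return score
```

With a score like this, a self-spreading credential stealer lands at the top of the queue, while an already-filtered sample drops to the bottom regardless of its theoretical impact.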
If you are just starting with a classification scheme, the malware naming convention based on CARO is a great foundation. You should also prepare playbooks for the most frequent malware types to ensure that your team is not caught off guard when an incident strikes. Regardless of which classification scheme you choose, make sure that you build it in such a way that you can easily include automated steps in your incident response strategy.