Detecting vulnerabilities in code has been a problem facing the software development community for decades. Undetected weaknesses in production code can become attack entry points if discovered and exploited by attackers. Such vulnerabilities can greatly damage the reputation of the company releasing the software and, potentially, the operational and financial well-being of the companies that installed the software and suffered from the attack. The magnitude of this problem keeps growing: in 2020, the US-CERT database confirmed 17,447 new vulnerabilities, a record number for the fourth year running.

The software development community has developed a variety of methods for detecting these weaknesses before they reach production code. Code typically goes through thorough static scanning, dynamic scanning and penetration testing before it is released in a product. But these scans still suffer from false positives, false negatives and long run times, making the security process a burden on the development team.

Deeper Into Transfer Learning

Recently, extensive research has been conducted on how to leverage artificial intelligence (AI) and deep learning techniques to analyze and generate code. The main challenge in this domain has been figuring out how to fold years of knowledge amassed by code experts into deep learning models. Some researchers approach the challenge from the data generation angle, solving the problem of creating and labeling samples; some design deep networks tailored to the structure and semantics of code; and others develop feature extraction techniques so that AI can parse code.

Transfer learning is one of the most promising deep learning approaches for leveraging existing expert knowledge. It has demonstrated success in overcoming a lack of samples by using existing pre-trained models for problems in a similar domain. For example, transfer learning for medical imaging leverages pre-trained models for image classification to classify medical images.

The transfer learning approach proves successful in this case because the layers of a pre-trained model can extract features of a ‘general’ image, while the transferred final layer performs the classification for the medical imaging domain. In the software development domain, however, there are no pre-trained models that can successfully extract features from code.
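
As a minimal sketch of this pattern, the snippet below freezes a pre-trained Keras image model and trains only a new classification head. The backbone choice, input size and two-class setup are illustrative assumptions, not details from the medical imaging work mentioned above:

```python
# Minimal transfer-learning sketch: a frozen pre-trained CNN as a
# feature extractor plus a new, trainable classification head.
# The backbone, input size and class count are illustrative assumptions.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,        # drop the original ImageNet classifier
    weights="imagenet",
    pooling="avg",            # global average pooling -> feature vector
)
base.trainable = False        # keep the pre-trained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. healthy vs. pathology
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # with your labeled scans
```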

3 Steps to Transfer Learning

To solve the code classification problem, we suggest the following three-step transfer learning approach:

  • Leverage an existing code analyzer: parse the code with the tool, run its initial analysis and capture the internal-state representation it builds of the code.
  • Use a pre-trained image classification convolutional neural network (CNN) model to extract features from this internal representation and apply transfer learning to it.
  • Use transfer learning to train a classic machine learning model (such as a support vector machine) on existing data.

Figure 1 below illustrates the training process. For each labeled code sample, generate the analyzer tool’s internal-state representation of the sample, feed that internal state to a CNN, take the output of the CNN’s penultimate layer and feed it to a support vector machine (SVM). Then train the SVM on this input against the original sample’s label; a condensed code sketch follows the figure.

Figure 1: Three Steps to Transfer Learning
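
In runnable form, the loop might look like the sketch below. This is a sketch under assumptions, not the authors’ implementation: `analyzer_state_as_image` is a hypothetical helper standing in for the analyzer’s internal-state export, `train_samples` and `train_labels` stand for the labeled data set, and the model choices (MobileNetV2, an SVM) anticipate the experiment described in the next section.

```python
# Training-pipeline sketch: analyzer internal state -> CNN features -> SVM.
# `analyzer_state_as_image` is a hypothetical helper that renders the
# analyzer's internal representation as a 224x224x3 array.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

# With include_top=False and average pooling, predict() returns the pooled
# features that precede the original classifier, i.e. the penultimate-layer
# output described above.
cnn = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")

def extract_features(code_samples):
    images = np.stack([analyzer_state_as_image(s) for s in code_samples])
    images = tf.keras.applications.mobilenet_v2.preprocess_input(images)
    return cnn.predict(images)

# `train_samples` / `train_labels` stand for the labeled data set.
svm = SVC(kernel="rbf")
svm.fit(extract_features(train_samples), train_labels)

# Inference follows the same path:
# predictions = svm.predict(extract_features(new_samples))
```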

This approach solves the feature extraction problem by relying on an existing tool that can parse code, analyze it and create a new representation of it (such as a call graph). That new representation is fed into a pre-trained model, which helps solve the data generation problem by leveraging transfer learning techniques.

Test It!

To test the above approach, we used the Juliet data set developed by NIST. This set contains 64K labeled C/C++ code samples. These samples are targeted at specific Common Weakness Enumerations (CWEs), and some are tailored to deceive security scans. We used a state-of-the-art static analysis tool that parses and analyzes C/C++ code for weaknesses to create an internal representation of these code samples. We then fed that internal representation to a MobileNetV2 model (a 53-layer CNN pre-trained for image classification), applied transfer learning by removing MobileNetV2’s last layer, and fed the truncated model’s output to an SVM classifier.
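
In Keras, “removing the last layer” can be done explicitly by wrapping the full pre-trained model and exposing the output of its penultimate layer. This is a minimal sketch, equivalent in effect to the `include_top=False` shortcut used in the sketch above:

```python
# Explicitly "remove the last layer" of MobileNetV2: expose the output
# of the layer just before the final 1,000-way softmax as the features.
import tensorflow as tf

full = tf.keras.applications.MobileNetV2(weights="imagenet")  # includes classifier
feature_extractor = tf.keras.Model(
    inputs=full.input,
    outputs=full.layers[-2].output,  # 1280-d pooled features
)
# feature_extractor.predict(batch) yields the vectors fed to the SVM.
```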

We chose SVM after running a light-weight grid search over several classifiers available in scikit-learn, an open-source machine learning library for Python. We chose MobileNetV2 after running experiments on pre-trained classifiers available in Keras, an open-source library that provides a Python interface for artificial neural networks. The analysis tool was left at its default settings.
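
The article does not list the candidate classifiers, so the sketch below is purely illustrative of such a light-weight search; the candidate set, fold count and scoring metric are our assumptions:

```python
# Illustrative light-weight classifier search; `features` and `labels`
# come from the feature extraction step sketched earlier.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

candidates = {
    "svm": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, features, labels, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```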

We ran classifications using our method and compared them to the analyzer’s prediction results. Our method was more effective in detecting CWEs than the analyzer: it showed a higher F1 score and detected weaknesses that the analysis tool missed.
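
Scoring that comparison is straightforward. In the sketch below, `y_true`, `y_analyzer` and `y_ours` are assumed 0/1 arrays (1 = weakness present) over the same test samples:

```python
# Compare the two detectors on the same labeled test samples.
import numpy as np
from sklearn.metrics import f1_score

print("analyzer F1:  ", f1_score(y_true, y_analyzer))
print("our method F1:", f1_score(y_true, y_ours))

# Indices of weaknesses our method caught but the analyzer missed:
newly_detected = np.where((y_true == 1) & (y_analyzer == 0) & (y_ours == 1))[0]
```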

One example is CWE476, a null pointer dereference caused by code that is split across two functions connected by a global variable. The static analysis tool did not detect this weakness, but our transfer learning approach did.

Smarter Testing, Better Code

The effectiveness of this approach comes from leveraging a tool that codifies years of code analysis domain expertise. That tool gave us strong feature extraction capabilities that had been built, over the years, for a different purpose. Using transfer learning, with an SVM in place of the network’s final classification layer, allowed us to overcome the lack of labeled data samples and of effective data generation techniques for code. Finally, we believe the pre-trained MobileNetV2 model succeeded at extracting features from the analysis tool’s intermediate representation because of the nature of that representation.

In the future, we plan to expand our experiments to real-life examples and research possible enhancements to the approach described above. One of the main challenges is understanding how to choose a better pre-trained model to replace the MobileNetV2 CNN. An interesting direction is to train a model on a helper problem and then use it to replace the CNN layer.

Hardening Code for Better Security

The number of vulnerabilities has reached a new record annually for the fourth consecutive year, as attackers keep exploiting newly found security flaws to compromise organizations and data. These are flaws discovered after the fact: in production and firmware code, in websites and in the logic that connects the dots.

Finding flaws in code before it is released is a priority, but doing so can also push back deadlines and release dates. In some cases, companies struggle to find the right skill set, or to fund secure code reviews for the project. Many forces work against code becoming more secure, and that is precisely why it is critical to find new, smarter ways for developers to check for security flaws before code is released.

Using AI to solve business issues can accelerate solutions for all of society. Just as we applied AI to better analyze code, enterprises need AI that is fluid, adaptable and capable of applying knowledge acquired for one purpose to new domains and challenges. They need AI that can combine different forms of knowledge, unpack causal relationships and learn new things on its own. In short, enterprises need AI with fluid intelligence — and that’s exactly what we’re building.

Learn more

This blog is based on a patent filed by IBM in May 2020: “Leveraging AI for vulnerability detection using internal representation of code analysis results,” by Fady Copty, Shai Doron and Reda Igbaria.
