September 21, 2018 By Moazzam Khan 5 min read

The use of AI and machine learning in cybersecurity is on the rise. These technologies can deliver advanced insights that security teams can use to identify threats accurately and in a timely fashion. But these very same systems can sometimes be manipulated by rogue actors using adversarial machine learning to provide inaccurate results, eroding their ability to protect your information assets.

While it’s true that AI can strengthen your security posture, machine learning algorithms are not without blind spots that could be attacked. Just as you would scan your assets for vulnerabilities and apply patches to fix them, you need to constantly monitor your machine learning algorithms and the data that gets fed into them for issues. Otherwise, adversarial data could be used to trick your machine learning tools into allowing malicious actors into your systems.

Most research in the adversarial domain has been focused on image recognition; researchers have been able to create images that fool AI programs but that are recognizable to humans. In the world of cybersecurity, cybercriminals can apply similar principles to malware, network attacks, spam and phishing emails, and other threats.

How Does Adversarial Machine Learning Work?

When building a machine learning algorithm, the aim is to create a perfect model from a sample set. However, the model uses the information in that set to make generalizations about all other samples. This makes the model imperfect and leaves it with blind spots for an adversary to exploit.

Adversarial machine learning is a technique that takes advantage of these blind spots. The attacker provides samples to a trained learning model that cause the model to misidentify the input as belonging to a different class than what it truly belongs to.

What Does the Adversary Know?

The sophistication of an attack and the effort required from the adversaries depends on how much information attackers have about your machine learning system. In a whitebox model, they have information about inputs, outputs and classification algorithms. In a graybox model, the attackers only know the scores that your model produces against inputs. A blackbox model is the hardest to exploit because the attackers only know classifications such as zero/one or malicious/benign.

Types of Adversarial Machine Learning Attacks

There are two primary types of adversarial machine learning attacks: poisoning attacks and evasion attacks. Let’s take a closer look at the similarities and differences between the two.

Poisoning Attacks

This type of attack is more prevalent in online learning models — models that learn as new data comes in, as opposed to those that learn offline from already collected data. In this type of attack, the attacker provides input samples that shift the decision boundary in his or her favor.

For example, consider the following diagram showing a simple model consisting of two parameters, X and Y, that predict if an input sample is malicious or benign. The first figure shows that the model has learned a clear decision boundary between benign (blue) and malicious (red) samples, as indicated by a solid line separating the red and blue samples. The second figure shows that an adversary input some samples that gradually shifted the boundary, as indicated by the dotted lines. This results in the classification of some malicious samples as benign.

Evasion Attacks

In this type of attack, an attacker causes the model to misclassify a sample. Consider a simple machine learning-based intrusion detection system (IDS), as shown in the following figure. This IDS decides if a given sample is an intrusion or normal traffic based on parameters A, B and C. Weights of the parameters (depicted as adjustable via a slider button) determine whether traffic is normal or an intrusion.

If this is a whitebox system, an adversary could probe it to carefully determine the parameter that would classify the traffic as normal and then increase that parameter’s weight. The concept is illustrated in the following figure. The attacker recognized that parameter B plays a role in classifying an intrusion as normal and increased the weight of parameter B to achieve his or her goal.

How to Defend Against Attacks on Machine Learning Systems

There are different approaches for preventing each type of attack. The following best practices can help security teams defend against poisoning attacks:

  • Ensure that you can trust any third parties or vendors involved in training your model or providing samples for training it.
  • If training is done internally, devise a mechanism for inspecting the training data for any contamination.
  • Try to avoid real-time training and instead train offline. This not only gives you the opportunity to vet the data, but also discourages attackers, since it cuts off the immediate feedback they could otherwise use to improve their attacks.
  • Keep a ground truth test and test your model against this set after every training cycle. Considerable changes in classifications from the original set will indicate poisoning.

Defending against evasive attacks is very hard because trained models are imperfect, and an attacker can always find and tune the parameters that will tilt the classifier in the desired direction. Researchers have proposed two defenses for evasive attacks:

  1. Try to train your model with all the possible adversarial examples an attacker could come up with.
  2. Compress the model so it has a very smooth decision surface, resulting in less room for an attacker to manipulate it.

Another effective measure is to use cleverhans, a Python library that benchmarks machine learning systems’ vulnerabilities to adversarial samples. This can help organizations identify the attack surface that their machine learning models are exposing.

According to a Carbon Black report, 70 percent of security practitioners and researchers said they believe attackers are able to bypass machine learning-driven security. To make machine learning-based systems as foolproof as possible, organizations should adopt the security best practices highlighted above.

The truth is that any system can be bypassed, be it machine learning-based or traditional, if proper security measures are not implemented. Organizations have managed to keep their traditional security systems safe against most determined attackers with proper security hygiene. The same focus and concentration is required for machine learning systems. By applying that focus, you’ll be able to reap the benefits of AI and dispel any perceived mistrust toward those systems.

More from Artificial Intelligence

Could a threat actor socially engineer ChatGPT?

3 min read - As the one-year anniversary of ChatGPT approaches, cybersecurity analysts are still exploring their options. One primary goal is to understand how generative AI can help solve security problems while also looking out for ways threat actors can use the technology. There is some thought that AI, specifically large language models (LLMs), will be the equalizer that cybersecurity teams have been looking for: the learning curve is similar for analysts and threat actors, and because generative AI relies on the data…

AI vs. human deceit: Unravelling the new age of phishing tactics

7 min read - Attackers seem to innovate nearly as fast as technology develops. Day by day, both technology and threats surge forward. Now, as we enter the AI era, machines not only mimic human behavior but also permeate nearly every facet of our lives. Yet, despite the mounting anxiety about AI’s implications, the full extent of its potential misuse by attackers is largely unknown. To better understand how attackers can capitalize on generative AI, we conducted a research project that sheds light on…

C-suite weighs in on generative AI and security

3 min read - Generative AI (GenAI) is poised to deliver significant benefits to enterprises and their ability to readily respond to and effectively defend against cyber threats. But AI that is not itself secured may introduce a whole new set of threats to businesses. Today IBM’s Institute for Business Value published “The CEO's guide to generative AI: Cybersecurity," part of a larger series providing guidance for senior leaders planning to adopt generative AI models and tools. The materials highlight key considerations for CEOs…

Does your security program suffer from piecemeal detection and response?

4 min read - Piecemeal Detection and Response (PDR) can manifest in various ways. The most common symptoms of PDR include: Multiple security information and event management (SIEM) tools (e.g., one on-premise and one in the cloud) Spending too much time or energy on integrating detection systems An underperforming security orchestration, automation and response (SOAR) system Only capable of taking automated responses on the endpoint Anomaly detection in silos (e.g., network separate from identity) If any of these symptoms resonate with your organization, it's…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today