Mapping attacks on generative AI to business impact

Authors

Sam Hector

Global Strategy Leader

IBM Security

In recent months, we’ve seen government and business leaders put an increased focus on securing AI models. If generative AI is the next big platform to transform the services and functions on which society as a whole depends, ensuring that technology is trusted and secure must be businesses’ top priority. While generative AI adoption is in its nascent stages, we must establish effective strategies to secure it from the onset.

The IBM Institute for Business Value found that despite 64% of CEOs facing significant pressure from investors, creditors and lenders to accelerate the adoption of generative AI, 60% are not yet developing a consistent, enterprise-wide approach to generative AI. In fact, 84% are concerned about widespread or catastrophic cybersecurity attacks that generative AI adoption could lead to.

As organizations determine how to best incorporate generative AI into their business models and assess the security risks that the technology could introduce, it’s worth examining top attacks that threat actors could execute against AI models. While only a small number of real-world attacks on AI have been reported, IBM X-Force Red has been testing models to determine the types of attacks that are most likely to appear in the wild. To understand the potential risks associated with generative AI that organizations need to mitigate as they adopt the technology, this blog will outline some of the attacks adversaries are likely to pursue, including prompt injection, data poisoning, model evasion, model extraction, inversion and supply chain attacks.

Prompt injection

Prompt injection attacks manipulate Large Language Models (LLMs) by crafting malicious inputs that seek to override the system prompt (initial instructions for the AI provided by the developer). This can result in jailbreaking a model to perform unintended actions, circumventing content policies to generate misleading or harmful responses, or revealing sensitive information.

LLMs are biased in favor of obeying the user and are susceptible to the same trickery as humans, akin to social engineering. Hence, it’s trivially easy to circumnavigate content filters in place, often as easy as asking the LLM to “pretend it’s a character,” or to “play a game.” This attack can result in reputational damage, through the generation of harmful content; service degradation, by crafting prompts that trigger excessive resource utilization; and intellectual property or data theft, through revealing a confidential system prompt.

Industry newsletter

The latest tech news, backed by expert insights

Stay up to date on the most important—and intriguing—industry trends on AI, automation, data and beyond with the Think newsletter. See the IBM Privacy Statement.

Data poisoning

Data poisoning attacks consist of adversaries tampering with data used to train the AI models to introduce vulnerabilities, biases, or change the model’s behavior. This can potentially compromise the model’s effectiveness, security or integrity. Assuming models are being trained on closed data sets, this requires a high level of access to the data pipeline, either via access from a malicious insider, or sophisticated privilege escalation through alternative means. However, models trained on open-source data sets would be an easier target for data poisoning as attackers have more direct access to the public source.

The impact of this attack could range anywhere from misinformation attempts to Die Hard 4.0, depending on the threat actor’s objective, fundamentally compromising the integrity and effectiveness of a model.

Mixture of Experts | 4 July, episode 62

Decoding AI: Weekly News Roundup

Join our world-class panel of engineers, researchers, product leaders and more as they cut through the AI noise to bring you the latest in AI news and insights.

Watch the latest podcast episodes

Model evasion

A model evasion attack would allow attackers to modify inputs into the AI model in a way that causes it to misclassify or misinterpret them, changing its intended behavior. This can be done visibly to a human observer (e.g., putting small stickers on stop signs to cause self-driving cars to ignore them) or invisibly (e.g., changing individual pixels in an image by adding noise that tricks an object recognition model).

Depending on the complexity of the AI model, this attack could vary in intricacy and executability. What’s the format and size of the model’s inputs and outputs? Does the attacker have unrestricted access to them? Depending on the purpose of the AI system, a successful model evasion attack could have a significant impact on the business. For example, if the model is being used for security purposes, or to make decisions of significance like loan approvals, evasion of intended behavior could cause significant damage.

However, given the variables here, attackers opting for the path of least resistance are unlikely to use this tactic to advance their malicious objective.

Model extraction

Model extraction attacks aim at stealing the intellectual property (IP) and behavior of an AI model. They’re performed by querying it extensively and monitoring the inputs and outputs to understand its structure and decisions, before attempting to replicate it. These attacks, however, require extensive resources and knowledge to execute, and as the AI model’s complexity increases, so does the level of difficulty to execute this attack.

While the loss of IP could have significant competitive implications, if attackers have the skills and resources to perform model extraction and replication successfully, it’s likely easier for them to simply download an open-source model and customize it to behave similarly. Besides, techniques like strict access controls, monitoring and rate limiting significantly hamper adversarial attempts without direct access to the model.

Inversion attacks

Whereas extraction attacks aim to steal the model behavior itself, inversion attacks aim to find out information on the training data of a model, despite only having access to the model and its outputs. Model inversion allows an attacker to reconstruct the data a model has been trained on, and membership inference attacks can determine whether specific data was used in training the model.

The complexity of the model and the extent of information output from it would influence the level of complexity in executing such an attack. For example, some inference attacks exploit the fact a model outputs a confidence value as well as a result. In this case, attackers can attempt to reconstruct an input that maximizes the returned confidence value. That said, attackers are unlikely to have the unrestricted access required to a model or its outputs to make this practical in the wild. However, the potential for data leakage and privacy violations carries risks.

Supply chain attacks

AI models are more integrated into business processes, SaaS apps, plugins and APIs than ever before, and attackers can target vulnerabilities in these connected services to compromise the behavior or functionality of the models. Plus, businesses are utilizing freely available models from repositories like Hugging Face to get a head-start on AI development, which could embed malicious functionality like trojans and backdoors.

Successful exploitation of connected integrations requires extensive knowledge of the architecture, and often exploitation of multiple vulnerabilities. Although these attacks would require a high level of sophistication, they are also difficult to detect and could have a wide impact on organizations lacking an effective detection and response strategy.

Given the interconnected nature of AI systems and increasing their involvement in critical business processes, safeguarding against supply chain attacks should be a high priority. Vetting third-party components, monitoring for vulnerabilities and anomalies, and implementing DevSecOps best practices are crucial.

Securing AI

IBM recently introduced the IBM Framework for Securing AI — helping customers, partners and organizations around the world better prioritize the defensive approaches that are most important to secure their generative AI initiatives against anticipated attacks. The more organizations understand what type of attacks are possible against AI, the more they can enhance their cyber preparedness by building effective defense strategies. And while it will require time for cyber criminals to invest in the resources necessary to attack AI models at scale, security teams have a rare time advantage — an opportunity to secure AI, before attackers place the technology at the center of their target scope. No organization is exempt from the need to establish a strategy for securing AI. This includes both models they’re actively investing in to optimize their business and tools introduced as shadow AI by employees seeking to enhance their productivity.

If you want to learn more about securing AI, and how AI can enhance the time and talent of your security teams, read our authoritative guide.