My IBM

Log in

Subscribe

How AI can be hacked with prompt injection: NIST report

Authors

Ronda Swaney

Freelance Technology Writer

The National Institute of Standards and Technology (NIST) closely observes the AI lifecycle, and for good reason. As AI proliferates, so does the discovery and exploitation of AI cybersecurity vulnerabilities. Prompt injection is one such vulnerability that specifically attacks generative AI.

In Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations, NIST defines various adversarial machine learning (AML) tactics and cyberattacks, like prompt injection, and advises users on how to mitigate and manage them. AML tactics extract information about how machine learning (ML) systems behave to discover how they can be manipulated. That information is used to attack AI and its large language models (LLMs) to circumvent security, bypass safeguards and open paths to exploit.

What is prompt injection?

NIST defines two prompt injection attack types: direct and indirect. With direct prompt injection, a user enters a text prompt that causes the LLM to perform unintended or unauthorized actions. An indirect prompt injection is when an attacker poisons or degrades the data that an LLM draws from.

One of the best-known direct prompt injection methods is DAN, Do Anything Now, a prompt injection used against ChatGPT. DAN uses roleplay to circumvent moderation filters. In its first iteration, prompts instructed ChatGPT that it was now DAN. DAN could do anything it wanted and should pretend, for example, to help a nefarious person create and detonate explosives. This tactic evaded the filters that prevented it from providing criminal or harmful information by following a roleplay scenario. OpenAI, the developers of ChatGPT, track this tactic and update the model to prevent its use, but users keep circumventing filters to the point that the method has evolved to (at least) DAN 12.0.

Indirect prompt injection, as NIST notes, depends on an attacker being able to provide sources that a generative AI model would ingest, like a PDF, document, web page or even audio files used to generate fake voices. Indirect prompt injection is widely believed to be generative AI’s greatest security flaw, without simple ways to find and fix these attacks. Examples of this prompt type are wide and varied. They range from absurd (getting a chatbot to respond using “pirate talk”) to damaging (using socially engineered chat to convince a user to reveal credit card and other personal data) to wide-ranging (hijacking AI assistants to send scam emails to your entire contact list).

Explore AI cybersecurity solutions

Industry newsletter

The latest tech news, backed by expert insights

Stay up to date on the most important—and intriguing—industry trends on AI, automation, data and beyond with the Think newsletter. See the IBM Privacy Statement.

How to stop prompt injection attacks

These attacks tend to be well hidden, which makes them both effective and hard to stop. How do you protect against direct prompt injection? As NIST notes, you can’t stop them completely, but defensive strategies add some measure of protection. For model creators, NIST suggests ensuring training datasets are carefully curated. They also suggest training the model on what types of inputs signal a prompt injection attempt and training on how to identify adversarial prompts.

For indirect prompt injection, NIST suggests human involvement to fine-tune models, known as reinforcement learning from human feedback (RLHF). RLHF helps models align better with human values that prevent unwanted behaviors. Another suggestion is to filter out instructions from retrieved inputs, which can prevent executing unwanted instructions from outside sources. NIST further suggests using LLM moderators to help detect attacks that don’t rely on retrieved sources to execute. Finally, NIST proposes interpretability-based solutions. That means that the prediction trajectory of the model that recognizes anomalous inputs can be used to detect and then stop anomalous inputs.

Generative AI and those who wish to exploit its vulnerabilities will continue to alter the cybersecurity landscape. But that same transformative power can also deliver solutions. Learn more about how IBM Security delivers AI cybersecurity solutions that strengthen security defenses.

Mixture of Experts | 25 July, episode 65

Decoding AI: Weekly News Roundup

Join our world-class panel of engineers, researchers, product leaders and more as they cut through the AI noise to bring you the latest in AI news and insights.

Watch the latest podcast episodes

Unlock the power of generative AI and ML

Learn how to confidently incorporate generative AI and machine learning into your business.

Resources

Prompt Engineering with watsonx.ai

Gain a comprehensive understanding of prompt engineering, learn techniques to achieve the best results with LLMs, and apply learnings through completion of a diverse set of prompt engineering exercises.

Locally differentially private document generation using zero shot prompting

See how this publication demonstrates that pretrained large language models can effectively contribute to privacy preservation.

Adversarial prompting - Testing and strengthening the security and safety of LLMs

Explore various techniques involved in adversarial prompting, focusing on how these tactics can be used to test and strengthen the security and safety of large language models.

watsonx Developer Hub

Get hands-on with prompt engineering on the watsonx Developer Hub.

Prompt engineering fundamentals

Go from zero to hero with prompt templates for different types of prompts.

Prompt engineering for everyone

Master the language of AI and unleash its full potential with our free prompt engineering course.

Get started with generative AI

Get started with generative AI using curated tools, frameworks, tutorials, labs and community support.

Related solutions

Related solutions

Foundation models

Explore Granite 3.2 and the IBM library of foundation models in the watsonx portfolio to scale generative AI for your business with confidence.

Explore watsonx.ai

AI for developers

Move your applications from prototype to production with the help of our AI development solutions.

Explore AI development solutions

AI consulting and services

Reinvent critical workflows and operations by adding AI to maximize experiences, real-time decision-making and business value.

Explore AI services

Take the next step

Explore the IBM library of foundation models in the IBM watsonx portfolio to scale generative AI for your business with confidence.

Explore watsonx.ai

Explore AI development tools