
OWASP Top 10 for LLMs: AI Security Risks Explained

March 7, 2026 · 8 min read · 1,522 words
Tags: AI Security, OWASP, LLM, Tutorial, Video Summary

Key insights

  • Prompt injection remains the #1 risk because AI models still struggle to tell the difference between instructions and user input
  • Sensitive information disclosure jumped four spots to #2. Training data can leak out, and attackers can harvest entire models through repeated queries
  • Every AI system has a supply chain of data, models, and infrastructure. With over 2 million models on Hugging Face, manual inspection is impossible
  • Defenses follow a pattern: filter inputs and outputs with an AI firewall, limit access, and test your system like an attacker would
Source: YouTube
Published March 7, 2026
Channel: IBM Technology
Host: Jeff Crume

This article is a summary of OWASP's Top 10 Ways to Attack LLMs: AI Vulnerabilities Exposed. Watch the video →



In Brief

Jeff Crume, Distinguished Engineer at IBM, walks through OWASP's updated Top 10 security risks for Large Language Models (LLMs), the AI systems behind tools like ChatGPT and Claude. The 2025 list reflects what teams have learned since the first version in 2023. Prompt injection is still the number one threat, sensitive data leaks have become a much bigger problem, and new risks like "denial of wallet" show how AI attacks can directly cost you money.

  • 10 security risks ranked
  • 2M+ AI models on Hugging Face
  • #1 risk: prompt injection, still unsolved

What you'll learn

  • The 10 most common ways attackers target AI systems, ranked by real-world impact
  • Why prompt injection is so hard to fix, and how indirect attacks work through documents
  • Practical defenses you can apply today: AI firewalls, access controls, and pen testing

The big four: risks that matter most

The first four items on the list account for the most damaging real-world attacks. Here's what each one means and how to defend against it.

1. Prompt injection (unchanged from 2023)

This is still the number one risk because AI models struggle to tell the difference between instructions and user input (2:33). An attacker can type something that overrides the system's built-in rules.

There are two types. Direct injection is when an attacker types a malicious prompt straight into the system. The classic example: asking a chemistry question that tricks the AI into explaining how to make something dangerous (1:51).

Indirect injection is sneakier. A user asks the AI to summarize a document, but the document itself contains hidden instructions like "forget all previous rules" (3:05). The AI follows those instructions without the user even knowing.

Researchers have found that rephrasing prompts as poems or Morse code can bypass protections that work against normal language (4:17).

Explained simply: Think of an AI's system prompt like a bouncer at a club with a guest list. Direct injection is someone talking their way past the bouncer. Indirect injection is someone hiding inside a delivery box. The bouncer checks the guest list, but the rules written on the list can't cover every creative trick. That's why this problem is so hard to solve completely.

Defenses: Strengthen system prompts, put an AI firewall (a filter that checks inputs and outputs) between users and the model, and run penetration tests (deliberate attacks on your own system to find weaknesses) (5:51).
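To make the input side of such a filter concrete, here is a minimal sketch in Python. The deny-list patterns and the `looks_like_injection` helper are hypothetical, not from the video; a real AI firewall combines many signals, often including a trained classifier, rather than regexes alone.

```python
import re

# Hypothetical deny-list of phrasings commonly seen in injection attempts.
# A production AI firewall would pair patterns like these with a classifier.
INJECTION_PATTERNS = [
    re.compile(r"(ignore|forget) (all )?(previous|prior) (rules|instructions)", re.I),
    re.compile(r"you are now", re.I),
]

def looks_like_injection(prompt: str) -> bool:
    """Return True if the prompt matches a known injection pattern."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```

The same check can be applied to retrieved documents before summarization, which is one way to catch the indirect variant described above.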

2. Sensitive information disclosure (up 4 spots)

This has become a much bigger problem than expected (7:02). If an AI was trained on customer data, health records, or financial information, a clever prompt can make it leak that data back out.

There's also something called a model inversion attack. An attacker sends thousands of queries and records the responses, gradually extracting the model's training data. It's like photocopying a book one page at a time (8:49).

Defenses: Filter sensitive data before it enters the model, use an AI firewall on outputs to catch leaking credit card numbers or personal information, and lock down who can access the model (9:19).
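As a toy illustration of the output side of that firewall, the sketch below redacts two leak types with regexes. The patterns are simplified assumptions for illustration; production filters use far larger pattern sets plus ML-based detectors.

```python
import re

# Simplified detectors for two common leak types (illustrative only).
CREDIT_CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_sensitive(text: str) -> str:
    """Replace likely card numbers and SSNs before the response leaves the system."""
    text = CREDIT_CARD.sub("[REDACTED-CARD]", text)
    return SSN.sub("[REDACTED-SSN]", text)
```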

3. Supply chain vulnerabilities (up 2 spots)

An AI system doesn't exist in a vacuum. It needs data, a base model, applications, and infrastructure. Most organizations don't build their own models. They download them from places like Hugging Face, which has over 2 million AI models, many with more than a billion parameters (12:37). That's far too much for anyone to inspect manually.

Explained simply: Imagine buying ingredients from a market where you can't check the kitchen. The flour might be fine, or someone might have mixed something into it. With 2 million options on the shelf, you can't taste-test everything before you cook. That's what downloading an unverified AI model is like. Unlike real food, though, a "contaminated" model can silently affect millions of decisions before anyone notices.

Defenses: Verify your sources, trace provenance (where things came from and who touched them along the way), scan models for vulnerabilities, and keep all software up to date (13:37).
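One small, concrete piece of source verification is checking a downloaded artifact against the digest its publisher advertises. This sketch is an assumption about how you might wire that check, not a method from the video; it only proves the file is the one the publisher signed off on, which is one link in the provenance chain, not the whole chain.

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file (model weights can be huge) and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, expected_digest: str) -> bool:
    """Compare against the publisher's advertised digest before loading the model."""
    return hmac.compare_digest(sha256_of_file(path), expected_digest.lower())
```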

4. Data and model poisoning (down 1 spot)

If the data used to train an AI contains errors or has been tampered with, everything downstream is affected. As Crume puts it: "Just a little bit of toxin in the drinking water makes us all sick" (15:54).

This also applies to retrieval-augmented generation (RAG), a technique where you give the AI specific documents to reason over. If those documents have been manipulated, the AI's answers will be wrong too (17:16).

Defenses: Know where your data comes from, control who can change models and training data, and put change management in place (18:14).


The remaining six

The video covers these more briefly, but each one matters.

5. Improper output handling. If your AI writes code or generates content that goes into a browser, it could introduce vulnerabilities like cross-site scripting (XSS) or SQL injection. Never trust AI output blindly (19:14).
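A minimal sketch of the "never trust AI output blindly" rule for the XSS case: escape model text before it reaches a browser. The `render_ai_comment` helper is hypothetical; real applications would use their templating engine's auto-escaping plus a content security policy.

```python
import html

def render_ai_comment(ai_text: str) -> str:
    """Escape model output before embedding it in an HTML page, so a response
    containing <script> tags renders as visible text instead of executing."""
    return "<p>" + html.escape(ai_text) + "</p>"
```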

6. Excessive agency. An AI connected to tools, APIs, and real-world systems has real power. If it gets hijacked or hallucinates, it could take actions with serious consequences (20:16).
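One common way to bound agency, sketched here under assumed names (`ALLOWED_TOOLS`, `dispatch` are illustrative, not from the video), is an explicit allowlist: the agent can only invoke tools that were registered up front, no matter what the model asks for.

```python
# Hypothetical tool registry: the agent may only call functions that were
# explicitly registered, regardless of what the model's output requests.
ALLOWED_TOOLS = {
    "get_weather": lambda city: f"weather for {city}",
    "search_docs": lambda query: f"results for {query}",
}

def dispatch(tool_name: str, *args):
    """Refuse any tool call outside the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool_name}")
    return ALLOWED_TOOLS[tool_name](*args)
```

Pairing this with per-tool permissions (read-only vs. write) narrows the blast radius if the model is hijacked.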

7. System prompt leakage. The system prompt sets the AI's rules. If it contains credentials or API keys, a carefully worded question could make the AI reveal them (21:25).

8. Vector and embedding weaknesses. Manipulated RAG documents can corrupt the AI's knowledge over time, making it unreliable (22:17).

9. Misinformation. AI models hallucinate. They make up facts that sound convincing. Critical thinking and cross-referencing are essential (23:01).

10. Unbounded consumption. Overloading an AI system with too many or too complex requests can take it offline. This is known as "denial of wallet" because it costs real money (24:23).
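The standard defense against unbounded consumption is a per-client rate limit. Below is a minimal token-bucket sketch (my illustration, not a mechanism from the video); the same structure works as a cost cap if you subtract estimated dollars per request instead of one token.

```python
import time

class TokenBucket:
    """Per-client rate limiter: refills `rate` tokens per second, bursting
    up to `capacity`. Each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```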


Checklist: Securing your AI system

The same defensive strategies come up throughout the list. Here's how to apply them:

1. Put an AI firewall between users and the model

Filter both inputs and outputs. Catch malicious prompts before they reach the model, and block sensitive data from leaking out in responses.

2. Lock down access

Control who can query the model, who can change training data, and who can modify the system. Not everyone needs the same level of access.

3. Test like an attacker

Run penetration tests with prompt injections, adversarial inputs, and edge cases. If you don't find the weaknesses, someone else will.

4. Verify your supply chain

Know where your data, models, and infrastructure come from. Check provenance, scan for vulnerabilities, and keep everything up to date.

5. Validate AI output before using it

Don't trust what the AI produces blindly. Check generated code for vulnerabilities, cross-reference facts, and review before passing output to other systems.


Practical implications

For developers building AI features

Start with the top four. If you're using an LLM in production, make sure you have input/output filtering and access controls. These cover the biggest risks.

For teams evaluating AI tools

Ask your vendors about their security posture. Do they filter prompts? How do they handle sensitive data in training? What supply chain verification do they do?

For anyone using AI daily

Be a critical consumer. AI can hallucinate, leak information, and be manipulated. Cross-check important answers against other sources.


Test yourself

  1. Trade-off: An AI firewall adds latency to every request. When is the performance cost worth it, and when might you accept the risk of running without one?
  2. Transfer: How would you apply the concept of supply chain verification to a non-AI system, like a web application that uses open-source libraries?
  3. Architecture: Design an access control system for an AI application that needs to process sensitive customer data while preventing model inversion attacks. What layers would you add?

Glossary

OWASP: Open Worldwide Application Security Project. A nonprofit that publishes practical security guidance, including the well-known Top 10 lists.
Prompt injection: Tricking an AI by sneaking instructions into a prompt that override its built-in rules.
System prompt: Hidden instructions that set the AI's behavior, like "be helpful" or "don't share personal data."
AI firewall / AI gateway: A filter between users and the AI that checks both inputs and outputs for suspicious content.
Model inversion attack: Extracting an AI's training data by sending thousands of queries and piecing the responses together.
RAG (Retrieval-Augmented Generation): Giving an AI specific documents to reason over, reducing made-up answers. The documents are retrieved and used to "augment" the AI's response.
Supply chain (AI): Everything that goes into building an AI system: training data, base models, software, and infrastructure.
Data poisoning: Introducing bad data into AI training to corrupt the model's outputs. Subtle changes can go undetected for a long time.
Excessive agency: When an AI has too many permissions and can take real-world actions, like calling APIs or modifying systems.
Denial of wallet: An attack that overloads an AI system, making it unavailable and wasting the money spent running it.
Penetration testing: Deliberately attacking your own system to find weaknesses before real attackers do.
Hallucination: When an AI confidently generates false information. It's not lying on purpose. It's predicting what sounds right and getting it wrong.
Hugging Face: A platform for sharing AI models, like GitHub for machine learning. Hosts over 2 million models.
