How to Secure AI Agents: IBM and Anthropic's Guide

Key insights

AI agents must be secured, governed, and audited. These three pillars form the foundation of IBM and Anthropic's enterprise security framework
Prompt injection is the number one attack type against large language models, and agents amplify the damage because they operate autonomously at speed
Agents need nonhuman identities with unique credentials, just-in-time access, and role-based access control, just like human users

SourceYouTube

Published February 19, 2026

IBM Technology

Hosts:Jeff Crume

This is an AI-generated summary. The source video may include demos, visuals and additional context.

Watch the video · How the articles are generated

In Brief

IBM and Anthropic have released a joint guide on how to architect secure enterprise AI agents using the Model Context Protocol (MCP, a protocol that lets agents communicate with tools and services). Jeff Crume, IBM Distinguished Engineer and CTO of IBM Security Americas, walks through the seven biggest security threats agents face, the design principles that counter them, and a layered security framework covering identity management, AI firewalls, and continuous monitoring. The core message: agents must be secured, governed, and audited from day one.

What is an AI agent?

An AI agent is a system that can perceive context, reason over goals, and take actions through tools and services. Crume describes them as "models using tools in a loop". What makes agents powerful is their ability to operate autonomously, without human intervention. You tell the agent what you want done, and it figures out the details.

But with that autonomy comes risk. Agents need to operate within explicit boundaries, provide observable traces of their decisions, and remain compliant with organizational policies.

The fundamental shift

Crume highlights that agents represent a fundamental shift in how software works:

Deterministic to probabilistic: Traditional software always gives the same output for the same input. Agents make dynamic decisions based on probabilities, so identical inputs can produce different outcomes
Static to adaptive: Agents learn over time. They evolve their behavior based on interaction and human feedback
Code-first to evaluation-first: The focus shifts from writing implementation code to measuring outcomes and checking whether those outcomes move toward the stated goal

Three pillars: secured, governed, audited

Crume argues that every AI agent must meet three requirements: secured, governed, and audited:

Secured: The agent must not leak data or get hijacked by an attacker
Governed: The agent must be reliable and operate within the context you expect
Audited: The agent must comply with organizational policies and regulatory requirements

These three pillars shape every layer of the security framework discussed in the video.

The agent development lifecycle

Before diving into threats and defenses, Crume outlines a structured lifecycle for building and managing agents. This is not a one-time process. It loops continuously.

Step 1 — Plan

Define what the agent should do, what boundaries it needs, and what risks it introduces.

Step 2 — Code

Build the agent, integrating security from the start (not bolted on after).

Step 3 — Test

Verify that the agent behaves within expected boundaries. This completes the "build" phase.

Step 4 — Debug

Identify and fix issues found during testing.

Step 5 — Deploy

Move the agent to production with proper access controls and monitoring.

Step 6 — Monitor

Watch the agent in production. Detect drift, abnormal behavior, and access pattern changes. Then loop back to planning.

This follows a DevSecOps approach (Development + Security + Operations), where security is embedded throughout the entire lifecycle, not just at the end. In traditional DevOps, developers and operations teams collaborate. DevSecOps adds security to every stage.

Seven security threats

Crume walks through seven threat categories that AI agents face:

1. Expanded attack surface

Every new technology expands the attack surface (all the points where an attacker could try to get in). Agents expand it in two directions: the AI model itself, and the MCP protocol that connects agents to tools and services.

2. Excessive agency

The agent has more access and control than it actually needs. If an agent only needs to read a database, it should not have write access.

3. Privilege escalation

The agent takes it upon itself to escalate its own privileges. This means the agent expands its own access rights without authorization, potentially gaining control over systems it was never meant to touch.

4. Data leaks

The agent exposes sensitive data it has access to, either by sending it to the wrong place or by including it in responses that reach unintended audiences.

5. Prompt injection

Crume calls prompt injection the number one attack type against Large Language Models (LLMs). Prompt injection is when someone injects commands into the system to take remote control of it. For example, an attacker could embed instructions in a document the agent reads, causing the agent to follow those instructions instead of yours.

6. Attack amplification

Because agents operate autonomously and at machine speed, a compromised agent amplifies damage far beyond what a compromised human account could do. It doesn't need breaks, doesn't hesitate, and can execute thousands of harmful actions before anyone notices.

7. Compliance drift

Over time, the agent's behavior may drift out of compliance with organizational policies and regulations. This can happen gradually as the system evolves or as external regulations change.

System controls and design principles

System controls

Crume describes three types of controls that agents need:

Constrained operation: Keep the agent tightly controlled, operating only within expected boundaries
Role-Based Access Control (RBAC): Assign roles to agents, just like you would with human users. Crume also interprets RBAC as "risk-based access control," where the amount of access matches the risk level of the task
Sandboxing: Have the agent operate in an isolated environment (a sandbox, a restricted space where the agent cannot affect systems outside it). If something goes wrong inside the sandbox, the damage stays contained

Design principles

Nine principles guide how to put these controls into practice:

Acceptable agency: Define what the agent is allowed to do and what it is not
Interoperability: The agent must work with many tools, but you need to understand what those tools do and what downstream risks they create
Secure by design: Security built in from the start, not added later. Crume stresses that bolting security on after the fact does not work well
Business alignment: The agent must meet business objectives and align with organizational goals
Risk mitigation: Minimize the new risk that the agent introduces
Continuous observation: Monitor the agent's reasoning and actions at all times, because it operates autonomously
Key Performance Indicators (KPIs): Track measurable outcomes the business defines to verify the agent performs as expected
Least privilege: The agent gets access to only what it needs to do its job and nothing more. The instant it no longer needs access, that access is removed
Human in the loop: Maintain human oversight for critical decisions

The security framework

Crume presents a layered security framework that puts these principles into action. It covers three areas: identity management, data and model protection, and threat detection.

Layer 1: Identity and access management

Agents need identities, just like people do. Crume covers four components:

Nonhuman identities: Agents must have their own unique credentials, separate from human users and from each other. If something goes wrong, you need to trace it back to which agent misbehaved. Just as users should not share passwords, agents should not share credentials
Just-in-time access: Give the agent permission to do what it needs right now, then revoke that access when the task is done. This can be time-based: a few minutes, a few hours, or a day
Role-Based Access Control (RBAC): Assign roles to agents the same way you assign roles to employees
Auditing: Record everything the agent does so you can review it later and verify that policies were followed

Layer 2: Data and model protection

This layer focuses on what sits between users and the AI system. Crume recommends an AI firewall that inspects all traffic to and from the AI model.

Step 1 — Intercept incoming requests

Instead of letting users or other systems talk directly to the AI, route everything through an AI firewall. The firewall examines requests for prompt injections and other attacks before they reach the model.

Step 2 — Inspect MCP calls

When the agent talks to external tools via MCP, those calls also pass through the firewall. This catches data flowing out of the system that should not leave.

Step 3 — Apply data loss prevention

The firewall monitors outbound data for sensitive information. If an agent attempts to send customer records, API keys, or other confidential data through an MCP call, the firewall blocks it.

Layer 3: Threat detection and monitoring

Even with strong access controls and a firewall, things can still go wrong. Crume's third layer is about catching problems after they happen:

Real-time monitoring: Watch what agents do, what tools they call, and what effects their actions have. Set alarms for abnormal behavior: too much data being downloaded, access to unexpected systems, or configuration changes
Threat hunting: This is the proactive counterpart to monitoring. Instead of waiting for an alarm, you imagine hypothetical attacks and go looking for signs of them
Risk assessment: Evaluate what risks the agent system exposes you to. Understand what the agent can do, where its limitations are, and where it might go beyond those limits

Crume also highlights three specific things to monitor over time:

Configuration drift: Agents performing operations on their own system may change parameters unexpectedly
Model drift: The AI model's behavior can shift over time, producing different outputs than it did originally
Access pattern analysis: Track what the agent is doing and whether its actions match expected patterns

Checklist: Common pitfalls when securing agents

Are your agents sharing credentials? Each agent needs its own unique identity. Shared credentials make it impossible to trace which agent caused a problem
Does the agent have more access than it needs? Apply the principle of least privilege. Review what roles and permissions are assigned and remove anything unnecessary
Is security added after the fact? Crume is clear that security bolted on later does not work well. Build it in from the planning stage
Are you monitoring MCP calls? The connection between agents and external tools is a major attack vector. Route MCP traffic through an AI firewall
Is there a human in the loop for critical actions? Full autonomy sounds efficient, but it removes the safety net. Keep human oversight for high-risk decisions
Are you watching for drift? Both configuration drift and model drift can silently degrade your security posture. Set up automated checks
Do you have just-in-time access policies? Permanent access is a risk. Give agents time-limited permissions that expire when the task is done

Remember:

Remember: Perfect security does not exist. The goal is to reduce risk to an acceptable level through layers of defense. Even well-designed systems will need ongoing monitoring and adjustment.

Practical implications

For beginners exploring AI agents

Start with the three pillars: secured, governed, audited. Before deploying any agent, even a simple one, ask yourself these questions: what data can it access? What actions can it take? Who reviews what it does? Even a chatbot with access to a customer database needs access boundaries.

For teams building production agents

Follow the DevSecOps lifecycle. Integrate security into planning, not just deployment. Implement an AI firewall or gateway to inspect both inbound requests and outbound MCP calls. Assign nonhuman identities to each agent with just-in-time access rather than permanent credentials. Crume's framework gives a concrete starting point for security architecture reviews.

For organizations managing compliance

The audit pillar is critical. Regulatory requirements around AI are evolving rapidly. Build logging and tracing into your agent infrastructure from day one so you can show compliance. Monitor for configuration drift and model drift, which can silently push a compliant system out of bounds.

Test yourself

Transfer: The IBM/Anthropic framework was designed for enterprise environments. How would you adapt these principles for a solo developer building a personal AI agent that manages their email and calendar?
Trade-off: Crume recommends human-in-the-loop oversight for agents. At what point does human oversight become a bottleneck that removes the benefit of using an agent in the first place? How would you decide which decisions need a human?
Architecture: Design a layered security setup for an AI agent that processes medical records. Which of Crume's seven threats would be your top three priorities, and why?
Behavior: If organizations implement strict RBAC and least-privilege policies for agents, how might this change the way developers build and test their agents during development?
Trade-off: Just-in-time access sounds ideal in theory. What are the practical challenges of implementing it when agents need to respond to requests in milliseconds?

Glossary

Term	Definition
AI agent	A system that perceives context, reasons over goals, and takes actions through tools. Think of it as a smart assistant that can independently decide how to complete a task.
AI firewall / gateway	A proxy that inspects all traffic going to and from an AI model, checking for attacks and data leaks. Like airport security screening both arrivals and departures.
Attack surface	All the points where an attacker could try to break into a system. Adding new tools and connections makes the attack surface bigger.
Compliance drift	When a system gradually moves out of alignment with regulations or policies, often without anyone noticing until an audit.
Configuration drift	When system settings change unexpectedly over time, either through agent actions or environmental changes.
Data loss prevention (DLP)	Monitoring and blocking sensitive information from leaving a system. The digital equivalent of checking that no confidential documents leave the building.
DevSecOps	A development approach where security is integrated throughout the entire lifecycle (development, security, and operations), not bolted on at the end.
Just-in-time access	Temporary access given only when needed and revoked immediately after. Like a hotel key card that expires at checkout.
LLM (Large Language Model)	An AI model trained on large amounts of text that can understand and generate human language. Examples: GPT, Claude, Llama.
MCP (Model Context Protocol)	A protocol that allows AI agents to communicate with external tools and services in a standardized way.
Model drift	When an AI model's behavior changes over time, producing different outputs than it did originally.
Nonhuman identity	A unique set of credentials assigned to an agent, separate from any human user. Ensures each agent can be individually tracked and audited.
Principle of least privilege	A security concept where a system or user gets access only to what they need to do their job and nothing more.
Privilege escalation	When a system expands its own access rights without authorization, potentially gaining control over resources it was never meant to access.
Prompt injection	An attack where someone embeds hidden instructions in input to take control of an AI model's behavior. The most common attack type against LLMs.
RBAC (Role-Based Access Control)	A system where access permissions are assigned to roles rather than individuals. An agent assigned the "reader" role can only read, not write or delete.
Sandbox	An isolated environment that limits what an agent can do. If something goes wrong inside the sandbox, the damage cannot spread to other systems.