Anthropic gives AI agents memory and dreaming

This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
Most AI agents start with a blank slate every time. They solve the task, close the session, and forget everything they picked up along the way. At Code with Claude, Anthropic's developer conference, product manager Mahesh Murag explained how the company plans to change that. He calls memory the next primitive, the next big building block that will turn agents into more than single-use helpers.
Memory as the next building block
Murag works on the platform team at Anthropic and has been central to several of the company's most important primitives. He built MCP (Model Context Protocol), the standard for how agents connect to tools and data. He was also a key contributor to Skills, packages of instructions an agent can pick up as it works. Now he argues that memory is the missing piece for real self-learning.
His argument is simple. Models keep getting better. Agents solve increasingly complex tasks and can run for hours or days at a time. Yet they almost always start with a blank slate. They do not learn from the last time they did nearly the same job.
With memory, agents can hold on to what they figure out along the way: what often goes wrong, which strategies actually work, how a codebase fits together, and what other agents in the same environment have already learned. It is this last possibility Murag is most excited about: a swarm of agents building a shared understanding of their environment over time.
Why model memory as a file system
Anthropic launched memory for Claude Managed Agents in public beta on April 23, 2026. The interesting design choice is not that they offer memory, but how they model it.
Earlier attempts have been fairly rigid. An agent could store short notes in a special memory file with fixed fields and values. Anthropic takes the opposite approach. They let Claude manage memory as a file system. That means folders and text files the agent reads and writes with familiar tools like bash and grep, the same tools it already uses when working on code.
The reasoning is the same as with Skills. If Claude can navigate a virtual workspace and manage its own files, why force memory into a special form? Let the model decide how many files to split memory into, what structure fits, and what is worth remembering.
Anthropic's latest large model, Claude Opus 4.7, is state of the art at exactly this, according to the company itself: it judges well what is worth saving, how memory should be structured, and how many files it makes sense to split it into.
For a developer, it is like giving the agent a notebook instead of a fill-in form. The notebook can be organised to fit the job: chapters, an index, summaries, whatever the agent finds useful.
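To make the file-system framing concrete, here is a minimal sketch of what such a store could look like from a developer's side. Everything here is an assumption for illustration: the class, the paths, and the file layout are invented, not Anthropic's actual format or API.

```python
from pathlib import Path

# Illustrative only: a memory store as plain folders and text files.
# The class, layout, and file names are assumptions, not Anthropic's format.
class FileMemory:
    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, rel_path: str, text: str) -> None:
        path = self.root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    def search(self, needle: str) -> list[Path]:
        # grep-style lookup across every memory file
        return [p.relative_to(self.root)
                for p in self.root.rglob("*.md") if needle in p.read_text()]

memory = FileMemory("memory")
memory.write("codebase/build-system.md",
             "CI cache misses whenever the lockfile changes.")
print(memory.search("CI cache"))  # [PosixPath('codebase/build-system.md')]
```

The point of the design is that nothing above is special: the agent could do the same with bash and grep, and the structure under `memory/` is whatever the model decides it should be.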
When many agents share the same memory
At larger companies, including Anthropic itself, it is not one or two agents running at a time. It is hundreds or thousands in parallel on the same kinds of tasks. If all of them are going to write to the same memory, the system has to handle three things.
Permission scopes. An agent can have read-only access to one memory store and full read-write access to another. That is useful in practice: the organisation's own runbooks, guidelines, and policies should be read-only. What the agent learns during its own task should be writable, but kept separate.
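A minimal sketch of what per-store scopes could look like follows. The store names and the `check` helper are hypothetical; Anthropic's actual permission model is not public in this form.

```python
# Hypothetical scope table; store names and check() are illustrative only.
READ, WRITE = "read", "write"

SCOPES = {
    "org-runbooks": {READ},          # shared policies: agents may only read
    "task-scratch": {READ, WRITE},   # the agent's own working memory
}

def check(store: str, op: str) -> None:
    if op not in SCOPES.get(store, set()):
        raise PermissionError(f"agent lacks {op} access to {store!r}")

check("task-scratch", WRITE)  # fine
try:
    check("org-runbooks", WRITE)
except PermissionError as e:
    print(e)  # agent lacks 'write' access to 'org-runbooks'
```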
Concurrency control. If two agents try to write to the same file at once, the system has to prevent one from overwriting the other unknowingly. Anthropic solves this with what they call optimistic concurrency control. The agent reads the file, computes a checksum of the content, and tries to write, but only if the checksum still matches. If someone else changed the file in the meantime, the agent has to read it again and try once more.
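The compare-before-write loop is easy to sketch. This toy version hashes a local file with SHA-256; in a real multi-agent system the check and the write would happen atomically on the server side, which plain file I/O cannot guarantee.

```python
import hashlib
from pathlib import Path

def checksum(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def write_if_unchanged(path: Path, expected: str, new_text: str) -> bool:
    """Write only if the file still matches the checksum we read earlier."""
    # A real server would do this check-and-write as one atomic operation.
    if checksum(path.read_text()) != expected:
        return False  # someone else wrote first: caller re-reads and retries
    path.write_text(new_text)
    return True

path = Path("memory/codebase/build-system.md")
while True:
    current = path.read_text()
    updated = current + "\nWarm the cache before retrying the build."
    if write_if_unchanged(path, checksum(current), updated):
        break  # our edit landed without clobbering anyone else's
```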
Version history. Every change to memory gets a timestamp, an agent ID, and a version. That means a developer can go back in time and see exactly what changed, by whom, and when. Anthropic can also give the agent access to that log, so it can understand its own history.
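An append-only log is one simple way to picture that record. The field names below are assumptions about the entry's shape, not Anthropic's actual schema.

```python
import json, time

# Assumed shape of a history entry; the field names are illustrative.
entry = {
    "path": "codebase/build-system.md",
    "agent_id": "agent-42",
    "version": 7,
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}

# Append-only log: replaying it reconstructs who changed what, and when.
with open("memory-history.jsonl", "a") as log:
    log.write(json.dumps(entry) + "\n")
```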
One small detail says a lot about the design thinking: memory is available through a standalone API outside the agent system itself. That means a business can build its own pipelines around memory, for example to scan for sensitive data, tidy up periodically, or copy it into other systems. Anthropic does not want to lock customers in.
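For example, a business could run a scan like the following over an exported memory store before it is shared more widely. The patterns are deliberately crude placeholders; a production scanner would be far more thorough.

```python
import re
from pathlib import Path

# Crude patterns for illustration; a real scanner would be far more thorough.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "credential": re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]"),
}

for path in Path("memory").rglob("*.md"):
    text = path.read_text()
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            print(f"{path}: possible {label}, review before sharing")
```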
Dreaming: tidying up while you sleep
The new feature Murag presented is called Dreaming and went into research preview (an early test phase) on the same day as the talk. The name is chosen with care.
Our brains tidy up during sleep. They find patterns, drop the irrelevant, and keep what holds value over time. Dreaming does something similar for AI agent memory. It is a background job that reads a batch of recent agent sessions and the existing memory, looks for recurring patterns and mistakes, and produces a new, tidied memory store.
The result can be three kinds of changes. Dreaming removes duplicates when five different agents have written down roughly the same thing. It removes stale information that is no longer accurate. And it surfaces new patterns no single agent had enough perspective to catch on its own: for example, that several agents were consistently triggered 60 seconds after a specific event, which points to a fault in the logic that retries the task.
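Deduplication is the easiest of the three to sketch. The toy pass below merges near-identical notes with a similarity threshold; the real process also mines session logs for patterns, which this leaves out entirely.

```python
from difflib import SequenceMatcher
from pathlib import Path

# Toy dreaming pass: collapse notes that say roughly the same thing.
notes = [p.read_text().strip() for p in Path("memory/notes").glob("*.md")]

kept: list[str] = []
for note in notes:
    if any(SequenceMatcher(None, note, k).ratio() > 0.9 for k in kept):
        continue  # near-duplicate of something already kept
    kept.append(note)

# Write the tidied result into a fresh store, leaving the original intact.
out = Path("memory-tidied/notes.md")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text("\n\n".join(kept))
```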
Three design choices stand out.
First, Dreaming runs outside the agent's normal work path. That means it never slows down an in-progress task. You can run it overnight, or when a task wraps up.
Second, it lets Anthropic separate two different objectives. Regular agents get the job done. Dreaming agents have one job: keep memory tidy and current. Murag thinks memory quality is going to matter more and more, so there is an advantage to making it its own clearly defined goal.
Third, it gives Anthropic a way to scale memory. It is the same thinking that sits behind reasoning models (models that think through a problem in several steps before they answer), where you spend extra time and tokens on thinking. Here you spend extra time and tokens on keeping memory in good shape, so that every later task gets cheaper and better.
Put another way: Dreaming builds an up-to-date index that many agents can look things up in afterwards. You pay the cost once and pull the value out many times.
Results so far
But does it work? Anthropic has tested with a handful of customers, and concrete numbers from two of them are public.
The Japanese commerce and finance group Rakuten uses memory for its internal knowledge agents. According to Anthropic's official blog post, it produced 97 percent fewer first-pass errors, 27 percent lower cost, and 34 percent lower latency. Murag cited a 90 percent error reduction in the talk, but Rakuten's own quote carries the higher number.
The US legal AI company Harvey tested Dreaming on one of its own legal scenarios. They saw tasks completed successfully six times more often than before.
The numbers are fresh and come from selected test cases, so they should be read as a signal, not a final verdict. But they are big enough to explain why Anthropic believes memory is the next primitive.
What this means for developers and businesses
For a developer building on Anthropic's platform, this means two things in practice.
First, you get a ready-made memory system. You do not have to build it from scratch, and you can set it up with different permission scopes per agent: one store for organisational knowledge that everyone reads from, and one store for working memory per task or team.
Second, you get Dreaming as an optional, scheduled job. You can run it overnight on the previous day's sessions, or whenever a task wraps up. Out comes a new memory store you can review before putting it to use.
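Put together, a nightly job could look roughly like this. `run_dreaming` is a stand-in for whatever the platform actually exposes; the shape is the point: generate a candidate store, review it, then promote it.

```python
import shutil
from pathlib import Path

def run_dreaming(sessions: Path, memory: Path, out: Path) -> None:
    # Stand-in for the real pass: mine sessions, consolidate memory.
    # Here we just copy, so the promote step below has something to review.
    shutil.copytree(memory, out, dirs_exist_ok=True)

candidate = Path("memory-candidate")
run_dreaming(Path("sessions/yesterday"), Path("memory"), candidate)

# Review gate: the tidied store only replaces the live one after sign-off.
if input(f"Promote {candidate} to live memory? [y/N] ").lower() == "y":
    shutil.rmtree("memory")
    candidate.rename("memory")
```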
For a business looking at AI agents more broadly, this is the interesting part: agents start to look like a resource that gets better over time, not a one-shot commodity. That assumes memory is maintained and that sensitive data does not end up where it should not. That is why both the API and the open format matter.
As Murag put it in closing: agents that run for many hours or days at a time are quickly becoming routine. When they do, it will be routine for them to remember, too.
Glossary
| Term | Definition |
|---|---|
| Memory primitive | A fundamental building block that lets an AI agent remember things between tasks, the way you remember what happened yesterday. |
| Memory store | A "cupboard" of text files an agent can read and write across sessions. |
| Claude Managed Agents | Anthropic's ready-made platform for running AI agents, like renting a complete workshop instead of cobbling one together yourself. |
| File system | The way a computer organises files in folders, like your Pictures/, School/, etc. directories on your own machine. |
| Dreaming | Anthropic's new background process that tidies an agent's memory between sessions, the way the brain "sorts" impressions while you sleep. |
| Research preview | An early test phase where developers have to request access before the feature is open to all. |
| Public beta | Open test phase: all developers can use the feature, but it may still change. |
| Multi-agent system | A system where several AI agents work at the same time, often on related tasks. |
| Permission scope | Rules for whether an agent can read, write, or only view a memory store. |
| Optimistic concurrency control | The traffic rule when many agents write to the same memory at once: each agent checks that no one else has changed the file before it saves. |
| Version history | A complete record of who changed what in memory, and when. |
| MCP (Model Context Protocol) | An open standard for how AI agents connect to external tools and data. Think of it as USB for AI. |
| Skills | Packages of instructions an agent loads when a task requires specific knowledge. Launched October 2025. |
| Claude Code | Anthropic's coding agent that runs in the terminal. |
| Claude Opus 4.7 | Anthropic's latest large model, the best at file-system-based memory according to the company. |
Sources and resources
- Claude — Memory and dreaming for self-learning agents (YouTube)
- Mahesh Murag on X
- Built-in memory for Claude Managed Agents (Anthropic blog, April 23, 2026)
- New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration (Anthropic blog, May 6, 2026)
- Memory documentation (Anthropic)
- Dreams documentation (Anthropic)
- Claude Managed Agents (overview)
- Claude Agent SDK (overview)
- Introducing Agent Skills (Anthropic)
- Claude Opus 4.7 (Anthropic)
- Model Context Protocol
- Rakuten customer story (Anthropic)
- Harvey
- Anthropic