AWS VP: Three Milestones AI Agents Must Hit First

Key insights
- Automated reasoning offers a fundamentally different approach to agent trust than guardrails or RLHF, using mathematical proof to verify actions before they execute.
- AWS's own coding agent hallucinated API calls in its prototype, suggesting unreliability is a structural challenge that even the largest cloud providers cannot skip.
- Smarter models alone will not democratize agent building. Non-coders need familiar interfaces and simulated environments to test agents safely.
This is an AI-generated summary. The source video includes demos, visuals, and context not covered here.
In Brief
Swami Sivasubramanian, VP of Agentic AI at AWS, argues in a TED talk recorded at TEDAI Vienna that AI agents are the most transformative technology shift of our time. But three milestones must be reached first: agents must reshape how software is built, earn trust through mathematical verification, and become accessible to non-coders. The talk blends personal anecdote with technical specifics, including an honest admission that AWS's own coding agent hallucinated API calls (fabricated requests to application programming interfaces that did not exist) during its prototype phase.
The central claim
Sivasubramanian frames his argument around a personal story. Growing up in rural India, he had 10 minutes of computer access per week at a shared school computer. That limited access forced him to become, in his words, a "human compiler" who debugged code in his head before sitting down at the keyboard. Twenty years later at Amazon, he now leads the agentic AI division at AWS.
His central argument is that AI agents are fundamentally different from chatbots. A chatbot responds to your prompt. An agent takes a goal, breaks it into steps, uses tools, and acts on your behalf. He illustrates the distinction with a researcher example: a chatbot would suggest six experiments to run, while an agent would plan, code, and run those experiments itself, synthesize the results, and learn from failures.
But Sivasubramanian is careful not to oversell. "We are not there yet," he states plainly, before laying out the three milestones agents must reach.
Milestone 1: Change how software gets built
The first milestone is about shifting developers from "how" to "what." Today, building a web application means choosing from a huge menu of infrastructure options. Sivasubramanian points out that AWS's EC2 service alone offers roughly 850 compute options, and that is just one service among many.
In an agentic future, he argues, those implementation details become irrelevant. Developers describe what they want to build, and agents handle the infrastructure decisions. This is not a small change. It means the skill that matters most shifts from technical knowledge of cloud services to the ability to clearly describe a goal.
Milestone 2: Trust through automated reasoning
The second milestone is the one Sivasubramanian spends the most time on, and for good reason. Without trust, agent capabilities do not matter.
He illustrates the problem with a candid story about Amazon Q, AWS's coding assistant. The first prototype was, in his words, "eager and error-prone": it hallucinated API calls, generating requests to endpoints that did not exist. This is a striking admission. Even the world's largest cloud provider could not skip the unreliability phase.
The solution AWS adopted is automated reasoning, a field of computer science that uses mathematical logic to verify whether a system behaves correctly. Rather than relying on pattern-matching or reinforcement learning to reduce errors, automated reasoning provides formal mathematical proof that an action is valid before it executes.
Here is how it works in practice: every time Amazon Q generates an API request, an automated reasoning solver checks whether the request matches the formalized API specification. If the solver finds an error, it sends feedback to the agent explaining what went wrong. The agent then restructures its code. This back-and-forth creates what Sivasubramanian calls a "neurosymbolic feedback loop," combining neural AI (the language model) with symbolic logic (the mathematical solver).
The speed claim is notable: verification happens in 100 microseconds or less for 95% of use cases. That is fast enough to run on every single agent action without creating noticeable delays.
Milestone 3: Democratize who can build agents
The third milestone is about reach. If only software developers can build agents, agents will remain a tool for the few. Sivasubramanian argues that the real transformation happens when business users, cinematographers, researchers, and others can build agents without writing code.
He uses an example from Amazon Prime Video. Creating a recap of a TV series traditionally takes weeks of manual work: defining a story arc, selecting scenes, writing narration. The Prime Video team broke this workflow into three agent phases. First, observation: understanding what happens in each scene. Then reasoning: deciding which scenes matter. Finally, action: assembling the final product with human experts reviewing the output.
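The three phases above can be sketched as a simple pipeline. The scene data, scores, and selection rule here are invented for illustration; Prime Video's actual system is not public:

```python
# Hypothetical observe -> reason -> act pipeline for a recap workflow.
scenes = [
    {"id": 1, "summary": "Hero discovers the map", "plot_relevance": 0.9},
    {"id": 2, "summary": "Comic-relief side quest", "plot_relevance": 0.2},
    {"id": 3, "summary": "Villain revealed", "plot_relevance": 0.95},
]

def observe(scene):
    # Phase 1: understand what happens in each scene.
    return {"summary": scene["summary"], "score": scene["plot_relevance"]}

def reason(observations, threshold=0.5):
    # Phase 2: decide which scenes matter to the story arc.
    return [o for o in observations if o["score"] >= threshold]

def act(selected):
    # Phase 3: assemble a draft recap for human experts to review.
    return " / ".join(o["summary"] for o in selected)

draft = act(reason([observe(s) for s in scenes]))
print(draft)  # "Hero discovers the map / Villain revealed"
```

The human review step sits after `act`: the pipeline produces a draft, not a final cut.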
The broader point is that smarter models alone are not enough. Non-coders need interfaces that feel familiar, and agents need simulated environments, what Sivasubramanian calls "worlds to play in," to learn and improve safely. He explicitly names digital twins, virtual replicas of real-world systems, as a key enabler.
Opposing perspectives
The trust gap may be wider than automated reasoning can bridge
Sivasubramanian presents automated reasoning as a solution to the trust problem, but his examples are limited to API call validation, a domain with well-defined specifications. Many real-world agent tasks involve ambiguous goals, incomplete information, and judgment calls that cannot be formalized into mathematical specifications. Validating an API call is very different from validating whether an agent made the right strategic decision.
Democratization assumes the problem is tooling
The talk frames agent building as a tooling problem: make the interfaces simpler, and everyone can build agents. But there is a deeper challenge. Building useful agents requires understanding what agents are good at, where they fail, and how to design workflows around their limitations. Simpler tools lower the technical barrier, but they do not automatically create the conceptual understanding needed to build agents that actually work.
How to interpret these claims
Sivasubramanian's talk is polished and well-structured, but several factors deserve consideration.
Platform interest
As VP of Agentic AI at AWS, Sivasubramanian has a direct commercial interest in the adoption of agentic AI. The milestones he describes align neatly with AWS's product roadmap, including Amazon Q and Amazon Bedrock. This does not make his claims wrong, but it means the framing naturally emphasizes problems AWS is positioned to solve.
Scope of the evidence
The two concrete examples in the talk, Amazon Q's API validation and Prime Video's recap tool, are both internal Amazon projects. Independent, third-party validation of these results is not presented. The 100-microsecond verification claim is impressive, but it applies to a narrow domain (structured API calls) and may not generalize to less structured agent tasks.
The optimism-reality gap
The talk closes with a sweeping vision: agents will give rise to more companies, drive medical breakthroughs, and spark discoveries. These are aspirational claims without specific evidence. The gap between "we fixed API hallucinations in our coding tool" and "agents will change everything" is significant. Listeners should weigh the concrete, verified examples more heavily than the broader predictions.
Practical implications
For developers building with agents
Automated reasoning as a verification layer is a concrete, implementable idea. If your agents interact with well-specified systems (APIs, databases, cloud services), formalizing those specifications and adding a verification step could catch errors before they reach users.
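One minimal way to apply this idea, sketched here with SQL as the well-specified system: reject any agent-proposed statement that does not satisfy a formalized read-only policy before it ever reaches the database. The policy and names are illustrative assumptions, not a production design:

```python
import sqlite3

# Formalized policy: only read-only statements, and no stacked statements.
ALLOWED_PREFIXES = ("SELECT", "WITH")

def verified_execute(conn, statement: str):
    """Run the statement only if it passes the verification gate."""
    normalized = statement.lstrip().upper()
    if not normalized.startswith(ALLOWED_PREFIXES) or ";" in statement.rstrip(";"):
        raise PermissionError(f"rejected by verifier: {statement!r}")
    return conn.execute(statement).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")

print(verified_execute(conn, "SELECT name FROM users"))  # [('ada',)]
# verified_execute(conn, "DROP TABLE users") would raise PermissionError.
```

A real verifier would parse the statement rather than inspect prefixes, but the structure is the same: the check runs on every action, and failures are returned as explicit errors the agent can act on.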
For organizations evaluating agent adoption
The "eager and error-prone" characterization is useful framing for setting expectations. Agents in their current state are capable but unreliable. Planning for human review loops and structured verification is more realistic than expecting agents to work autonomously from day one.
Glossary
| Term | Definition |
|---|---|
| AI agent | Autonomous software that reasons, plans, and acts on behalf of a user, unlike a chatbot that only responds to prompts. |
| Chatbot | A conversational AI that answers questions or follows instructions but does not take independent action. |
| Automated reasoning | A computer science field that uses mathematical logic to prove whether a system behaves as expected. |
| Neurosymbolic feedback loop | A system where AI-generated outputs are checked by a mathematical solver before execution, combining neural networks with symbolic logic. |
| Hallucination | When an AI generates plausible-sounding but incorrect information, such as fabricated API calls or nonexistent facts. |
| Digital twin | A virtual replica of a real-world system used for testing, simulation, and training. |
| Agentic AI | AI systems designed to take autonomous actions rather than just generate text responses. |
| EC2 | Amazon Elastic Compute Cloud, a service that provides scalable computing capacity in the AWS cloud. |
Sources and resources
Want to go deeper? Watch the full video on YouTube.