
Why Generic AI Fails on Mainframes

April 18, 2026 · 4 min read · 730 words
IBM · AI Agents · Generative AI · AI Infrastructure
Image: Daniel Wiegand of IBM explaining AI, RAG, and agents for mainframe operations. Screenshot from YouTube.

Key insights

  • Generic language models give confidently wrong answers on specialized systems. Wiegand's CICS example shows a plausible-sounding mainframe answer that was not actually right
  • What makes RAG powerful is that clients can load their own documentation: runbooks, incident post-mortems, and best-practice guides. That turns a generic AI into one that knows your shop specifically
  • Agents let the system take action, not just answer. They can open tickets, run health checks, and call out to hybrid-cloud services. Answer and action arrive together in a single prompt
  • The real business case is generational. Mainframes still run invisible critical infrastructure, but the people who know how to operate them are aging out
Source: YouTube
Published April 18, 2026
IBM Technology
Host: Daniel Wiegand

This is an AI-generated summary. The source video may include demos, visuals and additional context.


In Brief

Every time you buy a cup of coffee or swipe your card at a store, there is a very good chance the transaction is running on a mainframe, a large, specialized computer that banks, airlines, and retailers have quietly depended on for decades. Those machines still matter, but the people who know how to run them are aging out, and teams are being asked to manage more with fewer hands.

The obvious fix is to throw AI at the problem. That is what Daniel Wiegand, a product manager at IBM focused on mainframe operations, walks through in a six-minute IBM Technology video. The catch: a generic chatbot that works fine for planning a vacation will confidently give you the wrong answer about a real mainframe problem.

Wiegand's fix is two layers on top of a regular large language model: retrieval-augmented generation (RAG) to ground the model in real mainframe documentation, and agents that can take action on the system.

The invisible backbone

Wiegand opens with the everyday hook: the mainframe is "absolutely mission critical," and every coffee purchase or store transaction likely touches one at the back end.

That is easy to forget, because you never see a mainframe. It is a workhorse that handles enormous volumes of transactions without glamour, the kind of machine that has been running in the same data center since before smartphones existed. The industry has not replaced them because nothing else handles the same scale of reliable, concurrent transactions at the same cost.

Wiegand lists three real problems running them today:

  1. Doing more with less. Teams are shrinking while workloads grow.
  2. Treating the mainframe like anything else. Hybrid cloud setups mean the mainframe now has to cooperate with other infrastructure, not sit in its own silo.
  3. Onboarding the next generation. Most mainframe experts are senior. Whoever takes over needs to get productive fast.

Why generic AI gets it wrong

Wiegand's sharpest point is not really about mainframes. It is about how current chatbots fail on any specialized system.

He tells a story about CICS, the Customer Information Control System that IBM's mainframes use to process a huge share of the world's banking and point-of-sale traffic. He asked a general chatbot about a specific CICS error message. The bot answered confidently. The answer looked like a real mainframe answer. It was wrong.

This is the shape of hallucination when it hits specialized domains. The model has seen enough about CICS to assemble something that reads as authoritative, but not enough to know when it is bluffing. For a developer googling for fun, that is annoying. For an engineer debugging a payment system at 3 a.m., that is dangerous.

RAG: giving the model the right reading list

The fix Wiegand reaches for is retrieval-augmented generation, or RAG. The idea is deceptively simple:

|                               | Without RAG                              | With RAG                                      |
| ----------------------------- | ---------------------------------------- | --------------------------------------------- |
| Where the answer comes from   | The model's training memory              | Trusted documents retrieved at query time     |
| What you add                  | Nothing; you take what the model knows   | Your own best practices, runbooks, internal docs |
| Risk on a specialized topic   | Plausible guess                          | Grounded answer, with a source behind it      |

Wiegand's framing is that RAG "helps ground the large language model in more relevant or more up-to-date information." In practice, that means the system looks up the right mainframe documentation before it writes anything, then builds its answer from those documents instead of from the model's raw training data.

Crucially, clients can ingest their own documentation too: best-practice guides, internal runbooks, incident post-mortems. That is the part that turns a generic mainframe assistant into one that knows your shop specifically.
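The retrieve-then-answer flow Wiegand describes can be sketched in a few lines. This is a minimal illustration, not IBM's implementation: the keyword-overlap retriever, the runbook snippets, and the prompt wording are all stand-ins, and a real system would use a vector index over actual documentation.

```python
# Minimal sketch of the RAG pattern: retrieve relevant documents first,
# then build the model's prompt from them instead of from raw memory.
# The toy retriever and document snippets below are illustrative only.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using ONLY the context below. Cite the line you used.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical ingested shop documentation (runbooks, post-mortems, HR docs).
runbooks = [
    "CICS abend ASRA indicates a program check; inspect the transaction dump.",
    "To restart the payment region, follow runbook PAY-7 and notify on-call.",
    "Vacation policy: submit requests two weeks in advance.",
]

query = "What does a CICS ASRA abend mean?"
prompt = build_prompt(query, retrieve("CICS ASRA abend", runbooks))
print(prompt)
```

Because retrieval happens at query time, updating the answer is as simple as updating the documents: the unrelated vacation-policy snippet never makes it into the prompt, which is exactly the grounding effect the table above describes.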

The agent layer

With RAG in place, Wiegand layers in the second idea: agents. The difference is that a language model only answers. An agent can also do things.

He sketches the kinds of actions an agent can take:

  • open a support ticket in a service desk
  • pull status from core monitors
  • run a health check across the environment
  • look for optimizations in how workloads are running
  • call out to hybrid-cloud services that are not even on the mainframe

The user types one prompt. The system uses RAG to answer from real documentation, and uses agents to pull live system state while it does. Answer and action arrive together.
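One way to picture that agent layer is a registry of tools the system can invoke alongside answering. This is a sketch under assumptions: the tool names, the fake service-desk responses, and the hard-coded "plan" standing in for the model's decision are all hypothetical, not IBM's API.

```python
# Illustrative agent layer: a registry of callable tools, plus a dispatcher
# that runs whatever action the model requested. In a real agent loop the
# LLM would choose the tools and their arguments; here the plan is hard-coded.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function as an invokable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("open_ticket")
def open_ticket(summary: str) -> str:
    # Stand-in for a real service-desk API call.
    return f"TICKET-001 opened: {summary}"

@tool("run_health_check")
def run_health_check(region: str) -> str:
    # Stand-in for querying live monitors across the environment.
    return f"Health check on {region}: all subsystems nominal"

def dispatch(action: dict) -> str:
    """Run the requested tool; a full agent would feed the result back to the model."""
    return TOOLS[action["tool"]](**action["args"])

# Pretend the model decided, from one user prompt, to run a health check
# and open a ticket in the same turn.
plan = [
    {"tool": "run_health_check", "args": {"region": "CICSPROD"}},
    {"tool": "open_ticket", "args": {"summary": "ASRA abends spiking in CICSPROD"}},
]
for step in plan:
    print(dispatch(step))
```

The point of the pattern is the single entry point: the user asks once, and the dispatcher fans out to tickets, monitors, or hybrid-cloud services while the grounded answer is being written.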

Glossary

| Term | Definition |
| ---- | ---------- |
| Agent | Software that can take actions, not just answer questions. It can open tickets, query systems, call APIs, or run scripts on your behalf |
| CICS (Customer Information Control System) | IBM's transaction processor for mainframes. Runs banking transactions, point-of-sale, and airline bookings at massive scale |
| Grounding | Giving a model access to specific, trusted documents so its answers tie back to real sources rather than training guesses |
| Hallucination | When a language model produces a confident-sounding answer that is not actually correct. More dangerous on specialized topics than on common ones |
| Large language model (LLM) | The kind of AI behind ChatGPT, Claude, and Gemini. Trained on huge amounts of text, but only knows what was in the training data |
| Mainframe | A large, specialized computer that runs critical business transactions at huge scale. Banks, airlines, and retailers still rely on them |
| Retrieval-augmented generation (RAG) | A pattern where the model looks up relevant documents first, then writes its answer from them, instead of relying purely on memory |
