Why Generic AI Fails on Mainframes

Key insights
- Generic language models give confident but wrong answers on specialized systems. Wiegand's CICS example shows a plausible-sounding mainframe answer that was not actually right
- What makes RAG powerful is that clients can load their own documentation: runbooks, incident post-mortems, and best-practice guides. That turns a generic AI into one that knows your shop specifically
- Agents let the system take action, not just answer. They can open tickets, run health checks, and call out to hybrid-cloud services. Answer and action arrive together in a single prompt
- The real business case is generational. Mainframes still run invisible critical infrastructure, but the people who know how to operate them are aging out
This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
Every time you buy a cup of coffee or swipe your card at a store, there is a very good chance the transaction is running on a mainframe, a large, specialized computer that banks, airlines, and retailers have quietly depended on for decades. Those machines still matter, but the people who know how to run them are aging out, and teams are being asked to manage more with fewer hands.
The obvious fix is to throw AI at the problem. That is what Daniel Wiegand, a product manager at IBM focused on mainframe operations, walks through in a six-minute IBM Technology video. The catch: a generic chatbot that works fine for planning a vacation will confidently give you the wrong answer about a real mainframe problem.
Wiegand's fix is two layers on top of a regular large language model: retrieval-augmented generation (RAG) to ground the model in real mainframe documentation, and agents that can take action on the system.
The invisible backbone
Wiegand opens with the everyday hook: the mainframe is "absolutely mission critical," and every coffee purchase or store transaction likely touches one at the back end.
That is easy to forget, because you never see a mainframe. It is a workhorse that handles enormous volumes of transactions without glamour, the kind of machine that has been running in the same data center since before smartphones existed. The industry has not replaced them because nothing else handles the same scale of reliable, concurrent transactions at the same cost.
Wiegand lists three real problems running them today:
- Do more with less. Teams are shrinking while workloads grow.
- Treat the mainframe like anything else. Hybrid cloud setups mean the mainframe now has to cooperate with other infrastructure, not sit in its own silo.
- Onboard the next generation. Most mainframe experts are senior, and whoever takes over needs to get productive fast.
Why generic AI gets it wrong
Wiegand's sharpest point is not really about mainframes. It is about how current chatbots fail on any specialized system.
He tells a story about CICS, the Customer Information Control System that IBM's mainframes use to process a huge share of the world's banking and point-of-sale traffic. He asked a general chatbot about a specific CICS error message. The bot answered confidently. The answer looked like a real mainframe answer. It was wrong.
This is the shape of hallucination when it hits specialized domains. The model has seen enough about CICS to assemble something that reads as authoritative, but not enough to know when it is bluffing. For a developer googling for fun, that is annoying. For an engineer debugging a payment system at 3 a.m., that is dangerous.
RAG: giving the model the right reading list
The fix Wiegand reaches for is retrieval-augmented generation, or RAG. The idea is deceptively simple:
| | Without RAG | With RAG |
|---|---|---|
| Where the answer comes from | The model's training memory | Trusted documents retrieved at query time |
| What you add | Nothing. You take what the model knows | Your own best practices, runbooks, internal docs |
| Risk on a specialized topic | Plausible guess | Grounded answer, with a source behind it |
Wiegand's framing is that RAG "helps ground the large language model in more relevant or more up-to-date information." In practice, that means the system looks up the right mainframe documentation before it writes anything, then builds its answer from those documents instead of from the model's raw training data.
Crucially, clients can ingest their own documentation too: best-practice guides, internal runbooks, incident post-mortems. That is the part that turns a generic mainframe assistant into one that knows your shop specifically.
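The retrieve-then-answer flow described above can be sketched in a few lines. This is a toy illustration, not the product's implementation: the word-overlap scoring, the runbook names, and the prompt template are all assumptions standing in for a real vector search and prompt pipeline.

```python
# Minimal RAG sketch: look up the most relevant runbook snippet first,
# then build the prompt from that text instead of the model's memory.
# Scoring method and document contents are illustrative, not from the video.

def score(query: str, doc: str) -> int:
    """Toy relevance score: count query words appearing in the document."""
    return sum(1 for w in query.lower().split() if w in doc.lower())

def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k highest-scoring documents."""
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Ground the prompt in retrieved text; the model answers from context."""
    context = "\n".join(docs[name] for name in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical ingested documentation, standing in for a shop's own runbooks.
runbooks = {
    "cics-errors": "CICS abend AICA means a task looped past its runaway limit.",
    "vacation":    "Book flights early and pack light.",
}

prompt = build_prompt("What does a CICS abend mean?", runbooks)
```

A production system would replace the keyword score with embedding similarity over an indexed document store, but the shape is the same: retrieval happens before generation, so the answer has a source behind it.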
The agent layer
With RAG in place, Wiegand layers in the second idea: agents. The difference is that a language model only answers. An agent can also do things.
He sketches what those agents can do:
- open a support ticket in a service desk
- pull status from core monitors
- run a health check across the environment
- look for optimizations in how workloads are running
- call out to hybrid-cloud services that are not even on the mainframe
The user types one prompt. The system uses RAG to answer from real documentation, and uses agents to pull live system state while it does. Answer and action arrive together.
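That one-prompt fan-out can be sketched as a simple dispatcher. Everything here is illustrative: the video names no APIs, so the tool names, the keyword routing, and the placeholder answer are all assumptions used only to show the shape of combining a grounded answer with live actions.

```python
# Sketch of the agent layer: a single user prompt produces both a
# RAG-grounded answer and the results of any actions the agents took.
# Tool names and routing logic are hypothetical.

def open_ticket(summary: str) -> str:
    """Stand-in for a service-desk integration."""
    return f"TICKET-001: {summary}"

def health_check() -> str:
    """Stand-in for a health check across the environment."""
    return "all subsystems nominal"

TOOLS = {"open_ticket": open_ticket, "health_check": health_check}

def handle(prompt: str) -> dict:
    """Answer from retrieved docs and run any tools the prompt calls for."""
    actions = {}
    if "health" in prompt.lower():
        actions["health_check"] = TOOLS["health_check"]()
    if "ticket" in prompt.lower():
        actions["open_ticket"] = TOOLS["open_ticket"](prompt)
    # Placeholder for the RAG step described in the previous section.
    answer = "Grounded answer built from retrieved documentation."
    return {"answer": answer, "actions": actions}

result = handle("Run a health check and open a ticket for the CICS abend")
```

A real agent framework would let the model itself decide which tools to call rather than matching keywords, but the user-facing result is the same: answer and action arrive together from one prompt.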
Glossary
| Term | Definition |
|---|---|
| Agent | Software that can take actions, not just answer questions. It can open tickets, query systems, call APIs, or run scripts on your behalf |
| CICS (Customer Information Control System) | IBM's transaction processor for mainframes. Runs banking transactions, point-of-sale, airline bookings at massive scale |
| Grounding | Giving a model access to specific, trusted documents so its answers tie back to real sources rather than training guesses |
| Hallucination | When a language model produces a confident-sounding answer that is not actually correct. More dangerous on specialized topics than on common ones |
| Large language model (LLM) | The kind of AI behind ChatGPT, Claude, and Gemini. Trained on huge amounts of text, but only knows what was in the training data |
| Mainframe | A large, specialized computer that runs critical business transactions at huge scale. Banks, airlines, and retailers still rely on them |
| Retrieval-augmented generation (RAG) | Pattern where the model looks up relevant documents first, then writes its answer from them, instead of relying purely on memory |
Sources and resources
- IBM Technology — How AI, RAG, and Agents Transform Mainframe Operations — The source video
- Daniel Wiegand on Planet Mainframe — His articles on IBM Z operations and AI ops
- IBM Z — The mainframe product line
- IBM watsonx Assistant for Z — The unnamed product behind the video
- Retrieval-augmented generation — Wikipedia — The grounding pattern the video describes
- CICS — Wikipedia — Background on the transaction processor Wiegand uses as an example
Want to go deeper? Watch the full video on YouTube →