Why Generic AI Fails on Mainframes

Key insights
- Generic language models give confident but wrong answers on specialized systems. Wiegand's CICS example shows a plausible-sounding mainframe answer that was not actually right
- What makes RAG powerful is that clients can load their own documentation: runbooks, incident post-mortems, and best-practice guides. That turns a generic AI into one that knows your shop specifically
- Agents let the system take action, not just answer. They can open tickets, run health checks, and call out to hybrid-cloud services. Answer and action arrive together in a single prompt
- The real business case is generational. Mainframes still run invisible critical infrastructure, but the people who know how to operate them are aging out
This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
Every time you buy a cup of coffee or swipe your card at a store, there is a very good chance the transaction is running on a mainframe, a large, specialized computer that banks, airlines, and retailers have quietly depended on for decades. Those machines still matter, but the people who know how to run them are aging out, and teams are being asked to manage more with fewer hands.
The obvious fix is to throw AI at the problem. That is what Daniel Wiegand, a product manager at IBM focused on mainframe operations, walks through in a six-minute IBM Technology video. The catch: a generic chatbot that works fine for planning a vacation will confidently give you the wrong answer about a real mainframe problem.
Wiegand's fix is two layers on top of a regular large language model: retrieval-augmented generation (RAG) to ground the model in real mainframe documentation, and agents that can take action on the system.
The invisible backbone
Wiegand opens with the everyday hook: the mainframe is "absolutely mission critical," and every coffee purchase or store transaction likely touches one at the back end.
That is easy to forget, because you never see a mainframe. It is a workhorse that handles enormous volumes of transactions without glamour, the kind of machine that has been running in the same data center since before smartphones existed. The industry has not replaced them because nothing else handles the same scale of reliable, concurrent transactions at the same cost.
Wiegand lists three real problems running them today:
- Do more with less. Teams are shrinking while workloads grow.
- Treat the mainframe like anything else. Hybrid cloud setups mean the mainframe now has to cooperate with other infrastructure, not sit in its own silo.
- Onboard the next generation. Most mainframe experts are senior, and whoever takes over needs to get productive fast.
Why generic AI gets it wrong
Wiegand's sharpest point is not really about mainframes. It is about how current chatbots fail on any specialized system.
He tells a story about CICS, the Customer Information Control System that IBM's mainframes use to process a huge share of the world's banking and point-of-sale traffic. He asked a general chatbot about a specific CICS error message. The bot answered confidently. The answer looked like a real mainframe answer. It was wrong.
This is the shape of hallucination when it hits specialized domains. The model has seen enough about CICS to assemble something that reads as authoritative, but not enough to know when it is bluffing. For a developer googling for fun, that is annoying. For an engineer debugging a payment system at 3 a.m., that is dangerous.
RAG: giving the model the right reading list
The fix Wiegand reaches for is retrieval-augmented generation, or RAG. The idea is deceptively simple:
| | Without RAG | With RAG |
|---|---|---|
| Where the answer comes from | The model's training memory | Trusted documents retrieved at query time |
| What you add | Nothing. You take what the model knows | Your own best practices, runbooks, internal docs |
| Risk on a specialized topic | Plausible guess | Grounded answer, with a source behind it |
Wiegand's framing is that RAG "helps ground the large language model in more relevant or more up-to-date information." In practice, that means the system looks up the right mainframe documentation before it writes anything, then builds its answer from those documents instead of from the model's raw training data.
Crucially, clients can ingest their own documentation too: best-practice guides, internal runbooks, incident post-mortems. That is the part that turns a generic mainframe assistant into one that knows your shop specifically.
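The retrieve-then-answer flow described above can be sketched in a few lines. This is a toy illustration, not the product's implementation: the word-overlap scoring, the runbook names, and the prompt template are all assumptions standing in for a real vector search and prompt pipeline.

```python
# Minimal RAG sketch: look up the most relevant runbook snippet first,
# then build the prompt from that text instead of the model's memory.
# Scoring method and document contents are illustrative, not from the video.

def score(query: str, doc: str) -> int:
    """Toy relevance score: count query words appearing in the document."""
    return sum(1 for w in query.lower().split() if w in doc.lower())

def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k highest-scoring documents."""
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Ground the prompt in retrieved text; the model answers from context."""
    context = "\n".join(docs[name] for name in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical ingested documentation, standing in for a shop's own runbooks.
runbooks = {
    "cics-errors": "CICS abend AICA means a task looped past its runaway limit.",
    "vacation":    "Book flights early and pack light.",
}

prompt = build_prompt("What does a CICS abend mean?", runbooks)
```

A production system would replace the keyword score with embedding similarity over an indexed document store, but the shape is the same: retrieval happens before generation, so the answer has a source behind it.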
The agent layer
With RAG in place, Wiegand layers in the second idea: agents. The difference is that a language model only answers. An agent can also do things.
He sketches what those agents can do:
- open a support ticket in a service desk
- pull status from core monitors
- run a health check across the environment
- look for optimizations in how workloads are running
- call out to hybrid-cloud services that are not even on the mainframe
The user types one prompt. The system uses RAG to answer from real documentation, and uses agents to pull live system state while it does. Answer and action arrive together.
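That one-prompt fan-out can be sketched as a simple dispatcher. Everything here is illustrative: the video names no APIs, so the tool names, the keyword routing, and the placeholder answer are all assumptions used only to show the shape of combining a grounded answer with live actions.

```python
# Sketch of the agent layer: a single user prompt produces both a
# RAG-grounded answer and the results of any actions the agents took.
# Tool names and routing logic are hypothetical.

def open_ticket(summary: str) -> str:
    """Stand-in for a service-desk integration."""
    return f"TICKET-001: {summary}"

def health_check() -> str:
    """Stand-in for a health check across the environment."""
    return "all subsystems nominal"

TOOLS = {"open_ticket": open_ticket, "health_check": health_check}

def handle(prompt: str) -> dict:
    """Answer from retrieved docs and run any tools the prompt calls for."""
    actions = {}
    if "health" in prompt.lower():
        actions["health_check"] = TOOLS["health_check"]()
    if "ticket" in prompt.lower():
        actions["open_ticket"] = TOOLS["open_ticket"](prompt)
    # Placeholder for the RAG step described in the previous section.
    answer = "Grounded answer built from retrieved documentation."
    return {"answer": answer, "actions": actions}

result = handle("Run a health check and open a ticket for the CICS abend")
```

A real agent framework would let the model itself decide which tools to call rather than matching keywords, but the user-facing result is the same: answer and action arrive together from one prompt.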
Glossary
| Term | Definition |
|---|---|
| Agent | Software that can take actions, not just answer questions. It can open tickets, query systems, call APIs, or run scripts on your behalf |
| CICS (Customer Information Control System) | IBM's transaction processor for mainframes. Runs banking transactions, point-of-sale, airline bookings at massive scale |
| Grounding | Giving a model access to specific, trusted documents so its answers tie back to real sources rather than training guesses |
| Hallucination | When a language model produces a confident-sounding answer that is not actually correct. More dangerous on specialized topics than on common ones |
| Large language model (LLM) | The kind of AI behind ChatGPT, Claude, and Gemini. Trained on huge amounts of text, but only knows what was in the training data |
| Mainframe | A large, specialized computer that runs critical business transactions at huge scale. Banks, airlines, and retailers still rely on them |
| Retrieval-augmented generation (RAG) | Pattern where the model looks up relevant documents first, then writes its answer from them, instead of relying purely on memory |
Sources and resources
- IBM Technology — How AI, RAG, and Agents Transform Mainframe Operations — The source video
- Daniel Wiegand on Planet Mainframe — His articles on IBM Z operations and AI ops
- IBM Z — The mainframe product line
- IBM watsonx Assistant for Z — The unnamed product behind the video
- Retrieval-augmented generation — Wikipedia — The grounding pattern the video describes
- CICS — Wikipedia — Background on the transaction processor Wiegand uses as an example
Want to go deeper? Watch the full video on YouTube →