
Playwright CLI Turns Natural Language into Browser Tests

March 7, 2026 · 5 min read · 966 words
AI · Testing · Video Summary
Image: Playwright CLI generating browser tests from natural language in Goose (screenshot from YouTube).

Key insights

  • The Playwright CLI saves page snapshots locally instead of loading them into the AI context, cutting token usage compared to Playwright MCP
  • An AI agent can navigate a website, interact with elements, and auto-generate a passing Playwright test without writing code
  • Video recordings and trace files let you verify exactly what the agent did, step by step
Source: YouTube
Published February 13, 2026
goose OSS
Hosts: Debbie O'Brien

This article is a summary of the video "How to Use Playwright CLI Skill for Agentic Testing."



In Brief

Debbie O'Brien from Block shows how the Playwright CLI skill turns Goose, an open-source AI agent, into a browser testing tool. Instead of writing test code by hand, you describe what you want in plain English. The agent browses the site, clicks elements, and generates a working Playwright test that passes on the first run.

  • 0 lines of test code written manually
  • 1 command to install all skills
  • 13 min from setup to passing test + trace

What you will learn

  • How to install and activate the Playwright CLI skill in Goose
  • How to browse websites and generate test code using natural language
  • How to record video and trace files for debugging agent actions

Setup and first steps

1. Install the Playwright CLI globally

Run the Playwright CLI install command to make it available system-wide. Then install the skills into your project by running playwright cli install-skills (0:35). This creates a folder with a skill file and reference documents that teach the agent what commands are available.

2. Activate the skill in Goose

In your Goose agent, search for the Playwright CLI extension and make sure it is activated (2:00). O'Brien uses the Goose coding agent, but any agent that supports skills will work.

3. Open a browser and take a screenshot

Tell the agent to open a website. By default, the browser runs in headless mode (no visible window), since the agent does not need to see it. Add --headed if you want to watch what happens (4:13). The agent saves a page snapshot locally as a YAML file containing the page's accessibility tree (a structured map of every element on the page). This is the key difference from the Playwright MCP server: the snapshot stays on disk, not in the AI's context window, which saves tokens (3:42).

Analogy: Think of the snapshot like a table of contents for a book. Instead of reading every word (loading the full page into the AI's memory), the agent checks the table of contents to find what it needs. The limitation: unlike a real table of contents, the snapshot can become stale if the page changes between actions.
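
To make the token argument concrete, here is a rough sketch comparing what enters the model's context when a snapshot is inlined versus when only its file path is passed. The four-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and the snapshot contents, the file path, and the function name are all hypothetical.

```typescript
// Rough heuristic: about 4 characters per token. This is an assumption,
// not the output of a real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical snapshot: 500 element lines from an accessibility tree.
const snapshotYaml: string = Array.from(
  { length: 500 },
  (_, i) => `- link "Video ${i}" [ref=e${i}]`
).join("\n");

// Hypothetical path where the snapshot was written to disk.
const snapshotPath = ".playwright-cli/page-snapshot.yml";

// Inlining puts the whole tree into the context window.
const inlinedCost = estimateTokens(snapshotYaml);

// Referencing puts only a short message with the path there;
// the agent reads the file on demand.
const referencedCost = estimateTokens(`Snapshot saved to ${snapshotPath}`);

console.log({ inlinedCost, referencedCost });
```

The exact numbers depend on the page and the model's tokenizer; the point is only that the referenced cost stays flat while the inlined cost grows with the size of the page.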


Generating tests from natural language

1. Describe the user flow in plain English

O'Brien types: "Go to the videos page and filter for MCP, then create a test from your interactions" (4:38). The agent navigates to the page, finds the MCP tag filter, clicks it, and observes the result. It uses the ref numbers from the snapshot to locate elements.
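
The ref lookup can be pictured with a small sketch. It assumes each element line in the snapshot has the shape `- role "name" [ref=eN]`; the real file may differ in detail, and the sample lines, type, and function names below are hypothetical.

```typescript
interface SnapshotElement {
  role: string;
  name: string;
  ref: string;
}

// Assumed line shape: optional indent, "- ", a role, a quoted name,
// optional attributes, then [ref=...].
const ELEMENT_LINE = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=([^\]]+)\]/;

function parseSnapshot(yaml: string): SnapshotElement[] {
  const elements: SnapshotElement[] = [];
  for (const line of yaml.split("\n")) {
    const match = ELEMENT_LINE.exec(line);
    if (match) {
      elements.push({ role: match[1], name: match[2], ref: match[3] });
    }
  }
  return elements;
}

// Return the ref of the first element with the given role and accessible name.
function findRef(
  elements: SnapshotElement[],
  role: string,
  name: string
): string | undefined {
  return elements.find((el) => el.role === role && el.name === name)?.ref;
}

// Hypothetical excerpt from a saved snapshot of the videos page.
const sample = [
  '- heading "Videos" [level=1] [ref=e3]',
  '- button "MCP" [ref=e12]',
  '- button "Testing" [ref=e13]',
].join("\n");

const mcpRef = findRef(parseSnapshot(sample), "button", "MCP");
console.log(mcpRef); // prints e12
```

Because a ref is only valid for the snapshot it came from, the agent would need a fresh snapshot after any action that changes the page.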

2. Review the generated test code

The agent creates a Playwright TypeScript test file with page.goto, assertions like expect(page).toHaveTitle(), and click actions using getByRole (6:03). The generated code follows Playwright best practices with role-based locators (finding elements by their function, like "button" or "link," rather than brittle CSS selectors).
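
To show the shape of such a test, the sketch below renders a recorded flow into Playwright test source. It is not how the agent works internally: the step format, URL, title pattern, test name, and function names are hypothetical. Only the calls that appear in the rendered output (`test`, `expect`, `page.goto`, `toHaveTitle`, `getByRole`, `click`) are real Playwright APIs.

```typescript
// Hypothetical record of what the agent did in the browser.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "expectTitle"; pattern: string }
  | { kind: "click"; role: string; name: string };

// Turn one recorded step into one line of Playwright test code.
function renderStep(step: Step): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${JSON.stringify(step.url)});`;
    case "expectTitle":
      return `await expect(page).toHaveTitle(new RegExp(${JSON.stringify(step.pattern)}));`;
    case "click":
      return `await page.getByRole(${JSON.stringify(step.role)}, { name: ${JSON.stringify(step.name)} }).click();`;
  }
}

// Wrap the rendered steps in a complete test file.
function renderTest(title: string, steps: Step[]): string {
  const body = steps.map((step) => `  ${renderStep(step)}`).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    body,
    `});`,
  ].join("\n");
}

const source = renderTest("filter videos by MCP tag", [
  { kind: "goto", url: "https://example.com/videos" },
  { kind: "expectTitle", pattern: "Videos" },
  { kind: "click", role: "button", name: "MCP" },
]);

console.log(source);
```

The printed source is a complete test file: one navigation, one title assertion, and one role-based click, which mirrors the structure described above.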

3. Run the test

Ask the agent to run the test. It executes npx playwright test and reports the result. The test passes on the first run (8:06).

Analogy: Imagine dictating a recipe to a chef who writes down every step and measurement as you cook. Afterward, anyone can follow that written recipe to reproduce the same dish. The catch: the agent records what it observes, so if it misses a subtle step, the "recipe" may be incomplete.


Recording video and traces

Beyond test generation, the Playwright CLI can record what the agent does for later review (9:20).

Video recording saves a video file of the browser session. O'Brien suggests attaching these to pull requests (code review submissions) so reviewers can see exactly what the agent tested.

Trace recording creates a detailed log that opens in Playwright's Trace Viewer (12:39). The trace shows a timeline of every action, the refs used, network requests, console output, and page screenshots at each step. You can even pick locators directly from the trace.


Troubleshooting

  • Trace file won't open in Trace Viewer? The viewer expects a .zip file. If the trace was saved as an uncompressed folder, zip it first (12:18).
  • Browser not visible? By default, the CLI runs headless. Use --headed to see the browser window. This is only needed when a human wants to watch; the agent works the same either way.
  • Skill not found? Make sure the Playwright CLI extension is activated in your agent's settings, not just installed.
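
For the first item, one quick way to tell whether a trace file is a real archive is to check its first bytes. The sketch below relies on the standard signature that non-empty ZIP archives start with (`PK\x03\x04`); the function name is hypothetical, and nothing here is part of Playwright.

```typescript
// Non-empty ZIP archives begin with the local file header signature "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

function looksLikeZip(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_SIGNATURE.length) return false;
  return ZIP_SIGNATURE.every((value, index) => bytes[index] === value);
}

// First bytes of a ZIP file, followed by arbitrary header data.
const zipHeader = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
// The ASCII bytes for "not a zip".
const plainText = new Uint8Array([0x6e, 0x6f, 0x74, 0x20, 0x61, 0x20, 0x7a, 0x69, 0x70]);

console.log(looksLikeZip(zipHeader), looksLikeZip(plainText)); // prints true false
```

In Node.js, the bytes returned by `fs.readFileSync(path)` can be passed straight in, since a `Buffer` is a `Uint8Array`.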

Test yourself

  1. Transfer: How could you use this approach to test a multi-step checkout flow on an e-commerce site? What instructions would you give the agent?
  2. Trade-off: When would you choose the Playwright CLI skill over the Playwright MCP server? What situations favor each approach?
  3. Architecture: How would you set up a CI pipeline that uses an AI agent to generate and run browser tests automatically on each pull request?

Glossary

Term: Definition
Playwright: A browser testing framework by Microsoft that automates clicks, form fills, and navigation in Chromium, Firefox, and WebKit (the engines behind Chrome, Edge, and Safari).
CLI (Command-Line Interface): A text-based way to interact with software by typing commands instead of clicking buttons.
Goose: An open-source AI agent by Block (formerly Square) that runs on your machine and can use extensions called skills.
Skill: An extension that teaches a Goose agent new capabilities, like browsing the web or generating tests.
Headless browser: A browser that runs without a visible window. It is faster and uses fewer resources, but you cannot see what happens.
Accessibility tree: A structured map of every element on a web page. It is the same data screen readers use, and what the Playwright CLI uses to find elements.
Snapshot: A saved copy of the page's accessibility tree at a specific moment. It is stored locally to save token costs.
Trace: A detailed recording of every browser action, including screenshots, network requests, and console logs.
End-to-end test (E2E): A test that checks the full user journey through an application, from start to finish.
Token: A small unit of text that AI models process. Using fewer tokens means lower cost and faster responses.
MCP server: A server that implements the Model Context Protocol (MCP), a protocol for connecting AI agents to external tools. The Playwright MCP server loads page data into the AI's context, using more tokens.

Sources and resources