One protocol for
every screen.
A universal schema for AI agents to perceive and interact with any UI. One format for all platforms, built for LLM context windows.
npm install computeruseprotocol
The Problem
Every platform speaks a different language.
AI agents that interact with desktop UIs must handle completely different accessibility APIs on each operating system. Every framework reinvents this translation layer.
ControlType.Button
AXButton
ROLE_PUSH_BUTTON
role="button"
How It Works
Capture. Normalize. Act.
Three steps from raw platform tree to actionable, token-efficient UI representation.
Capture
Grab the native accessibility tree from any platform: Windows UIA, macOS AXUIElement, Linux AT-SPI2, or Web CDP.
const tree = await snapshot() // Auto-detects OS, captures active window
Normalize
Map 280+ native roles down to 59 ARIA-derived roles. Prune decorative noise. Compress to a compact format with 97% fewer tokens.
[e0] win "Spotify" 120,40 1680x1020
[e2] btn "Back" 132,52 32x32 [clk]
[e7] nav "Main" 120,88 240x972
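To make the compact format concrete, here is a minimal sketch of a parser for lines like the sample above. The grammar (`[id] role "name" x,y WxH [flag]...`) is inferred from those three sample lines only, and the `CompactNode` field names and `parseCompactLine` helper are hypothetical, not part of the CUP SDK.

```typescript
// Hypothetical shape for one parsed line; field names are assumptions.
interface CompactNode {
  id: string;
  role: string;
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
  flags: string[];
}

// Grammar inferred from the sample lines: [id] role "name" x,y WxH [flag]...
// Names containing escaped quotes are not handled by this sketch.
const COMPACT_LINE =
  /^\[(\w+)\]\s+(\w+)\s+"([^"]*)"\s+(-?\d+),(-?\d+)\s+(\d+)x(\d+)((?:\s+\[\w+\])*)\s*$/;

function parseCompactLine(line: string): CompactNode | null {
  const m = COMPACT_LINE.exec(line.trim());
  if (!m) return null; // line does not match the assumed grammar
  // Trailing bracketed tokens such as [clk] are collected as flags.
  const flags: string[] = m[8].match(/\w+/g) || [];
  return {
    id: m[1],
    role: m[2],
    name: m[3],
    x: Number(m[4]),
    y: Number(m[5]),
    width: Number(m[6]),
    height: Number(m[7]),
    flags,
  };
}

const back = parseCompactLine('[e2] btn "Back" 132,52 32x32 [clk]');
```

Here `back` holds the element id `e2`, the role `btn`, the position and size as numbers, and `clk` as its only flag. An agent can pass that id straight to an action call.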
Act
Execute any of 15 canonical actions (click, type, scroll, press, and more) mapped back to native platform APIs. Integrate into your own AI agent or connect via MCP.
await session.action('e8', 'click') // click Home
await session.action('e12', 'type', { value: 'jazz' })
await session.press('enter')
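One way to picture "mapped back to native platform APIs" is a small dispatcher that routes a canonical action to an adapter method. This is a sketch under stated assumptions: the `NativeAdapter` interface, its method names, and the `dispatch` function are hypothetical, and only three of the 15 actions are covered. A real adapter would call UIA, AXUIElement, AT-SPI2, or CDP instead of writing to a log.

```typescript
// Three of the canonical actions named above; payload shapes are assumptions.
type CanonicalAction =
  | { kind: 'click' }
  | { kind: 'type'; value: string }
  | { kind: 'scroll'; dy: number };

// Hypothetical adapter surface; each platform would supply its own implementation.
interface NativeAdapter {
  invoke(id: string): void;
  setValue(id: string, value: string): void;
  scrollBy(id: string, dy: number): void;
}

// Route one canonical action to the matching adapter call.
function dispatch(adapter: NativeAdapter, id: string, action: CanonicalAction): void {
  switch (action.kind) {
    case 'click':
      adapter.invoke(id);
      break;
    case 'type':
      adapter.setValue(id, action.value);
      break;
    case 'scroll':
      adapter.scrollBy(id, action.dy);
      break;
  }
}

// Stand-in adapter that records calls instead of touching a real UI.
function recordingAdapter(log: string[]): NativeAdapter {
  return {
    invoke: (id) => { log.push(`invoke ${id}`); },
    setValue: (id, value) => { log.push(`setValue ${id} ${value}`); },
    scrollBy: (id, dy) => { log.push(`scrollBy ${id} ${dy}`); },
  };
}

const calls: string[] = [];
const adapter = recordingAdapter(calls);
dispatch(adapter, 'e8', { kind: 'click' });
dispatch(adapter, 'e12', { kind: 'type', value: 'jazz' });
```

Keeping the action vocabulary separate from the adapter means the agent-facing calls stay the same while the platform layer changes underneath.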
Benchmark
97% fewer tokens. 15x cheaper.
Same page, same information, measured in LLM input tokens. CUP's compact format dramatically reduces context window cost.
| Format | Tokens | $/snapshot |
|---|---|---|
| CUP Compact | 6,244 | $0.019 |
| Playwright ARIA | 91,339 | $0.274 |
| Vercel agent-browser | 99,648 | $0.299 |
| CUP JSON (full) | 218,277 | $0.655 |
| Raw HTML DOM | 299,776 | $0.899 |
| Raw UIA Tree | 467,755 | $1.403 |
Measured on text-heavy articles. Pricing based on Claude Sonnet 4.6 at $3 per million input tokens.
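The dollar column and the headline figures can be reproduced from the token counts. The sketch below uses only the numbers in the table and the $3 per million input tokens rate. The helper names are illustrative.

```typescript
// Token counts copied from the benchmark table above.
const TOKENS = {
  cupCompact: 6244,
  playwrightAria: 91339,
  agentBrowser: 99648,
  cupJsonFull: 218277,
  rawHtmlDom: 299776,
  rawUiaTree: 467755,
};

const USD_PER_MILLION_INPUT_TOKENS = 3; // rate stated in the pricing note

// Cost of sending one snapshot as LLM input.
function costPerSnapshot(tokens: number): number {
  return (tokens / 1e6) * USD_PER_MILLION_INPUT_TOKENS;
}

// Percentage of tokens saved by the compact format relative to a baseline.
function reductionPercent(baseline: number, compact: number): number {
  return (1 - compact / baseline) * 100;
}

// How many times more expensive the baseline is than the compact format.
function costRatio(baseline: number, compact: number): number {
  return baseline / compact;
}

const compactCost = costPerSnapshot(TOKENS.cupCompact).toFixed(3);
const vsFullJson = reductionPercent(TOKENS.cupJsonFull, TOKENS.cupCompact).toFixed(1);
const vsPlaywright = costRatio(TOKENS.playwrightAria, TOKENS.cupCompact).toFixed(1);
```

With these inputs, `compactCost` is `0.019`, `vsFullJson` is `97.1`, and `vsPlaywright` is `14.6`. The page does not say which baseline the headline uses. The 97% figure lines up with CUP JSON (full), and 15x is close to the ratio against Playwright ARIA.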
Schema
59 roles. 16 states. 15 actions.
A complete vocabulary for describing any user interface, derived from ARIA and mapped to native equivalents on every platform.
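As an illustration of that mapping, here is a minimal sketch of a role lookup covering only the four native button names shown earlier on this page. The `normalizeRole` helper, the platform keys, the canonical name `button`, and the `generic` fallback are assumptions for the example. The real schema defines 59 roles and its own fallback rules.

```typescript
type Platform = 'windows' | 'macos' | 'linux' | 'web';

// Only the four native names quoted on this page; a full table would hold 280+ entries.
const ROLE_MAP = new Map<string, string>([
  ['windows:ControlType.Button', 'button'], // Windows UIA
  ['macos:AXButton', 'button'],             // macOS AXUIElement
  ['linux:ROLE_PUSH_BUTTON', 'button'],     // Linux AT-SPI2
  ['web:button', 'button'],                 // ARIA role="button"
]);

// Look up the canonical role; unknown native roles fall back to 'generic' (an assumption).
function normalizeRole(platform: Platform, nativeRole: string): string {
  const canonical = ROLE_MAP.get(`${platform}:${nativeRole}`);
  return canonical !== undefined ? canonical : 'generic';
}

const fromMac = normalizeRole('macos', 'AXButton');
const fromWindows = normalizeRole('windows', 'ControlType.Button');
```

Both calls return the same canonical role, so code downstream of the lookup never has to branch on the operating system.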
MCP Integration
Plug into any AI agent
in 3 lines.
CUP ships a built-in MCP server. Add it to Claude Code, Cursor, Copilot, or any MCP-compatible agent and start controlling desktop UIs immediately.
{
"mcpServers": {
"cup": {
"command": "cup-mcp"
}
}
}
| Tool | Description |
|---|---|
| snapshot | Capture active window tree |
| snapshot_app | Capture specific app by title |
| overview | List all open windows |
| snapshot_desktop | Capture desktop icons & widgets |
| find | Search elements in last tree |
| action | Interact with UI elements |
| open_app | Open app by name (fuzzy match) |
| screenshot | Capture screen region as PNG |
SDKs
Same API. Two languages.
Mirrored SDKs in Python and TypeScript share an identical layered architecture. Each auto-detects your OS and loads the right platform adapter.
npm install computeruseprotocol
import { snapshot, action } from 'computeruseprotocol'
// Capture the active window's UI tree
const tree = await snapshot()
// Find and click a button
await action('click', 'e14')
// Type text into a search box
await action('type', 'e9', { value: 'hello world' })
// Send a keyboard shortcut
await action('press', { keys: 'ctrl+s' })
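The OS auto-detection mentioned above could look roughly like the sketch below. The SDK's actual detection logic is not shown on this page, so `pickAdapter` and the adapter labels are hypothetical. The platform strings are the values Node.js reports in `process.platform`.

```typescript
// Labels for the desktop accessibility backends named on this page.
type AdapterName = 'uia' | 'axuielement' | 'at-spi2';

// Choose a backend from a Node.js-style platform string.
function pickAdapter(platform: string): AdapterName {
  switch (platform) {
    case 'win32':
      return 'uia'; // Windows UI Automation
    case 'darwin':
      return 'axuielement'; // macOS accessibility API
    case 'linux':
      return 'at-spi2'; // Linux accessibility bus
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

const onMac = pickAdapter('darwin');
```

In a Node.js program the argument would typically be `process.platform`. Passing it in as a parameter keeps the function easy to test on any machine.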
Use Cases
Built for builders.
AI Agents
Give LLMs structured UI perception at minimal token cost. Build agents that can navigate, read, and interact with any desktop application.
Test Automation
Write cross-platform UI tests with one API. Same test logic runs on Windows, macOS, and Linux without platform-specific selectors.
RPA & Workflow Automation
Automate repetitive tasks across any desktop application. 15 canonical actions cover every UI interaction pattern.
Accessibility Tooling
Build on a normalized accessibility layer. Analyze and audit UI trees across platforms with a consistent schema.