v0.1.0 - Open Source

One protocol for every screen.

A universal schema for AI agents to perceive and interact with any UI. One format for all platforms, built for LLM context windows.

$ npm install computeruseprotocol
Python & TypeScript · MCP-native · 97% token reduction

The Problem

Every platform speaks a different language.

AI agents that interact with desktop UIs must handle a completely different accessibility API on each operating system, so every agent framework ends up reinventing the same translation layer.

Windows · UIA · ControlType.Button · ~40 types
macOS · AXUIElement · AXButton · ~60 roles
Linux · AT-SPI2 · ROLE_PUSH_BUTTON · ~100 roles
Web · ARIA · role="button" · ~80 roles
CUP
59 ARIA-derived roles. One format.
15 actions · 16 states · All platforms
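As a minimal TypeScript sketch of that normalization, the table below maps the four native button identifiers listed above onto a single CUP role. `Platform`, `ROLE_MAP`, and `normalizeRole` are hypothetical names for illustration, not part of the CUP SDK, and a real table would cover every native role rather than one per platform.

```typescript
// Hypothetical platform keys for this sketch (not the SDK's own types).
type Platform = "windows" | "macos" | "linux" | "web";

// Illustrative lookup: one native "button" identifier per platform, taken from
// the comparison above, each normalized to the single CUP role "button".
const ROLE_MAP: Record<Platform, Map<string, string>> = {
  windows: new Map([["ControlType.Button", "button"]]),
  macos: new Map([["AXButton", "button"]]),
  linux: new Map([["ROLE_PUSH_BUTTON", "button"]]),
  web: new Map([["button", "button"]]),
};

// Returns the CUP role for a native role, or undefined if the table lacks it.
function normalizeRole(platform: Platform, nativeRole: string): string | undefined {
  return ROLE_MAP[platform].get(nativeRole);
}

console.log(normalizeRole("macos", "AXButton")); // prints: button
```

Keying the lookup by platform keeps each platform's input vocabulary separate; only the output side, the CUP role set, is shared.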

How It Works

Capture. Normalize. Act.

Three steps from a raw platform tree to an actionable, token-efficient UI representation.

01

Capture

Grab the native accessibility tree from any platform: Windows UIA, macOS AXUIElement, Linux AT-SPI2, or the web via CDP (Chrome DevTools Protocol).

const tree = await snapshot()
// Auto-detects OS, captures active window
02

Normalize

Map 280+ native roles down to 59 ARIA-derived roles. Prune decorative noise. Compress to the compact format, which uses 97% fewer tokens than full JSON.

[e0] win "Spotify" 120,40 1680x1020
  [e2] btn "Back" 132,52 32x32 [clk]
  [e7] nav "Main" 120,88 240x972
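The pruning step can be pictured with a small sketch. The rule below, which treats a node as decorative when it has neither a name nor any actions and hoists its children up one level, is an assumption chosen for illustration; CUP's actual pruning heuristics are not described on this page. `UiNode`, `prune`, and `countNodes` are hypothetical names, and `UiNode` is only a subset of the CUP JSON fields.

```typescript
// Hypothetical subset of the CUP JSON node fields used by this sketch.
interface UiNode {
  id: string;
  role: string;
  name: string;
  actions: string[];
  children?: UiNode[];
}

// Assumed rule: a node is worth keeping if it has a non-empty name or
// supports at least one action. Decorative nodes are dropped, and their
// kept descendants are hoisted to the decorative node's parent.
function prune(nodes: UiNode[]): UiNode[] {
  const kept: UiNode[] = [];
  for (const node of nodes) {
    const children = prune(node.children ?? []);
    const meaningful = node.name.trim() !== "" || node.actions.length > 0;
    if (meaningful) {
      kept.push({ ...node, children });
    } else {
      kept.push(...children); // hoist children of a decorative wrapper
    }
  }
  return kept;
}

// Counts every node in a forest, e.g. for an "N nodes (M before pruning)" header.
function countNodes(nodes: UiNode[]): number {
  return nodes.reduce((sum, node) => sum + 1 + countNodes(node.children ?? []), 0);
}
```

Hoisting instead of deleting means a clickable element nested inside an unnamed layout container still survives pruning.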
03

Act

Execute any of 15 canonical actions (click, type, scroll, press, and more) mapped back to native platform APIs. Integrate into your own AI agent or connect via MCP.

await session.action('e8', 'click')    // click Home
await session.action('e12', 'type', { value: 'jazz' })
await session.press('enter')
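An agent reading compact output needs to turn a line such as `[e2] btn "Back" ...` into an element ID it can pass to `session.action`. The parser below is a sketch that assumes the line layout shown in the Normalize example (two-space indentation per level, names without embedded double quotes); `parseCompactLine` and `findId` are hypothetical helpers, not part of the SDK.

```typescript
interface CompactNode {
  depth: number;
  id: string;
  role: string;
  name: string;
  bounds?: { x: number; y: number; w: number; h: number };
  states: string[];
  actions: string[];
}

// Matches: [id] role "name" [x,y wxh] [{states}] [[actions]].
// Trailing val=... and (...) parts are ignored by this sketch.
const LINE_RE =
  /^( *)\[(\w+)\] (\S+) "([^"]*)"(?: (-?\d+),(-?\d+) (\d+)x(\d+))?(?: \{([^}]*)\})?(?: \[([^\]]*)\])?/;

function parseCompactLine(line: string): CompactNode | null {
  const m = LINE_RE.exec(line);
  if (m === null) return null; // header comment, blank, or unrecognized line
  const [, indent, id, role, name, x, y, w, h, states, actions] = m;
  return {
    depth: indent.length / 2, // assumes two spaces per nesting level
    id,
    role,
    name,
    bounds: x === undefined ? undefined : { x: +x, y: +y, w: +w, h: +h },
    states: states ? states.split(",") : [],
    actions: actions ? actions.split(",") : [],
  };
}

// Returns the ID of the first node with the given compact role and name.
function findId(compact: string, role: string, name: string): string | undefined {
  for (const line of compact.split("\n")) {
    const node = parseCompactLine(line);
    if (node !== null && node.role === role && node.name === name) return node.id;
  }
  return undefined;
}
```

Fed the three-line Normalize example, `findId(text, "btn", "Back")` returns `"e2"`, which is the kind of ID the `session.action` calls above take.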

Benchmark

97% fewer tokens. 15x cheaper.

Same page, same information, measured in LLM input tokens. CUP's compact format dramatically reduces context-window cost.

97% fewer tokens than CUP JSON (full)
15x cheaper than Playwright ARIA
75x smaller than a raw UIA tree
Format                  Tokens   $/snapshot
CUP Compact              6,244   $0.019
Playwright ARIA         91,339   $0.274
Vercel agent-browser    99,648   $0.299
CUP JSON (full)        218,277   $0.655
Raw HTML DOM           299,776   $0.899
Raw UIA Tree           467,755   $1.403

Measured on text-heavy articles. Pricing based on Claude Sonnet 4.6 at $3 per million input tokens.
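The headline ratios follow directly from the table. This short sketch recomputes them, along with the per-snapshot cost at the stated $3 per million input tokens, using only the figures shown above.

```typescript
// Token counts copied from the benchmark table above.
const TOKENS = {
  cupCompact: 6244,
  playwrightAria: 91339,
  cupJsonFull: 218277,
  rawUiaTree: 467755,
};

// Pricing assumption stated in the benchmark note: $3 per million input tokens.
const USD_PER_MILLION_INPUT_TOKENS = 3;

function costPerSnapshot(tokens: number): number {
  return (tokens * USD_PER_MILLION_INPUT_TOKENS) / 1e6;
}

const reductionVsJson = 1 - TOKENS.cupCompact / TOKENS.cupJsonFull; // ~0.971, i.e. 97% fewer
const cheaperThanPlaywright = TOKENS.playwrightAria / TOKENS.cupCompact; // ~14.6, rounded to 15x
const smallerThanRawUia = TOKENS.rawUiaTree / TOKENS.cupCompact; // ~74.9, rounded to 75x

console.log(costPerSnapshot(TOKENS.cupCompact).toFixed(3)); // 0.019
```

Because the price is a flat per-token rate, the cost ratio equals the token ratio, which is why "15x cheaper" lines up with the Playwright and CUP Compact token counts.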

Schema

59 roles. 16 states. 15 actions.

A complete vocabulary for describing any user interface, derived from ARIA and mapped to native equivalents on every platform.
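For tooling that consumes CUP JSON, the node shape in the example below can be typed and checked at runtime. This is a sketch inferred from that example only: roles, states, and actions are plain strings here rather than the schema's fixed 59/16/15 vocabularies, and `Bounds`, `CupNode`, `isBounds`, and `isCupNode` are hypothetical names.

```typescript
interface Bounds {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Node shape inferred from the CUP JSON example. Roles, states, and actions are
// plain strings here; the schema itself restricts them to fixed vocabularies.
interface CupNode {
  id: string;
  role: string;
  name: string;
  value?: string;
  bounds: Bounds;
  states: string[];
  actions: string[];
  attributes?: Record<string, unknown>;
  children?: CupNode[];
}

function isBounds(v: unknown): v is Bounds {
  if (typeof v !== "object" || v === null) return false;
  const b = v as Record<string, unknown>;
  return ["x", "y", "w", "h"].every((k) => typeof b[k] === "number");
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((s) => typeof s === "string");
}

// Structural runtime check for a parsed JSON value; recurses into children.
function isCupNode(v: unknown): v is CupNode {
  if (typeof v !== "object" || v === null) return false;
  const n = v as Record<string, unknown>;
  return (
    typeof n.id === "string" &&
    typeof n.role === "string" &&
    typeof n.name === "string" &&
    (n.value === undefined || typeof n.value === "string") &&
    isBounds(n.bounds) &&
    isStringArray(n.states) &&
    isStringArray(n.actions) &&
    (n.attributes === undefined || (typeof n.attributes === "object" && n.attributes !== null)) &&
    (n.children === undefined || (Array.isArray(n.children) && n.children.every(isCupNode)))
  );
}
```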

CUP JSON (218,277 tokens)
{
  "version": "0.1.0",
  "platform": "windows",
  "screen": { "w": 1920, "h": 1080 },
  "tree": [
    {
      "id": "e0",
      "role": "window",
      "name": "Spotify",
      "bounds": { "x": 0, "y": 0, "w": 1920, "h": 1080 },
      "states": ["focused"],
      "actions": ["focus"],
      "children": [
        {
          "id": "e1",
          "role": "navigation",
          "name": "Main",
          "bounds": { "x": 0, "y": 0, "w": 280, "h": 1030 },
          "states": [],
          "actions": [],
          "children": [
            {
              "id": "e2",
              "role": "link",
              "name": "Home",
              "bounds": { "x": 16, "y": 12, "w": 248, "h": 40 },
              "states": ["selected"],
              "actions": ["click"]
            },
            {
              "id": "e3",
              "role": "searchbox",
              "name": "Search",
              "bounds": { "x": 16, "y": 56, "w": 248, "h": 40 },
              "states": [],
              "actions": ["click", "type"],
              "attributes": { "placeholder": "What do you want to listen to?" }
            }
          ]
        },
        {
          "id": "e5",
          "role": "list",
          "name": "Recently Played",
          "bounds": { "x": 280, "y": 64, "w": 1640, "h": 280 },
          "states": [],
          "actions": ["scroll"],
          "children": [
            {
              "id": "e6",
              "role": "listitem",
              "name": "Liked Songs — 2,847 songs",
              "bounds": { "x": 296, "y": 80, "w": 192, "h": 248 },
              "states": [],
              "actions": ["click"]
            },
            {
              "id": "e7",
              "role": "listitem",
              "name": "Discover Weekly",
              "bounds": { "x": 504, "y": 80, "w": 192, "h": 248 },
              "states": [],
              "actions": ["click"]
            }
          ]
        },
        {
          "id": "e12",
          "role": "toolbar",
          "name": "Now Playing",
          "bounds": { "x": 0, "y": 1030, "w": 1920, "h": 50 },
          "states": [],
          "actions": [],
          "children": [
            {
              "id": "e13",
              "role": "text",
              "name": "Bohemian Rhapsody",
              "bounds": { "x": 72, "y": 1035, "w": 200, "h": 20 },
              "states": [],
              "actions": []
            },
            {
              "id": "e14",
              "role": "text",
              "name": "Queen",
              "bounds": { "x": 72, "y": 1055, "w": 60, "h": 16 },
              "states": [],
              "actions": []
            },
            {
              "id": "e16",
              "role": "button",
              "name": "Previous",
              "bounds": { "x": 870, "y": 1038, "w": 32, "h": 32 },
              "states": [],
              "actions": ["click"]
            },
            {
              "id": "e17",
              "role": "button",
              "name": "Pause",
              "bounds": { "x": 914, "y": 1034, "w": 40, "h": 40 },
              "states": ["pressed"],
              "actions": ["click", "toggle"]
            },
            {
              "id": "e18",
              "role": "button",
              "name": "Next",
              "bounds": { "x": 966, "y": 1038, "w": 32, "h": 32 },
              "states": [],
              "actions": ["click"]
            },
            {
              "id": "e20",
              "role": "slider",
              "name": "Song progress",
              "value": "142",
              "bounds": { "x": 720, "y": 1072, "w": 480, "h": 4 },
              "states": [],
              "actions": ["increment", "decrement", "setvalue"],
              "attributes": { "valueMin": 0, "valueMax": 354, "valueNow": 142 }
            },
            {
              "id": "e22",
              "role": "slider",
              "name": "Volume",
              "value": "72",
              "bounds": { "x": 1780, "y": 1048, "w": 100, "h": 4 },
              "states": [],
              "actions": ["increment", "decrement", "setvalue"],
              "attributes": { "valueMin": 0, "valueMax": 100, "valueNow": 72 }
            }
          ]
        }
      ]
    }
  ]
}
CUP Compact (6,244 tokens)
# CUP 0.1.0 | windows | 1920x1080
# app: Spotify
# 14 nodes (187 before pruning)
 
[e0] win "Spotify" {foc}
  [e1] nav "Main"
    [e2] lnk "Home" 16,12 248x40 {sel} [clk]
    [e3] sbx "Search" 16,56 248x40 [clk,typ] (ph="What do you want to listen to?")
  [e5] lst "Recently Played" [scr]
    [e6] li "Liked Songs — 2,847 songs" 296,80 192x248 [clk]
    [e7] li "Discover Weekly" 504,80 192x248 [clk]
  [e12] tlbr "Now Playing"
    [e13] txt "Bohemian Rhapsody"
    [e14] txt "Queen"
    [e16] btn "Previous" 870,1038 32x32 [clk]
    [e17] btn "Pause" 914,1034 40x40 {prs} [clk,tog]
    [e18] btn "Next" 966,1038 32x32 [clk]
    [e20] sld "Song progress" 720,1072 480x4 [inc,dec,sv] val="142" (range=0..354)
    [e22] sld "Volume" 1780,1048 100x4 [inc,dec,sv] val="72" (range=0..100)
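The mapping between the two views can be sketched as a serializer. Its abbreviation tables contain only the pairs visible in this example, unknown names fall back to the full word, and names and values are quoted with JSON string escaping; all three are assumptions. The sketch also always prints bounds when present, whereas the compact example omits them for some nodes by a rule not described here. `compactLine` and `toCompact` are hypothetical names.

```typescript
interface Bounds { x: number; y: number; w: number; h: number }

interface CupNode {
  id: string;
  role: string;
  name: string;
  value?: string;
  bounds?: Bounds;
  states: string[];
  actions: string[];
  attributes?: Record<string, unknown>;
  children?: CupNode[];
}

// Only the abbreviations visible in the example above; full tables are not shown here.
const ROLE_ABBR = new Map([
  ["window", "win"], ["navigation", "nav"], ["link", "lnk"], ["searchbox", "sbx"],
  ["list", "lst"], ["listitem", "li"], ["toolbar", "tlbr"], ["text", "txt"],
  ["button", "btn"], ["slider", "sld"],
]);
const STATE_ABBR = new Map([["focused", "foc"], ["selected", "sel"], ["pressed", "prs"]]);
const ACTION_ABBR = new Map([
  ["click", "clk"], ["type", "typ"], ["scroll", "scr"], ["toggle", "tog"],
  ["increment", "inc"], ["decrement", "dec"], ["setvalue", "sv"],
]);

// Assumed fallback: unknown names are emitted unabbreviated.
const abbr = (table: Map<string, string>, key: string): string => table.get(key) ?? key;

function compactLine(node: CupNode, depth: number): string {
  const parts: string[] = [
    `${"  ".repeat(depth)}[${node.id}]`,
    abbr(ROLE_ABBR, node.role),
    JSON.stringify(node.name), // assumed quoting: JSON string escaping
  ];
  if (node.bounds) {
    const { x, y, w, h } = node.bounds;
    parts.push(`${x},${y}`, `${w}x${h}`);
  }
  if (node.states.length > 0) {
    parts.push(`{${node.states.map((s) => abbr(STATE_ABBR, s)).join(",")}}`);
  }
  if (node.actions.length > 0) {
    parts.push(`[${node.actions.map((a) => abbr(ACTION_ABBR, a)).join(",")}]`);
  }
  if (node.value !== undefined) parts.push(`val=${JSON.stringify(node.value)}`);
  const attrs: Record<string, unknown> = node.attributes ?? {};
  if (typeof attrs.placeholder === "string") parts.push(`(ph=${JSON.stringify(attrs.placeholder)})`);
  if (typeof attrs.valueMin === "number" && typeof attrs.valueMax === "number") {
    parts.push(`(range=${attrs.valueMin}..${attrs.valueMax})`);
  }
  return parts.join(" ");
}

// Depth-first walk: each child is indented two spaces deeper than its parent.
function toCompact(nodes: CupNode[], depth = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push(compactLine(node, depth), ...toCompact(node.children ?? [], depth + 1));
  }
  return lines;
}
```

Run on nodes e17, e20, and e3 from the JSON above, this reproduces their compact lines exactly; the savings come from dropping repeated keys, empty arrays, and braces rather than from dropping information.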

MCP Integration

Plug into any AI agent in 3 lines.

CUP ships a built-in MCP server. Add it to Claude Code, Cursor, Copilot, or any MCP-compatible agent and start controlling desktop UIs immediately.

.mcp.json
{
  "mcpServers": {
    "cup": {
      "command": "cup-mcp"
    }
  }
}
Works with: Claude Code, Cursor, OpenClaw, Codex
MCP Tools
snapshot: Capture active window tree
snapshot_app: Capture specific app by title
overview: List all open windows
snapshot_desktop: Capture desktop icons & widgets
find: Search elements in last tree
action: Interact with UI elements
open_app: Open app by name (fuzzy match)
screenshot: Capture screen region as PNG
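On the wire, an MCP client invokes these tools with JSON-RPC 2.0 `tools/list` and `tools/call` requests. The sketch below only builds those request objects and assumes a client that has already completed MCP's `initialize` handshake; the argument name used for `open_app` is hypothetical, since the tools' input schemas are not listed here. In practice an MCP client library handles framing and request IDs for you.

```typescript
// Minimal JSON-RPC 2.0 request shape, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRequestId = 1;

// Asks the server which tools it exposes (snapshot, action, open_app, ...).
function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method: "tools/list" };
}

// Invokes one tool by name. Argument names here are hypothetical: read each
// tool's inputSchema from the tools/list response for the real ones.
function callToolRequest(tool: string, args: Record<string, unknown> = {}): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// Over the stdio transport, each message is written as one line of JSON.
const wire = [
  listToolsRequest(),
  callToolRequest("snapshot"),
  callToolRequest("open_app", { name: "Spotify" }), // hypothetical argument name
]
  .map((req) => JSON.stringify(req))
  .join("\n");

console.log(wire);
```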

SDKs

Same API. Two languages.

Mirror SDKs in Python and TypeScript share an identical layered architecture. Each one auto-detects your OS and loads the right platform adapter.

$ npm install computeruseprotocol
import { snapshot, action } from 'computeruseprotocol'
 
// Capture the active window's UI tree
const tree = await snapshot()
 
// Click a button by its element ID
await action('click', 'e14')
 
// Type text into a search box
await action('type', 'e9', { value: 'hello world' })
 
// Send a keyboard shortcut
await action('press', { keys: 'ctrl+s' })
Platform Adapters: OS auto-detection
Action Executors: 15 canonical verbs
Format Pipeline: Tree normalization
Session API: Public surface
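OS auto-detection at the adapter layer can be as simple as a switch on Node's `process.platform`, whose values include `"win32"`, `"darwin"`, and `"linux"`. The `adapterFor` function and the `Adapter` descriptor below are hypothetical and only name the native APIs listed on this page; the web (CDP) adapter is left out on the assumption that it is chosen by target rather than by host OS.

```typescript
// Hypothetical adapter descriptor: names the native accessibility API per OS.
interface Adapter {
  platform: "windows" | "macos" | "linux";
  api: "UIA" | "AXUIElement" | "AT-SPI2";
}

// Maps a Node.js process.platform value to an adapter descriptor.
function adapterFor(nodePlatform: string): Adapter {
  switch (nodePlatform) {
    case "win32":
      return { platform: "windows", api: "UIA" };
    case "darwin":
      return { platform: "macos", api: "AXUIElement" };
    case "linux":
      return { platform: "linux", api: "AT-SPI2" };
    default:
      throw new Error(`No desktop adapter for platform "${nodePlatform}"`);
  }
}
```

In Node, the call site would be `adapterFor(process.platform)`; failing loudly on an unknown platform is preferable to silently capturing an empty tree.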

Use Cases

Built for builders.

AI Agents

Give LLMs structured UI perception at minimal token cost. Build agents that can navigate, read, and interact with any desktop application.

Test Automation

Write cross-platform UI tests with one API. Same test logic runs on Windows, macOS, and Linux without platform-specific selectors.
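Cross-platform selectors can key on CUP's normalized role and name instead of native identifiers. A minimal sketch, assuming a tree already captured in the JSON shape shown in the Schema section; `findNode` and `expectClickable` are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical subset of the CUP JSON node fields.
interface CupNode {
  id: string;
  role: string;
  name: string;
  actions: string[];
  children?: CupNode[];
}

// Depth-first search for the first node with the given CUP role and name.
// The same selector works whichever OS the tree was captured on.
function findNode(nodes: CupNode[], role: string, name: string): CupNode | undefined {
  for (const node of nodes) {
    if (node.role === role && node.name === name) return node;
    const hit = findNode(node.children ?? [], role, name);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Example test step: a named button must exist and support "click".
function expectClickable(tree: CupNode[], name: string): string {
  const node = findNode(tree, "button", name);
  if (node === undefined) throw new Error(`No button named "${name}"`);
  if (!node.actions.includes("click")) throw new Error(`Button "${name}" is not clickable`);
  return node.id; // pass this ID to an action call
}
```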

RPA & Workflow Automation

Automate repetitive tasks across any desktop application. 15 canonical actions cover every UI interaction pattern.

Accessibility Tooling

Build on a normalized accessibility layer. Analyze and audit UI trees across platforms with a consistent schema.
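As one example of auditing, the sketch below flags interactive nodes that lack an accessible name. The rule and the `auditUnnamedControls` helper are illustrative assumptions; because the tree is already normalized, the same check would run unchanged on trees captured from any platform.

```typescript
// Hypothetical subset of the CUP JSON node fields.
interface CupNode {
  id: string;
  role: string;
  name: string;
  actions: string[];
  children?: CupNode[];
}

interface Finding {
  id: string;
  role: string;
  issue: string;
}

// Illustrative audit rule: a node that accepts actions should have a non-empty
// accessible name, or neither a screen reader nor an agent can tell what it does.
function auditUnnamedControls(nodes: CupNode[], findings: Finding[] = []): Finding[] {
  for (const node of nodes) {
    if (node.actions.length > 0 && node.name.trim() === "") {
      findings.push({ id: node.id, role: node.role, issue: "interactive element has no name" });
    }
    auditUnnamedControls(node.children ?? [], findings);
  }
  return findings;
}
```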