What is CUP?
Complete reference for the Computer Use Protocol — a universal schema for AI agents to perceive and interact with any desktop UI.
Computer Use Protocol (CUP) is a universal schema for representing UI accessibility trees: one format that works identically across Windows, macOS, Linux, Web, Android, and iOS.
It includes a compact text encoding optimized for LLM context windows — achieving ~97% fewer tokens than raw JSON — making it ideal for AI agents that need to perceive and act on desktop UIs.
Why CUP?
- One format everywhere — write agent logic once, run on any platform
- Built for LLMs — compact encoding fits complex UIs into context windows at ~15x fewer tokens
- Built for actions — 15 canonical verbs mapped to native platform APIs
- No information loss — raw native properties preserved via node.platform.*
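To make the last point concrete, here is a minimal sketch of a node that carries portable fields alongside a raw native namespace. The class, its field names, and the Windows property keys are assumptions made for illustration; they are not taken from the CUP schema, which only tells us that native properties live under node.platform.*.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # Portable fields (hypothetical names, not the official CUP schema).
    id: str
    role: str
    name: str = ""
    states: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    # Raw native properties, kept verbatim under a per-platform key.
    platform: dict[str, dict[str, Any]] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


# A hypothetical button as a Windows adapter might report it.
save_button = Node(
    id="e1",
    role="button",
    name="Save",
    states=["focused"],
    actions=["click"],
    platform={"windows": {"automation_id": "btnSave", "class_name": "Button"}},
)

# Agent logic reads only the portable fields...
is_clickable = "click" in save_button.actions
# ...while platform-specific code can still reach the native values.
native_id = save_button.platform["windows"]["automation_id"]
```

Keeping native values in a separate namespace means portable agent code never has to branch on the operating system, while nothing captured by the adapter is discarded.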
Quick links
- Installation: Install the Python or TypeScript SDK
- Quick Start: Capture your first UI tree in 30 seconds
- Core Concepts: Roles, states, actions, and the compact format
- MCP Integration: Connect to Claude Code, Cursor, and more
Architecture
CUP follows a 5-layer architecture:
- Platform Adapters — OS-specific tree capture (UIA, AXUIElement, AT-SPI2, CDP)
- Action Executors — 15 canonical verbs mapped to native APIs
- Format Pipeline — Normalize, prune, serialize to compact text
- Semantic Search — Fuzzy element matching by role, name, state, or action
- Session API — snapshot(), action(), overview()
Above it all sits the MCP layer, exposing 9 tools to any AI agent.
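The format pipeline and semantic search layers can be illustrated with a toy example. Everything below is a sketch under stated assumptions: the input tree, the function names, the pruning rule, and the one-line-per-node text format are invented for illustration and are not the CUP SDK or its actual compact encoding. Fuzzy matching uses Python's standard difflib as a stand-in for whatever matcher CUP uses.

```python
import json
from difflib import SequenceMatcher

# A hypothetical raw capture, as a platform adapter might return it.
raw_tree = {
    "role": "Window", "name": "Editor", "actions": [],
    "children": [
        {"role": "Pane", "name": "", "actions": [], "children": [
            {"role": "Button", "name": "Save", "actions": ["click"], "children": []},
            {"role": "Edit", "name": "File name", "actions": ["type"], "children": []},
        ]},
    ],
}


def normalize(node):
    """Lower-case roles so every platform reports the same vocabulary."""
    return {
        "role": node["role"].lower(),
        "name": node["name"],
        "actions": list(node["actions"]),
        "children": [normalize(c) for c in node["children"]],
    }


def prune(node):
    """Drop unnamed, non-interactive containers and hoist their children."""
    kept = []
    for child in node["children"]:
        kept.extend(prune(child))
    if node["name"] or node["actions"]:
        return [{**node, "children": kept}]
    return kept


def serialize(node, depth=0):
    """Emit one indented line per node (an invented format, not CUP's)."""
    actions = f" [{','.join(node['actions'])}]" if node["actions"] else ""
    lines = ["  " * depth + f'{node["role"]} "{node["name"]}"{actions}']
    for child in node["children"]:
        lines.extend(serialize(child, depth + 1))
    return lines


def search(node, query, threshold=0.6):
    """Return nodes whose name fuzzily matches the query."""
    score = SequenceMatcher(None, query.lower(), node["name"].lower()).ratio()
    hits = [node] if score >= threshold else []
    for child in node["children"]:
        hits.extend(search(child, query, threshold))
    return hits


tree = prune(normalize(raw_tree))[0]
compact = "\n".join(serialize(tree))
print(compact)
# Character counts are only a rough proxy for token counts.
print(len(json.dumps(raw_tree)), "chars as JSON vs", len(compact), "chars compact")
```

Calling search(tree, "save") on this tree returns only the Save button, because the other names fall below the similarity threshold. In this sketch pruning runs before serialization, so empty containers never reach the text output.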