CUP: Computer Use Protocol

What is CUP?

Complete reference for the Computer Use Protocol — a universal schema for AI agents to perceive and interact with any desktop UI.

Computer Use Protocol (CUP) is a universal schema for representing UI accessibility trees: one format that works identically across Windows, macOS, Linux, Web, Android, and iOS.

It also includes a compact text encoding optimized for LLM context windows. The encoding uses ~97% fewer tokens than raw JSON, which makes it well suited to AI agents that need to perceive and act on desktop UIs.
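This section does not show the actual CUP text syntax, so the sketch below invents a simple one-line-per-node format purely to illustrate why an indented text encoding is smaller than JSON. The `Node` fields, the `[id]` prefix, and the `{states}` / `@actions` markers are all hypothetical; the saving on this toy tree says nothing about the ~97% figure above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    # Hypothetical field names for illustration; not the official CUP schema.
    id: str
    role: str
    name: str = ""
    states: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def encode_compact(node: Node, depth: int = 0) -> str:
    """Render one node per line, indented by tree depth (invented format)."""
    parts = [f"[{node.id}]", node.role]
    if node.name:
        parts.append(f'"{node.name}"')
    if node.states:
        parts.append("{" + ",".join(node.states) + "}")
    if node.actions:
        parts.append("@" + ",".join(node.actions))
    lines = ["  " * depth + " ".join(parts)]
    for child in node.children:
        lines.append(encode_compact(child, depth + 1))
    return "\n".join(lines)


tree = Node("e0", "window", "Settings", children=[
    Node("e1", "button", "Save", actions=["click"]),
    Node("e2", "checkbox", "Dark mode", states=["checked"], actions=["toggle"]),
])

compact = encode_compact(tree)
verbose = json.dumps(asdict(tree), indent=2)  # the same tree as plain JSON
print(compact)
print(len(compact), "chars vs", len(verbose), "chars of JSON")
```

Most of the saving comes from dropping repeated key names, quotes, and empty fields, which JSON must spell out for every node.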

Why CUP?

  • One format everywhere: write agent logic once, run it on any platform
  • Built for LLMs: the compact encoding fits complex UIs into context windows using ~15x fewer tokens
  • Built for actions: 15 canonical verbs mapped to native platform APIs
  • No information loss: raw native properties are preserved under node.platform.*
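To illustrate the "no information loss" point, here is a minimal sketch of normalizing a native element into a canonical node while keeping every raw property. It assumes a Windows UI Automation style dictionary as input; the role table, the `"generic"` fallback, and the `windows` key under `platform` are hypothetical and not taken from the CUP spec.

```python
from typing import Any

# Hypothetical mapping from UIA control type names to canonical roles;
# the real CUP role table may differ.
UIA_ROLE_MAP = {
    "Button": "button",
    "CheckBox": "checkbox",
    "Edit": "textbox",
    "Window": "window",
}


def normalize_uia(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw UIA-like element to a canonical node, keeping the raw data."""
    return {
        "role": UIA_ROLE_MAP.get(raw.get("ControlType", ""), "generic"),
        "name": raw.get("Name", ""),
        # Every native property is copied verbatim under platform.<os>,
        # so normalization never discards anything.
        "platform": {"windows": dict(raw)},
    }


raw = {"ControlType": "CheckBox", "Name": "Dark mode",
       "AutomationId": "chkDark", "ToggleState": 1}
node = normalize_uia(raw)
print(node["role"], node["platform"]["windows"]["AutomationId"])
```

Agent logic can rely on the canonical `role` and `name`, and drop down to the platform namespace only when it needs something platform-specific such as `AutomationId`.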

Architecture

CUP follows a 5-layer architecture:

  1. Platform Adapters: OS-specific tree capture (UIA, AXUIElement, AT-SPI2, CDP)
  2. Action Executors: 15 canonical verbs mapped to native APIs
  3. Format Pipeline: normalize, prune, and serialize to compact text
  4. Semantic Search: fuzzy element matching by role, name, state, or action
  5. Session API: snapshot(), action(), overview()
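Layers 3 and 4 can be pictured with a small sketch: a pruning pass that drops empty leaves and collapses unnamed single-child wrappers, followed by a fuzzy name search with an optional role filter. This is an illustration under assumptions, not CUP's actual pipeline. The dict-based node shape, the pruning rules, and the 0.6 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for whatever matcher CUP really uses.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical node shape: {"role", "name", "actions", "children"}.
Node = Dict[str, Any]


def prune(node: Node) -> Optional[Node]:
    """Drop unnamed, non-actionable leaves and collapse pass-through wrappers."""
    kids = [k for k in (prune(c) for c in node.get("children", [])) if k]
    if not node.get("name") and not node.get("actions"):
        if not kids:
            return None      # carries no information for the agent
        if len(kids) == 1:
            return kids[0]   # unnamed wrapper around a single child
    return {**node, "children": kids}


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def match_score(query: str, name: str) -> float:
    """1.0 for a substring hit, otherwise difflib's similarity ratio."""
    q, n = query.lower(), name.lower()
    if q and q in n:
        return 1.0
    return SequenceMatcher(None, q, n).ratio()


def find(root: Node, query: str, role: Optional[str] = None,
         threshold: float = 0.6) -> List[Node]:
    """Return nodes whose name fuzzily matches query, best match first."""
    hits = []
    for node in walk(root):
        if role is not None and node.get("role") != role:
            continue
        score = match_score(query, node.get("name", ""))
        if score >= threshold:
            hits.append((score, node))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in hits]


tree = {"role": "window", "name": "Settings", "children": [
    {"role": "group", "children": [
        {"role": "button", "name": "Save changes", "actions": ["click"]},
    ]},
    {"role": "separator"},
    {"role": "button", "name": "Cancel", "actions": ["click"]},
]}

pruned = prune(tree)
print([child["name"] for child in pruned["children"]])
print([n["name"] for n in find(pruned, "cancle", role="button")])
```

Pruning before search keeps both the serialized text and the candidate set small, and the similarity ratio lets a misspelled query such as "cancle" still resolve to the Cancel button.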

Above all five layers sits the MCP (Model Context Protocol) layer, which exposes 9 tools to any AI agent.
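Presumably an MCP tool call ends up in the Session API's action(), which hands a canonical verb to the platform's action executor. The sketch below illustrates that routing under assumptions: the `Session` class, the `click` and `type` verb names, and the executor signature are hypothetical, and the returned strings merely name the UIA calls (`InvokePattern.Invoke`, `ValuePattern.SetValue`) a Windows executor might make instead of actually calling them.

```python
from typing import Any, Callable, Dict

# Hypothetical executor signature: (element_id, params) -> description of the native call.
Executor = Callable[[str, Dict[str, Any]], str]


class Session:
    """Toy session: routes canonical verbs to one platform's executor table."""

    def __init__(self, executors: Dict[str, Executor]) -> None:
        self._executors = executors

    def action(self, element_id: str, verb: str, **params: Any) -> str:
        run = self._executors.get(verb)
        if run is None:
            raise ValueError(f"unsupported verb: {verb!r}")
        return run(element_id, params)


# Stand-in Windows executors; a real adapter would invoke UIA patterns
# instead of returning strings.
windows_executors: Dict[str, Executor] = {
    "click": lambda eid, p: f"InvokePattern.Invoke on {eid}",
    "type": lambda eid, p: f"ValuePattern.SetValue on {eid}: {p['text']}",
}

session = Session(windows_executors)
print(session.action("e14", "type", text="hello"))
```

Swapping in a different executor table (for example one backed by AXUIElement actions on macOS) would leave the agent-facing action() call unchanged, which is the point of keeping the verbs canonical.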
