UI Trees
How CUP represents screen contents as a tree of accessible elements.
What is a UI tree?
Most desktop applications expose a hierarchy of UI elements through the operating system's accessibility API. A window contains panels, panels contain buttons and text fields, and text fields contain text. This hierarchy is the accessibility tree.
CUP captures this tree, normalizes it into a universal schema, and serializes it into a compact text format optimized for LLM consumption.
Tree structure
A CUP tree is an array of nodes. Each node represents a UI element:
    {
      "id": "e14",
      "role": "button",
      "name": "Submit",
      "bounds": { "x": 120, "y": 340, "w": 88, "h": 36 },
      "states": ["focused"],
      "actions": ["click"],
      "children": []
    }

In compact format, the same node becomes:
    [e14] btn "Submit" 120,340 88x36 {foc} [clk]

Node properties
| Property | Description | Example |
|---|---|---|
| `id` | Unique identifier (ephemeral per snapshot) | `e14` |
| `role` | ARIA-derived semantic role | `button`, `textbox` |
| `name` | Accessible name (max 200 chars) | `"Submit"` |
| `bounds` | Screen position and size | `{ x: 120, y: 340, w: 88, h: 36 }` |
| `states` | Active state flags | `["focused", "expanded"]` |
| `actions` | Available interactions | `["click", "type"]` |
| `children` | Nested child nodes | `[...]` |
| `value` | Current value (for inputs/sliders) | `"hello"` |
| `platform` | Native platform properties | `{ windows: { ... } }` |
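
To make the mapping between the JSON node and its compact line concrete, here is a minimal Python sketch that reproduces the `e14` example above. It is not CUP's serializer: the function name `encode_node` is hypothetical, only the abbreviations `btn`, `foc`, and `clk` come from the example, and the comma separator for multiple states or actions is an assumption. Anything without a known abbreviation falls back to its full name, and child nodes are not rendered.

```python
# Abbreviations taken from the example above. Any other role, state, or
# action falls back to its full name (the real short forms are not shown here).
ROLE_ABBR = {"button": "btn"}
STATE_ABBR = {"focused": "foc"}
ACTION_ABBR = {"click": "clk"}


def encode_node(node: dict) -> str:
    """Render one node as: [id] role "name" x,y wxh {states} [actions]."""
    parts = [f"[{node['id']}]", ROLE_ABBR.get(node["role"], node["role"])]
    if node.get("name"):
        parts.append(f'"{node["name"]}"')
    b = node.get("bounds")
    if b:
        parts.append(f"{b['x']},{b['y']} {b['w']}x{b['h']}")
    # Assumption: multiple states or actions are joined with commas.
    states = [STATE_ABBR.get(s, s) for s in node.get("states", [])]
    if states:
        parts.append("{" + ",".join(states) + "}")
    actions = [ACTION_ABBR.get(a, a) for a in node.get("actions", [])]
    if actions:
        parts.append("[" + ",".join(actions) + "]")
    return " ".join(parts)


submit = {
    "id": "e14",
    "role": "button",
    "name": "Submit",
    "bounds": {"x": 120, "y": 340, "w": 88, "h": 36},
    "states": ["focused"],
    "actions": ["click"],
    "children": [],
}
print(encode_node(submit))  # [e14] btn "Submit" 120,340 88x36 {foc} [clk]
```

Empty fields are simply omitted in this sketch, which is one plausible way a compact line stays short for nodes with no states or actions.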
Element IDs
Element IDs follow the pattern e0, e1, e2, etc. They are ephemeral — valid only for the snapshot that generated them. After performing any action, you must re-capture to get fresh IDs.
This design keeps the protocol stateless: every snapshot is a complete, self-contained representation of the UI.
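
Because IDs are only valid within one snapshot, a client should look an element up again after every action instead of caching its ID. The Python sketch below illustrates the idea under stated assumptions: `is_valid_id`, `walk`, and `find_id` are hypothetical helpers, not part of CUP, and the two snapshots are made-up sample data (including the `window` role).

```python
import re

# IDs follow the pattern e0, e1, e2, ...
ID_PATTERN = re.compile(r"e\d+")


def is_valid_id(element_id: str) -> bool:
    return ID_PATTERN.fullmatch(element_id) is not None


def walk(nodes):
    """Yield every node in the tree, parents before children."""
    for node in nodes:
        yield node
        yield from walk(node.get("children", []))


def find_id(snapshot, role, name):
    """Return the ID of the first node with this role and name, or None."""
    for node in walk(snapshot):
        if node.get("role") == role and node.get("name") == name:
            return node["id"]
    return None


# Two made-up snapshots of the same window, before and after an action.
before = [{"id": "e0", "role": "window", "name": "Form", "children": [
    {"id": "e1", "role": "button", "name": "Submit", "children": []},
]}]
after = [{"id": "e0", "role": "window", "name": "Form", "children": [
    {"id": "e1", "role": "textbox", "name": "Email", "children": []},
    {"id": "e2", "role": "button", "name": "Submit", "children": []},
]}]

print(find_id(before, "button", "Submit"))  # e1
print(find_id(after, "button", "Submit"))   # e2
```

In the second snapshot `e1` refers to a different element, so reusing the old ID would target the wrong node. Matching on role and name is only one possible lookup strategy.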
Tree depth
Trees can be deep for complex UIs. CUP's pruning pipeline removes noise (decorative elements, zero-size nodes, redundant containers) to keep the tree focused on interactive and meaningful elements.
For example, a typical Spotify window shrinks from 280 raw nodes to 63 after pruning, a 78% reduction in node count. This comes on top of the ~75% token reduction from compact encoding.
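
The kind of pruning described above can be sketched in Python as follows. This is an illustration, not CUP's pipeline: the rules here are assumptions (a zero-size node is dropped together with its subtree, and a node with no name, states, or actions is treated as redundant and replaced by its children), and the sample tree is made up.

```python
def is_zero_size(node):
    b = node.get("bounds")
    return b is not None and (b["w"] == 0 or b["h"] == 0)


def is_redundant(node):
    # Assumption: no name, no states, and no actions means the node adds nothing.
    return not node.get("name") and not node.get("states") and not node.get("actions")


def prune(nodes):
    """Return a pruned copy of the tree; the input is left unmodified."""
    kept = []
    for node in nodes:
        if is_zero_size(node):
            continue  # assumption: the whole subtree is dropped
        children = prune(node.get("children", []))
        if is_redundant(node):
            kept.extend(children)  # hoist children; a childless node vanishes
        else:
            kept.append({**node, "children": children})
    return kept


def count(nodes):
    """Count all nodes in the tree."""
    return sum(1 + count(n.get("children", [])) for n in nodes)


def reduction(before, after):
    """Fraction of nodes removed."""
    return 1 - after / before


raw = [{"id": "e0", "role": "window", "name": "Player",
        "bounds": {"x": 0, "y": 0, "w": 800, "h": 600}, "children": [
    {"id": "e1", "role": "group", "name": "",
     "bounds": {"x": 0, "y": 0, "w": 800, "h": 50}, "children": [
        {"id": "e2", "role": "button", "name": "Play", "actions": ["click"],
         "bounds": {"x": 10, "y": 10, "w": 32, "h": 32}, "children": []},
        {"id": "e3", "role": "image", "name": "",
         "bounds": {"x": 50, "y": 10, "w": 16, "h": 16}, "children": []},
    ]},
    {"id": "e4", "role": "button", "name": "Hidden", "actions": ["click"],
     "bounds": {"x": 0, "y": 0, "w": 0, "h": 0}, "children": []},
]}]

pruned = prune(raw)
print(count(raw), count(pruned))  # 5 2
```

With the figures quoted above, `reduction(280, 63)` evaluates to about 0.775, which matches the stated 78% after rounding.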