Performance & Optimization
Reduce latency and token usage with scope selection, depth limits, caching, and batch actions.
Choose the right scope
The `scope` parameter controls how much of the screen CUP captures. Choosing the smallest sufficient scope dramatically reduces both latency and token count.
| Scope | Use when... | Typical tokens |
|---|---|---|
| `overview` | You only need to know what apps are open | ~50–100 |
| `foreground` | You're working with the active window (most common) | ~500–2,000 |
| `desktop` | You need desktop icons or widgets | ~200–500 |
| `full` | You need a specific non-foreground window | Varies |
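The trade-off in the table suggests a narrow-first pattern: take a cheap `overview` snapshot, and only pay for a targeted capture when the app you need is actually open. A minimal sketch of that pattern — `StubSession` is a hypothetical stand-in for a real CUP session, used only to show the call sequence:

```python
# Sketch of a narrow-first capture strategy. StubSession is a stand-in
# for a real CUP session; only the call pattern matters here.
class StubSession:
    def snapshot(self, scope="foreground", app=None, **kwargs):
        if scope == "overview":
            # Cheap: just the list of open windows
            return {"windows": ["Chrome", "Spotify", "Terminal"]}
        # Expensive: a full tree capture for one app
        return {"scope": scope, "app": app, "tree": "..."}

def capture_app(session, app_name):
    """Take a cheap overview first; only pay for a full capture if needed."""
    overview = session.snapshot(scope="overview")
    if app_name not in overview["windows"]:
        return None  # app not open — skip the expensive capture entirely
    return session.snapshot(scope="full", app=app_name)

session = StubSession()
print(capture_app(session, "Chrome"))     # targeted capture
print(capture_app(session, "Photoshop"))  # None — avoided a wasted snapshot
```

The overview call costs ~50–100 tokens, so gating the expensive capture behind it is almost always a net win.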
```python
# Just need the window list? Use overview (near-instant)
windows = session.snapshot(scope="overview")

# Working with the active window? Use foreground (default)
screen = session.snapshot(scope="foreground")

# Need a background app? Filter by name
chrome = session.snapshot(scope="full", app="Chrome")
```

```typescript
const windows = await session.snapshot({ scope: "overview" });
const screen = await session.snapshot({ scope: "foreground" });
const chrome = await session.snapshot({ scope: "full", app: "Chrome" });
```

Compact vs full detail
The `detail` parameter controls tree pruning. Compact mode removes noise nodes (empty groups, redundant wrappers) and reduces token usage by ~75%.
| Detail level | What you get | Best for |
|---|---|---|
| `compact` (default) | Pruned tree — noise removed, ~75% fewer tokens | AI agent workflows, most use cases |
| `full` | Unpruned tree — every node preserved | Debugging, finding hidden elements |
```python
# Default — compact, great for agents
screen = session.snapshot(detail="compact")

# Full — see everything, useful for debugging
screen = session.snapshot(detail="full")
```

```typescript
// Default — compact, great for agents
const compactScreen = await session.snapshot({ detail: "compact" });

// Full — see everything, useful for debugging
const fullScreen = await session.snapshot({ detail: "full" });
```

Compact format achieves ~97% token reduction compared to raw JSON. A Spotify window snapshot is ~700 tokens in compact format vs ~25,000 as JSON.
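Most of that reduction comes from dropping JSON's structural overhead (keys, braces, quoting) in favor of one line per node. A rough illustration with a toy tree — the `compact()` renderer below is a simplified sketch, not CUP's actual serializer:

```python
import json

# A toy accessibility subtree — illustrative only, not a real capture.
tree = {
    "id": "e1", "role": "window", "name": "Player",
    "children": [
        {"id": "e2", "role": "button", "name": "Play", "children": []},
        {"id": "e3", "role": "slider", "name": "Volume", "children": []},
    ],
}

def compact(node, depth=0):
    """Render one line per node: [id] role "name" — similar in spirit to
    CUP's compact format (the real serializer differs)."""
    lines = ["  " * depth + f'[{node["id"]}] {node["role"]} "{node["name"]}"']
    for child in node["children"]:
        lines.extend(compact(child, depth + 1))
    return lines

as_json = json.dumps(tree, indent=2)
as_compact = "\n".join(compact(tree))
print(len(as_json), "chars as JSON vs", len(as_compact), "chars compact")
```

Even on this tiny tree the compact rendering is a fraction of the JSON size; the gap widens as trees grow, because JSON repeats every key on every node.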
Limit tree depth
For deeply nested UIs, use `max_depth` to cap how far CUP walks the tree. This reduces both capture time and output size.
```python
# Only capture 3 levels deep
screen = session.snapshot(max_depth=3)

# Unlimited (default)
screen = session.snapshot(max_depth=999)
```

```typescript
const screen = await session.snapshot({ maxDepth: 3 });
```

This is useful when you only need top-level navigation elements and don't care about deeply nested content.
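The effect of a depth cap can be pictured as pruning a nested tree client-side (the real limit is applied during capture, which also saves capture time, not just output size). A minimal sketch:

```python
def prune(node, max_depth):
    """Return a copy of `node` with all children removed below `max_depth` levels."""
    if max_depth <= 0:
        return {**node, "children": []}
    return {
        **node,
        "children": [prune(c, max_depth - 1) for c in node.get("children", [])],
    }

deep = {"role": "window", "children": [
    {"role": "group", "children": [
        {"role": "button", "children": []},  # 2 levels down
    ]},
]}

shallow = prune(deep, 1)
# The window keeps its direct child (the group), but the nested button is gone.
print(shallow)
```

With `max_depth=1` the group survives but arrives with an empty child list, which is exactly the shape you want when only top-level navigation matters.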
Search the cached tree instead of re-capturing
`find()` searches the tree from the last `snapshot()` call without re-capturing. This is much faster than taking a new snapshot just to locate an element.
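The underlying idea is cache-then-search: keep the node list from the last capture in memory and filter it locally instead of re-walking the live UI. A sketch of that pattern — `CachedTree` is hypothetical, and the real `find()` matching (fuzzy queries, roles, states) is richer than this:

```python
class CachedTree:
    """Holds the flat node list from the last capture; find() filters locally."""
    def __init__(self, nodes):
        self.nodes = nodes  # flat list of node dicts from the last snapshot

    def find(self, role=None, state=None):
        hits = self.nodes
        if role is not None:
            hits = [n for n in hits if n["role"] == role]
        if state is not None:
            hits = [n for n in hits if state in n.get("states", [])]
        return hits

cache = CachedTree([
    {"id": "e1", "role": "textbox", "states": ["focused"]},
    {"id": "e2", "role": "button", "states": []},
])
print(cache.find(role="textbox"))   # no re-capture needed
print(cache.find(state="focused"))
```

Filtering an in-memory list is microseconds; a fresh capture is a full accessibility-tree walk. That gap is why the docs below recommend `find()` over repeated `snapshot()` calls.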
```python
screen = session.snapshot()

# Fast — searches the cached tree
buttons = session.find(query="submit button")
inputs = session.find(role="textbox")
focused = session.find(state="focused")

# Slower — re-captures the entire tree
screen = session.snapshot()  # only do this if the UI has changed
```

```typescript
await session.snapshot();

// Fast — searches the cached tree
const buttons = await session.find({ query: "submit button" });
const inputs = await session.find({ role: "textbox" });
const focused = await session.find({ state: "focused" });
```

Use page() for clipped content
When a scrollable container has more items than shown in the compact output, use `page()` to read them from the cached tree instead of scrolling and re-capturing.
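`page()` can be pictured as slicing the already-captured item list: successive calls hand out successive windows of items without touching the live UI. A sketch under that assumption — `ListPager` is hypothetical, not CUP's implementation:

```python
class ListPager:
    """Serve successive pages of already-captured list items from memory."""
    def __init__(self, items, page_size=10):
        self.items = items
        self.page_size = page_size
        self.offset = 0

    def page(self):
        chunk = self.items[self.offset:self.offset + self.page_size]
        self.offset += len(chunk)  # advance so the next call continues
        return chunk

items = [f"Result {i}" for i in range(1, 51)]  # 50 cached list items
pager = ListPager(items, page_size=10)
print(pager.page()[:2])  # ['Result 1', 'Result 2']
print(pager.page()[:2])  # ['Result 11', 'Result 12']
```

Because paging reads from the cache, it costs no capture time at all; only truly virtualized lists (items not yet rendered) force a real scroll plus re-capture.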
```text
[e5] list "Results" @10,100 400x600
  [e6] listitem "Result 1" ...
  [e7] listitem "Result 2" ...
  ... 48 more items — page("e5") to see
```

```python
# Reads from cache — no re-capture needed
page1 = session.page("e5", direction="down")
page2 = session.page("e5", direction="down")

# Only scroll + re-capture for virtual scrolling
session.action("e5", "scroll", direction="down")
screen = session.snapshot()
```

```typescript
const page1 = await session.page("e5", { direction: "down" });
const page2 = await session.page("e5", { direction: "down" });
```

Batch actions to reduce round-trips
Use `batch()` to execute multiple actions in a single call instead of individual action + snapshot cycles:
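The win here is round-trips: n individual actions cost n network hops, while one batch costs one. A sketch with a hypothetical `StubTransport` standing in for the wire protocol, just to make the hop count visible:

```python
class StubTransport:
    """Counts round-trips so the savings from batching are visible."""
    def __init__(self):
        self.round_trips = 0

    def send(self, payload):
        self.round_trips += 1  # one hop, regardless of payload length
        return [{"success": True} for _ in payload]

def run_batch(transport, steps):
    """Send the whole action list in a single round-trip."""
    return transport.send(steps)

steps = [
    {"element_id": "e5", "action": "click"},
    {"element_id": "e3", "action": "type", "value": "hello"},
    {"action": "press", "keys": "tab"},
]
t = StubTransport()
results = run_batch(t, steps)
print(t.round_trips, "round-trip for", len(results), "actions")
```

Sending the same three steps individually would cost three round-trips; for remote or high-latency sessions, that difference dominates total wall-clock time.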
```python
# Instead of:
session.action("e5", "click")
session.action("e3", "type", value="hello")
session.press("tab")
session.action("e7", "type", value="world")

# Use batch:
results = session.batch([
    {"element_id": "e5", "action": "click"},
    {"element_id": "e3", "action": "type", "value": "hello"},
    {"action": "press", "keys": "tab"},
    {"element_id": "e7", "action": "type", "value": "world"},
])

# Check results — stops on first failure
for r in results:
    if not r.success:
        print(f"Failed: {r.error}")
        break
```

```typescript
const results = await session.batch([
  { element_id: "e5", action: "click" },
  { element_id: "e3", action: "type", value: "hello" },
  { action: "press", keys: "tab" },
  { element_id: "e7", action: "type", value: "world" },
]);

for (const r of results) {
  if (!r.success) {
    console.error(`Failed: ${r.error}`);
    break;
  }
}
```

Token budget planning for LLM agents
When integrating CUP with LLMs (via MCP or direct API calls), token usage matters. Here's a rough guide:
| Scenario | Typical tokens | Strategy |
|---|---|---|
| Window list only | 50–100 | Use `overview` scope |
| Simple app (Calculator, Notepad) | 200–500 | Default `foreground` + `compact` |
| Medium app (Settings, File manager) | 500–1,500 | Default settings |
| Complex app (IDE, Browser) | 1,500–3,000 | Limit depth with `max_depth=5` |
| Very complex app (Excel with data) | 3,000–5,000+ | Limit depth + use `find()` to narrow |
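The table can drive a simple pre-flight budget check: estimate the token cost of a serialized snapshot (~4 characters per token is a common rule of thumb for English-like text) and tighten the capture when over budget. A sketch — the threshold, the chars-per-token ratio, and the fallback order are all assumptions, not CUP behavior:

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English-like text."""
    return len(text) // 4

def choose_strategy(snapshot_text, budget=3000):
    """Decide whether a snapshot fits the context budget as-is."""
    if estimate_tokens(snapshot_text) <= budget:
        return "ok"
    # Over budget: tighten depth first, then narrow with targeted search
    return "limit max_depth, then use find() to narrow"

small = "x" * 2000   # ~500 tokens — a simple app
huge = "x" * 40000   # ~10,000 tokens — a data-heavy spreadsheet
print(choose_strategy(small))
print(choose_strategy(huge))
```

For precise budgeting you would use your model's actual tokenizer instead of the character heuristic, but the heuristic is usually close enough to decide between strategies.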
Tips for LLM integration:

- Start with `overview` to decide which app to target
- Use `foreground` scope (not `full`) unless you need a background app
- Keep `detail="compact"` (default) — full detail wastes tokens on noise
- Use `find()` to search instead of asking the LLM to parse the entire tree
- Use `page()` for long lists instead of including everything in context
What's next?
- Quick reference — all actions, scopes, and roles at a glance
- Compact format — how pruning and serialization work
- Scopes — detailed scope documentation