Guides
Quick Reference
One-page cheat sheet — session lifecycle, all actions, scopes, roles, and compact format at a glance.
Session lifecycle
1. Create session = Session() / await Session.create()
2. Capture session.snapshot() → compact text with element IDs
3. Find session.find(query="...") → ranked matches from cached tree
4. Act session.action("eX", "...") → ActionResult { success, message }
5. Re-capture session.snapshot() → fresh IDs (old ones are invalid)All 15 actions
| Action | Parameters | Description |
|---|---|---|
click | — | Click/invoke an element |
doubleclick | — | Double-click |
rightclick | — | Open context menu |
longpress | — | Long-press (touch/mobile) |
type | value | Type text into a field |
setvalue | value | Set value programmatically (slider, spinbutton, text) |
toggle | — | Toggle a checkbox or switch |
scroll | direction | Scroll a container (up / down / left / right) |
select | — | Select item in list, tree, or tab |
expand | — | Expand a collapsed element |
collapse | — | Collapse an expanded element |
increment | — | Increment a slider or spinbutton |
decrement | — | Decrement a slider or spinbutton |
focus | — | Move keyboard focus to element |
dismiss | — | Dismiss a dialog or popup |
Session-level (no element ID):
| Action | Parameters | Description |
|---|---|---|
press | keys | Send keyboard shortcut (ctrl+s, enter, alt+f4) |
open_app | name | Open app by name (fuzzy matched) |
Scopes
| Scope | What it captures | Speed |
|---|---|---|
overview | Window list only (titles, PIDs, bounds) | Near-instant |
foreground | Active window tree + window list header | Default, fast |
desktop | Desktop surface (icons, widgets) | Fast |
full | All windows (use app param to filter) | Slower |
Compact format syntax
Each line in compact output represents one UI element:
[id] role "name" @x,y wxh {states} [actions] val="value"| Part | Example | Meaning |
|---|---|---|
[e14] | Element ID | Use for action() calls |
button | Role | ARIA-derived role (59 total) |
"Submit" | Name | Accessible name |
@100,200 | Position | Top-left x,y in screen pixels |
300x40 | Size | Width x height |
{focused,checked} | States | Active states |
[click,toggle] | Actions | Available actions |
val="75" | Value | Current value (sliders, inputs) |
Indentation shows parent-child hierarchy. Two spaces = one level deeper.
Role categories
Common key combos for press()
| Keys | Action |
|---|---|
enter | Confirm / submit |
escape | Cancel / close |
tab | Next focus |
shift+tab | Previous focus |
ctrl+a | Select all |
ctrl+c | Copy |
ctrl+v | Paste |
ctrl+z | Undo |
ctrl+s | Save |
ctrl+w | Close tab |
alt+f4 | Close window |
ctrl+shift+p | Command palette (VS Code) |
Quick snippets
import cup
# Quick capture
print(cup.snapshot())
# Window list
print(cup.overview())
# Full session workflow
s = cup.Session()
s.snapshot()
hits = s.find(query="Save button")
if hits:
r = s.action(hits[0]["id"], "click")
print(r.success, r.message)
# Keyboard shortcut
s.press("ctrl+shift+p")
# Open app
s.open_app("calculator")
# Screenshot
png = s.screenshot()import { Session, snapshot, overview } from "computeruseprotocol";
// Quick capture
console.log(await snapshot());
// Window list
console.log(await overview());
// Full session workflow
const s = await Session.create();
await s.snapshot();
const hits = await s.find({ query: "Save button" });
if (hits.length > 0) {
const r = await s.action(hits[0].id, "click");
console.log(r.success, r.message);
}
// Keyboard shortcut
await s.press("ctrl+shift+p");
// Open app
await s.openApp("calculator");
// Screenshot
const png = await s.screenshot();