CUP: Computer Use Protocol
SDK Reference

Session API

The Session class is CUP's primary interface for capturing UI trees and performing actions.

Overview

The Session class wraps a platform adapter and action executor. It is the main interface for both the Python and TypeScript SDKs. Convenience functions (snapshot(), overview(), etc.) use a default session internally.
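The "default session" idea can be pictured as a lazily created, module-level singleton. The sketch below is an illustration of that pattern only, not CUP's actual implementation: the Session class here is a stand-in so the code runs without CUP installed, and get_default_session() is a hypothetical helper name.

```python
from typing import Optional


class Session:
    """Stand-in for cup.Session so this sketch runs without CUP installed."""

    def __init__(self, platform: Optional[str] = None) -> None:
        self.platform = platform

    def snapshot(self, scope: str = "foreground") -> str:
        # A real session would capture the UI tree; this returns a marker string.
        return f"<snapshot scope={scope}>"


_default_session: Optional[Session] = None  # created on first use


def get_default_session() -> Session:
    """Return the shared session, creating it the first time it is needed."""
    global _default_session
    if _default_session is None:
        _default_session = Session()
    return _default_session


def snapshot(scope: str = "foreground") -> str:
    """Module-level convenience wrapper that delegates to the shared session."""
    return get_default_session().snapshot(scope)
```

With this shape, every call to the module-level snapshot() reuses one session, while code that needs its own adapter or lifetime can still construct a Session explicitly.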

Creating a session

Python:

import cup

session = cup.Session(platform=None)   # auto-detect OS
session = cup.Session(platform="web")  # force web adapter

TypeScript:

import { Session } from "computeruseprotocol";

const session = await Session.create();                     // auto-detect OS
const session = await Session.create({ platform: "web" });  // force web adapter

| Parameter | Type | Description |
| --- | --- | --- |
| platform | string? | Force a platform adapter ("windows", "macos", "linux", "web"), or pass None/undefined to auto-detect |

Convenience functions

For quick scripts that don't need session management:

Python:

import cup

text = cup.snapshot()          # foreground window (compact text)
text = cup.snapshot("full")    # all windows
envelope = cup.snapshot_raw()  # foreground window (JSON dict)
text = cup.overview()          # window list only

TypeScript:

import { snapshot, snapshotRaw, overview } from "computeruseprotocol";

const text = await snapshot();         // foreground window (compact text)
const text = await snapshot("full");   // all windows
const envelope = await snapshotRaw();  // foreground window (JSON object)
const text = await overview();         // window list only

snapshot()

Capture the UI tree from the current platform.

Python:

result = session.snapshot(
    scope="foreground",   # overview | foreground | desktop | full
    app=None,             # window title filter (full scope only)
    max_depth=999,        # max tree depth
    compact=True,         # True = text, False = dict
    detail="compact",     # compact | full
)

TypeScript:

const result = await session.snapshot({
  scope: "foreground",   // overview | foreground | desktop | full
  app: undefined,        // window title filter (full scope only)
  maxDepth: 999,         // max tree depth
  compact: true,         // true = text, false = object
  detail: "compact",     // compact | full
});

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scope | string | "foreground" | Capture scope. See Capture Scopes |
| app | string? | None | Window title filter (only with full scope) |
| max_depth | int | 999 | Maximum tree depth |
| compact | bool | true | Return compact text (true) or JSON envelope (false) |
| detail | string | "compact" | "compact" applies pruning; "full" preserves all nodes |
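Because app only applies to the full scope, it can help to validate arguments before calling snapshot(). The following is a minimal sketch under that reading of the parameter table; build_snapshot_kwargs() is a hypothetical helper, not part of CUP, and whether CUP itself rejects or silently ignores a mismatched app is not stated here.

```python
from typing import Any, Dict, Optional

# Values taken from the parameter descriptions above.
VALID_SCOPES = {"overview", "foreground", "desktop", "full"}
VALID_DETAILS = {"compact", "full"}


def build_snapshot_kwargs(
    scope: str = "foreground",
    app: Optional[str] = None,
    max_depth: int = 999,
    compact: bool = True,
    detail: str = "compact",
) -> Dict[str, Any]:
    """Validate snapshot arguments and return them as keyword arguments."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if detail not in VALID_DETAILS:
        raise ValueError(f"unknown detail level: {detail!r}")
    if app is not None and scope != "full":
        # The app filter is documented as applying to the full scope only.
        raise ValueError("app can only be used with scope='full'")
    return {
        "scope": scope,
        "app": app,
        "max_depth": max_depth,
        "compact": compact,
        "detail": detail,
    }
```

A call would then look like session.snapshot(**build_snapshot_kwargs(scope="full", app="Notepad")).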

action()

Execute an action on an element from the last snapshot.

Python:

result = session.action("e14", "click")
result = session.action("e5", "type", value="hello world")
result = session.action("e9", "scroll", direction="down")
result = session.action("e3", "setvalue", value="42")

TypeScript:

const result = await session.action("e14", "click");
const result = await session.action("e5", "type", { value: "hello world" });
const result = await session.action("e9", "scroll", { direction: "down" });
const result = await session.action("e3", "setvalue", { value: "42" });

For the full list of the 15 element-level actions (plus the session-level press and wait), see the Actions Reference.

Action results

Every action returns an ActionResult:

Python:

from dataclasses import dataclass

@dataclass
class ActionResult:
    success: bool       # whether the action succeeded
    message: str        # human-readable description
    error: str | None   # error details if failed

TypeScript:

interface ActionResult {
  success: boolean;     // whether the action succeeded
  message: string;      // human-readable description
  error?: string;       // error details if failed
}
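Since failures are reported through the success flag, callers may want to turn a failed result into an exception. This is a sketch under stated assumptions: ActionResult is redefined locally with the documented fields so the code runs standalone, and ActionFailed and require_success() are hypothetical names that are not part of CUP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionResult:
    """Local mirror of the documented fields, so the sketch runs standalone."""

    success: bool
    message: str
    error: Optional[str] = None


class ActionFailed(RuntimeError):
    """Hypothetical exception type; not part of CUP."""


def require_success(result: ActionResult) -> ActionResult:
    """Return the result unchanged, or raise if the action did not succeed."""
    if not result.success:
        # Prefer the error details; fall back to the human-readable message.
        raise ActionFailed(result.error or result.message)
    return result
```

Usage would be along the lines of require_success(session.action("e14", "click")), which keeps the happy path free of explicit success checks.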

press()

Send a keyboard shortcut to the focused window. This is a session-level action that does not target a specific element.

Python:

session.press("ctrl+s")       # save
session.press("ctrl+shift+p") # command palette
session.press("enter")        # confirm
session.press("escape")       # cancel
session.press("alt+f4")       # close window

TypeScript:

await session.press("ctrl+s");       // save
await session.press("ctrl+shift+p"); // command palette
await session.press("enter");        // confirm
await session.press("escape");       // cancel
await session.press("alt+f4");       // close window
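The shortcut strings in these examples follow a "modifier+modifier+key" shape. The sketch below shows one way to check a string against that shape before sending it. It is illustrative only: split_shortcut() is a hypothetical helper, the modifier set is limited to the three modifiers that appear in the examples, and CUP's own parsing rules may differ.

```python
from typing import List, Tuple

# Only the modifiers that appear in the examples above; CUP may accept more.
MODIFIERS = {"ctrl", "shift", "alt"}


def split_shortcut(shortcut: str) -> Tuple[List[str], str]:
    """Split 'ctrl+shift+p' into (['ctrl', 'shift'], 'p')."""
    parts = [part.strip().lower() for part in shortcut.split("+")]
    if any(part == "" for part in parts):
        raise ValueError(f"malformed shortcut: {shortcut!r}")
    *modifiers, key = parts  # the last token is the key, the rest are modifiers
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return modifiers, key
```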

open_app() / openApp()

Open an application by name with fuzzy matching. Waits for the app window to appear.

Python:

session.open_app("chrome")    # fuzzy match
session.open_app("code")      # opens VS Code
session.open_app("notepad")

TypeScript:

await session.openApp("chrome");    // fuzzy match
await session.openApp("code");      // opens VS Code
await session.openApp("notepad");
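To give a feel for what fuzzy matching on application names can involve, here is a small sketch that combines a substring check with a similarity ratio from Python's standard difflib module. This is not CUP's matching algorithm, which is not described here; best_match(), the 0.6 threshold, and the sample application list are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def score(query: str, candidate: str) -> float:
    """Score a candidate name against the query, from 0.0 to 1.0."""
    q, c = query.strip().lower(), candidate.lower()
    if not q:
        return 0.0
    if q in c:
        return 1.0  # a substring hit counts as a perfect match
    return SequenceMatcher(None, q, c).ratio()


def best_match(
    query: str, candidates: Iterable[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the highest-scoring candidate, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for name in candidates:
        s = score(query, name)
        if s > best_score:
            best, best_score = name, s
    return best if best_score >= threshold else None


# Sample list of installed applications (made up for this example).
apps = ["Google Chrome", "Visual Studio Code", "Notepad"]
```

With this scoring, "chrome" and "code" resolve through the substring rule, while a typo such as "notpad" still resolves to "Notepad" through the similarity ratio.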

find()

Search the last captured tree for elements matching criteria. Returns results sorted by relevance.

Python:

results = session.find(query="play button")
results = session.find(role="textbox", state="focused")
results = session.find(name="Submit")

TypeScript:

const results = session.find({ query: "play button" });
const results = session.find({ role: "textbox", state: "focused" });
const results = session.find({ name: "Submit" });

| Parameter | Type | Description |
| --- | --- | --- |
| query | string? | Freeform semantic query (e.g., "play button") |
| role | string? | Role filter (supports synonyms) |
| name | string? | Fuzzy name match |
| state | string? | Exact state match |
| limit | int | Max results (default 5) |

find() searches the last captured tree, including elements pruned from the compact output. Call snapshot() first.
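The snapshot-then-find ordering can be wrapped in a small helper. The sketch below is written against any object that exposes snapshot(), find(), and action() as described on this page. Note the loud assumption: the shape of find() results is not specified here, so the code assumes each result is a dict with an "id" key holding the element ID. click_first() is a hypothetical helper name.

```python
from typing import Any, Optional


def click_first(session: Any, query: str) -> Optional[Any]:
    """Capture a fresh tree, find the best match for `query`, and click it.

    Returns the action result, or None when nothing matched.
    """
    session.snapshot()  # find() searches the last captured tree
    matches = session.find(query=query, limit=1)
    if not matches:
        return None
    # ASSUMPTION: each result is a dict with an "id" key (e.g. "e14").
    element_id = matches[0]["id"]
    return session.action(element_id, "click")
```

If the real result objects expose the element ID differently, only the line that extracts element_id needs to change.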

batch()

Execute a sequence of actions with optional waits between them.

Python:

results = session.batch([
    {"element_id": "e3", "action": "click"},
    {"action": "wait", "ms": 500},
    {"element_id": "e7", "action": "type", "value": "hello"},
    {"action": "press", "keys": "enter"},
])

TypeScript:

const results = await session.batch([
  { element_id: "e3", action: "click" },
  { action: "wait", ms: 500 },
  { element_id: "e7", action: "type", value: "hello" },
  { action: "press", keys: "enter" },
]);
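Batch steps are plain dictionaries, so small builder functions can cut down on key typos. The sketch below uses only the keys shown in the example above (element_id, action, value, ms, keys); the builder function names themselves are hypothetical and not part of CUP.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]


def click(element_id: str) -> Step:
    """Build a click step for one element."""
    return {"element_id": element_id, "action": "click"}


def type_text(element_id: str, value: str) -> Step:
    """Build a type step that enters `value` into one element."""
    return {"element_id": element_id, "action": "type", "value": value}


def wait(ms: int) -> Step:
    """Build a session-level wait step (milliseconds)."""
    return {"action": "wait", "ms": ms}


def press(keys: str) -> Step:
    """Build a session-level key press step."""
    return {"action": "press", "keys": keys}


# The same sequence as the example above, built with the helpers.
steps: List[Step] = [click("e3"), wait(500), type_text("e7", "hello"), press("enter")]
```

The resulting list can be passed directly as session.batch(steps).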

screenshot()

Capture a screenshot of the screen as a PNG image.

Python:

png_bytes = session.screenshot()
png_bytes = session.screenshot(region={"x": 100, "y": 200, "w": 800, "h": 600})

TypeScript:

const pngBuffer = await session.screenshot();
const pngBuffer = await session.screenshot({ region: { x: 100, y: 200, w: 800, h: 600 } });

Python requires the screenshot extra: pip install computeruseprotocol[screenshot]
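A typical next step is writing the captured image to disk. The sketch below assumes the Python screenshot() call returns raw PNG data as a bytes object, as the variable name png_bytes suggests; save_png() is a hypothetical helper. It checks the standard 8-byte PNG signature before writing, so that unexpected data is caught early.

```python
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(png_bytes: bytes, path: str) -> Path:
    """Write PNG data to `path` after a basic signature check."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("data does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(png_bytes)
    return target
```

Usage would look like save_png(session.screenshot(), "screen.png").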
