Documentation
The technical reference for GhostUI — the semantic macOS automation platform.
Architecture
GhostUI is a macOS application built on Electron + Node.js with native Swift/Objective-C bindings for low-level accessibility API access. It consists of:
- CLI (gui): the primary interface for all commands
- Daemon: a long-running process for leases, actors, and input capture
- Native module (ghostui_ax.node): direct macOS Accessibility API bindings
- Overlay renderer: a native window layer for visual annotations
All commands communicate through structured payloads that flow naturally between gui subcommands via Unix pipes.
Quick Start
See what's on screen right now:
gui ax query --guiml '/Application[focused=true] { Window }'
Find and click a button:
gui ax query '/Application#Safari { AXButton#Search }' | gui cg click -
Take a screenshot of a specific element:
gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out screenshot.png
GUIQL Query Language
GUIQL is GhostUI's query language for finding UI elements. It works like CSS selectors, but for the macOS accessibility tree.
Syntax
# Find an app's windows
/Application#"Google Chrome" { Window }
# Find a specific button
/Application#Slack { AXButton#"Send" }
# Attributes
/Application[focused=true] { AXTextField[value="Search"] }
# Indexing (first, second, etc.)
/Application#Keynote { Slide:0 }
# Wildcards
/Application#TextEdit { * } # any direct child
/Application#Finder { **/AXCell } # recursive descent
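When queries are assembled inside a shell script, names that contain spaces need the quoted `#"..."` form shown above. The following is a minimal sketch rather than part of GhostUI: `guiql_id` and `guiql_element` are hypothetical helper names, and the rule that only letters, digits, and underscores may be written bare is an assumption inferred from the examples, not a documented grammar.

```shell
# Hypothetical helpers for assembling GUIQL strings in POSIX sh.
# Assumption: an identifier made only of letters, digits, and underscores
# can be written bare; anything else is wrapped in double quotes.
guiql_id() {
  case $1 in
    *[!A-Za-z0-9_]*|'') printf '"%s"' "$1" ;;  # needs quoting
    *)                  printf '%s' "$1" ;;    # safe to write bare
  esac
}

# Build "/Application#<app> { <role>#<label> }" from three arguments.
guiql_element() {
  printf '/Application#%s { %s#%s }' "$(guiql_id "$1")" "$2" "$(guiql_id "$3")"
}
```

A call such as `gui ax query "$(guiql_element 'Google Chrome' AXButton 'Sign In')"` would then pass a correctly quoted query. Labels that themselves contain double quotes are not handled by this sketch.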
System Surfaces
# OS permission prompts
/ { SystemAlert }
# Notification banners
/ { Notification }
Content Visibility
# Background apps (not frontmost)
gui ax query --show-background '/Application#Slack { Window }'
# Content behind modals
gui ax query --include-obscured '/Application#Chrome { Window }'
Payload Pipelines
GhostUI commands output structured payloads that carry geometry, role, and process information. These pipe naturally between commands — no JSON parsing needed.
# Query an element, then click it
gui ax query '/Application#Safari { AXButton#Submit }' | gui cg click -
# Query an element, highlight it, then click it
gui ax query '/Application#Safari { AXButton#Submit }' \
| gui gfx outline - \
| gui cg click -
# Screenshot a specific window
gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out slack.png
Use --guiml for human-readable inspection. Payloads are the native format for machine consumption.
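When the same match feeds more than one command, the payload can be saved once and replayed instead of re-running the query. This is a sketch under stated assumptions: `outline_then_click` is a hypothetical helper, and it assumes that a query with no match either exits non-zero or prints nothing, which this reference does not confirm.

```shell
# Hypothetical helper: query once, then reuse the saved payload twice.
# Assumptions: a query with no match exits non-zero or prints nothing, and
# "-" means "read the payload from standard input", as in the pipelines above.
outline_then_click() {
  payload_file=$(mktemp) || return 1
  if gui ax query "$1" > "$payload_file" && [ -s "$payload_file" ]; then
    # Discard whatever the overlay command writes to standard output.
    gui gfx outline - < "$payload_file" > /dev/null
    gui cg click - < "$payload_file"
    status=$?
  else
    echo "no match for: $1" >&2
    status=1
  fi
  rm -f "$payload_file"
  return "$status"
}
```

Storing the payload in a temporary file rather than a shell variable avoids any dependence on the payload's encoding. The variables are global because POSIX sh has no `local`.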
Foreground Leases
When running multi-step automations, a foreground lease ensures consistent focus, persistent cursor state, and a unified HUD across all commands.
# Acquire a lease
export GHOSTUI_LEASE=$(gui lease --foreground --name my-task)
# All subsequent gui commands share the lease
gui ax query '/Application#Chrome { AXButton#Submit }' | gui cg click -
gui cg key --app Chrome cmd+s
# Release when done
gui lease --release
Without a lease, each gui invocation independently manages focus. With a lease, the daemon coordinates everything.
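A lease that is never released is easy to leave behind when a step fails midway. One possible way to guard against that is a small wrapper, sketched here with the hypothetical name `with_lease`. It uses only the `gui lease` invocations shown above, and it assumes `gui lease --foreground` exits non-zero when no lease can be acquired.

```shell
# Hypothetical wrapper: hold a foreground lease for the duration of one command.
# The body runs in a subshell, so GHOSTUI_LEASE does not leak into the caller.
with_lease() (
  lease_name=$1
  shift
  # Assumption: a failed acquisition exits non-zero.
  GHOSTUI_LEASE=$(gui lease --foreground --name "$lease_name") || exit 1
  export GHOSTUI_LEASE
  "$@"                   # run the wrapped command or function under the lease
  status=$?
  gui lease --release    # release even if the command failed
  exit "$status"
)
```

For example, `with_lease my-task gui cg key --app Chrome cmd+s` would acquire the lease, send the keystroke, release the lease, and return the keystroke command's exit status.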
Script System
.gui files are portable, replayable automation scripts. They look like shell pipelines but are constrained (no redirects, substitutions, or conditionals) and support try/finally for cleanup.
# example.gui
# Fill a login form
ax query '/Application#Chrome { AXTextField#Username }' | ax click -
cg key --app Chrome 'user@example.com'
cg key --app Chrome tab
cg key --app Chrome 'password123'
cg key --app Chrome Return
# Play a script
gui script play --file example.gui
# Resume from step 5
gui script play --file example.gui --offset 5
# Dry run (validate without executing)
gui script play --file example.gui --dry-run
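The constraints on .gui files (no redirects, substitutions, or conditionals) can be roughly pre-checked before a script is handed to gui script play. The sketch below is a heuristic, not the real validator: `lint_gui_script` is a hypothetical name, the patterns are guesses at what the parser rejects, and `--dry-run` remains the authoritative check.

```shell
# Hypothetical pre-check for .gui files; a heuristic, not GhostUI's validator.
# Prints "<line>: <reason>: <text>" per suspicious line and returns 1 if any.
lint_gui_script() {
  awk -v sq="'" '
    function bad(why) { printf "%d: %s: %s\n", NR, why, $0; found = 1 }
    /^[ \t]*#/ { next }                      # skip comment lines
    {
      line = $0
      gsub(sq "[^" sq "]*" sq, "", line)     # drop single-quoted strings
      gsub(/"[^"]*"/, "", line)              # drop double-quoted strings
      if (line ~ /[<>]/) bad("redirect")
      if (line ~ /\$\(|`/) bad("substitution")
      if (line ~ /^[ \t]*(if|while|for|case)([ \t]|$)/) bad("conditional")
    }
    END { exit found + 0 }
  ' "$1"
}
```

Quoted strings are stripped before matching so that characters inside a query or typed text do not trigger false positives.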
Feature Grid
| Capability | Command | Description |
|---|---|---|
| Query UI elements | ax query | Find elements by role, label, and attributes |
| Raw AX tree | ax tree | Dump the full accessibility tree |
| Poll for elements | ax poll | Wait for an element to appear |
| Apple Events | ae query | Read app-internal objects (slides, cells, files) |
| Keyboard input | cg key | Send keystrokes and combos to any app |
| Mouse click | cg click | Click at coordinates or piped AX match |
| Drag | cg drag | Drag between two points or elements |
| Scroll | cg scroll | Scroll at position with dx/dy |
| Type text | cg type | Focus an element and type a string |
| Screenshots | rec image | Capture rect, window, or AX element |
| Video capture | rec video | Record screen region as video |
| Filmstrip | rec filmstrip | Timed grid layout captures |
| Overlays | gfx | outline, spotlight, arrow, caption, scan, xray |
| Persistent actors | actor | pointer, canvas, spotlight, narrator, orb |
| Text-to-speech | voice speak | Native macOS or Cartesia cloud TTS |
| Speech-to-text | voice listen | On-device or cloud transcription |
| Record input | input rec | Capture mouse/keyboard to SQLite DB |
| Replay input | input play | Replay recorded actions live |
| Export to script | input script | Convert recordings to .gui files |
| Window management | window | Focus, drag, resize windows |
| Skill store | skill | Persistent hierarchical skill storage |
ax — Accessibility Queries
The primary way to inspect and interact with macOS UIs. Query by role, label, attributes, and position in the element hierarchy.
# Find all buttons in an app
gui ax query --guiml '/Application#Safari { AXButton }'
# Press the first match
gui ax query '/Application#Safari { AXButton#Back }' | gui ax press -
# Set a text field value
gui ax query '/Application#TextEdit { AXTextArea }' | gui ax set 'Hello world' -
# Wait for an element to appear (e.g., after page load)
gui ax poll --timeout 10 '/Application#Chrome { AXButton#"Sign In" }'
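Polling and acting are often combined: wait for an element, then click it. A minimal sketch with the hypothetical helper `wait_and_click`, assuming that `gui ax poll` exits non-zero when the timeout expires (this reference does not state its exit codes):

```shell
# Hypothetical helper: wait for an element to appear, then click it.
# Assumption: `gui ax poll` exits non-zero when the timeout expires.
wait_and_click() {
  query=$1
  limit=${2:-10}   # passed straight to --timeout; same unit as the example above
  if gui ax poll --timeout "$limit" "$query" > /dev/null; then
    gui ax query "$query" | gui cg click -
  else
    echo "timed out waiting for: $query" >&2
    return 1
  fi
}
```

Usage would look like `wait_and_click '/Application#Chrome { AXButton#"Sign In" }' 10`.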
cg — Raw Input
Send keyboard and mouse events via CoreGraphics. Target specific apps with --app or --pid.
# Keystrokes
gui cg key --app Chrome cmd+l # Focus address bar
gui cg key --app TextEdit cmd+shift+s # Save as
# Mouse
gui cg click 500 300 --app Chrome # Click at coordinates
gui cg drag 100 200 400 200 --app Finder # Drag
gui cg scroll 500 300 --dx 0 --dy -5 --app Chrome # Scroll down
# Paste (no --app flag for paste events)
gui cg key cmd+v
ae — Apple Events
Query scriptable app objects that aren't in the accessibility tree — Keynote slide content, Finder metadata, spreadsheet cells.
# Keynote slide text
gui ae query --app Keynote --guiml 'document / slide:0 / textItem:0'
# Finder file info
gui ae query --app Finder --guiml 'folder#"Downloads" / file#"notes.txt"'
rec — Capture
Screenshot and video capture from rectangles, windows, or piped AX matches.
# Screenshot a rect
gui rec image --rect 0,0,1920,1080 --width 800 --out screenshot.png
# Screenshot a specific window
gui rec image --window 12345 --out window.png
# Screenshot an AX match
gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out slack.png
# Filmstrip (timed grid)
gui rec filmstrip --rect 0,0,800,600 --grid 4x3 --duration 12 --out strip.png
gfx — Visual Overlays
One-shot visual annotations for highlighting, spotlighting, and annotating UI elements.
# Outline an element
gui ax query '/Application#Safari { AXButton#Submit }' | gui gfx outline -
# Spotlight (dim everything except target)
gui ax query '/Application#Chrome { AXTextField }' | gui gfx spotlight -
# Arrow pointing to element
gui ax query '/Application#Slack { AXButton }' | gui gfx arrow - --from 200 100
# Caption on screen
gui gfx caption 'Step 1: Click here' --duration 3
actor — Persistent Overlays
Daemon-owned actors that persist across commands. Spawn once, reuse, then kill.
# Spawn a pointer actor
gui actor spawn pointer myptr
# Move, click, narrate
gui actor run myptr move --to 500 300
gui actor run myptr click
gui actor run myptr narrate --text 'Clicking the submit button'
# Spawn a spotlight
gui actor spawn spotlight spot
gui actor run spot rect --at 400 200 --size 200 100
# Clean up
gui actor kill myptr
gui actor kill spot
voice — Speech
Text-to-speech and speech-to-text with native macOS or cloud providers.
# Speak text (native macOS)
gui voice speak 'Hello, I am GhostUI'
# Speak with Cartesia cloud voice
gui voice speak --plugin cartesia --voice'Hello world'
# Listen for speech
gui voice listen --timeout 5000
# Speak then capture response
gui voice speak --listen 'What would you like me to do?'
input — Recording
Capture, inspect, and replay human mouse/keyboard interactions.
# Record 20 actions
gui input rec --limit 20
# Enable long-running capture (daemon-owned)
gui input enable
# Query recorded actions
gui input recent -n 10
gui input query --kind click --app Chrome
# Export to .gui script
gui input script --path > workflow.gui
# Replay
gui input query --json | gui input play -
Comparison
| | GhostUI | AppleScript | Keyboard Maestro | Hammerspoon |
|---|---|---|---|---|
| Semantic UI queries | GUIQL | Partial | No | Partial |
| Structured payloads | Native | No | No | Manual |
| AI-agent ready | --skill pages | No | No | No |
| Visual overlays | Built-in | No | Basic | Manual |
| Record + replay | DB + export | No | Yes | No |
| Voice | TTS + STT | Say only | No | No |
| Script format | .gui (text) | .scpt (binary) | .kmmacros | .lua |
| CLI-first | Yes | Wrapper | GUI-first | REPL |
--skill Pages
Every gui command has a --skill flag that prints agent-facing documentation. This is how AI agents learn to use GhostUI without reading manuals.
# Get the skill page for any command
gui --skill # Top-level overview
gui ax query --skill # Query language deep-dive
gui cg click --skill # Click command specifics
gui script --skill # Script system reference
Skill pages include syntax, examples, edge cases, and best practices — written specifically for LLM consumption.
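An agent harness may want several skill pages bundled into a single context file. A minimal sketch using only the `--skill` invocations listed above; `collect_skills` and the `=====` separator format are hypothetical choices, not GhostUI conventions.

```shell
# Hypothetical helper: concatenate skill pages into one file for an agent.
# Each argument after the output path is a command path such as "ax query";
# an empty string selects the top-level `gui --skill` overview.
collect_skills() {
  out=$1
  shift
  : > "$out"                       # start with an empty file
  for cmd in "$@"; do
    printf '===== gui %s =====\n' "${cmd:+$cmd }--skill" >> "$out"
    # $cmd is left unquoted on purpose so "ax query" splits into two words.
    gui $cmd --skill >> "$out" || return 1
  done
}
```

For example, `collect_skills skills.txt "" "ax query" "cg click"` would write three pages to skills.txt, each preceded by a separator line naming the command.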