Documentation

The technical reference for GhostUI — the semantic macOS automation platform.

Architecture

GhostUI is a macOS application built on Electron + Node.js with native Swift/Objective-C bindings for low-level accessibility API access. It consists of:

  • CLI (gui) — the primary interface for all commands
  • Daemon — long-running process for leases, actors, input capture
  • Native module (ghostui_ax.node) — direct macOS Accessibility API bindings
  • Overlay renderer — native window layer for visual annotations

Commands exchange structured payloads, which pass between gui subcommands over Unix pipes.

Quick Start

See what's on screen right now:

gui ax query --guiml '/Application[focused=true] { Window }'

Find and click a button:

gui ax query '/Application#Safari { AXButton#Search }' | gui cg click -

Take a screenshot of a specific element:

gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out screenshot.png

GUIQL Query Language

GUIQL is GhostUI's query language for finding UI elements. It works like CSS selectors, but for the macOS accessibility tree.

Syntax

# Find an app's windows
/Application#"Google Chrome" { Window }

# Find a specific button
/Application#Slack { AXButton#"Send" }

# Attributes
/Application[focused=true] { AXTextField[value="Search"] }

# Indexing (:0 is the first match, :1 the second, etc.)
/Application#Keynote { Slide:0 }

# Wildcards
/Application#TextEdit { * }       # any direct child
/Application#Finder { **/AXCell } # recursive descent
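Names that contain spaces are easy to misquote when a query is assembled inside a shell script. The sketch below builds a query string from plain arguments, assuming the quoting pattern shown above (#"Google Chrome") is accepted for any name. guiql_quote and guiql_find are hypothetical helpers, not part of GhostUI; they only format text and never call gui.

```shell
# Hypothetical helpers: pure string formatting, no gui invocation.

# Wrap a name in double quotes when it contains a space,
# mirroring the #"Google Chrome" form used in the syntax examples.
guiql_quote() {
  case "$1" in
    *" "*) printf '"%s"' "$1" ;;
    *)     printf '%s' "$1" ;;
  esac
}

# Build: /Application#<app> { <role>#<label> }
#   $1 = application name, $2 = element role, $3 = element label
guiql_find() {
  printf '/Application#%s { %s#%s }\n' \
    "$(guiql_quote "$1")" "$2" "$(guiql_quote "$3")"
}
```

A possible use is gui ax query "$(guiql_find "Google Chrome" AXButton "Sign In")" | gui cg click -, which keeps the quoting in one place instead of in every call site.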

System Surfaces

# OS permission prompts
/ { SystemAlert }

# Notification banners
/ { Notification }

Content Visibility

# Background apps (not frontmost)
gui ax query --show-background '/Application#Slack { Window }'

# Content behind modals
gui ax query --include-obscured '/Application#Chrome { Window }'

Payload Pipelines

GhostUI commands output structured payloads that carry geometry, role, and process information. These pipe naturally between commands — no JSON parsing needed.

# Query an element, then click it
gui ax query '/Application#Safari { AXButton#Submit }' | gui cg click -

# Query an element, highlight it, then click it
gui ax query '/Application#Safari { AXButton#Submit }' \
  | gui gfx outline - \
  | gui cg click -

# Screenshot a specific window
gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out slack.png

Use --guiml for human-readable inspection. Payloads are the native format for machine consumption.
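The two formats can be combined so that a person sees what matched before the payload is acted on. This is a minimal sketch using only the commands shown above; inspect_then_click is a hypothetical name, and the early return assumes gui ax query exits non-zero when the query fails, which this page does not state.

```shell
# Hypothetical wrapper: print the match as GUIML for the operator,
# then run the same query again in payload form and click the result.
#   $1 = GUIQL query
inspect_then_click() {
  # Human-readable view goes to stderr so stdout stays free for pipelines.
  # ASSUMPTION: a failed query yields a non-zero exit status.
  gui ax query --guiml "$1" >&2 || return 1

  # Machine-readable payload is piped straight into the click command.
  gui ax query "$1" | gui cg click -
}
```

The query runs twice, so the element could change between the inspection and the click; for fast-changing UIs the plain one-line pipeline may be the safer choice.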

Foreground Leases

When running multi-step automations, a foreground lease ensures consistent focus, persistent cursor state, and a unified HUD across all commands.

# Acquire a lease
export GHOSTUI_LEASE=$(gui lease --foreground --name my-task)

# All subsequent gui commands share the lease
gui ax query '/Application#Chrome { AXButton#Submit }' | gui cg click -
gui cg key --app Chrome cmd+s

# Release when done
gui lease --release

Without a lease, each gui invocation manages focus independently. With a lease, the daemon coordinates focus, cursor state, and the HUD across invocations.
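A lease that is never released is an easy mistake when a step fails partway through. The sketch below wraps the acquire and release commands shown above around an arbitrary command. with_lease is a hypothetical helper, and it assumes gui lease --foreground exits non-zero when no lease can be acquired.

```shell
# Hypothetical helper: run a command under a foreground lease and
# release the lease afterwards, whether or not the command succeeded.
#   $1 = lease name, remaining arguments = command to run
with_lease() {
  _lease_name=$1
  shift

  # ASSUMPTION: acquisition failure is reported through the exit status.
  GHOSTUI_LEASE=$(gui lease --foreground --name "$_lease_name") || return 1
  export GHOSTUI_LEASE

  "$@"
  _lease_status=$?

  gui lease --release
  unset GHOSTUI_LEASE
  return "$_lease_status"
}
```

For example, with_lease my-task gui script play --file example.gui would play a script under one lease and pass the script's exit status back to the caller.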

Script System

.gui files are portable, replayable automation scripts. They look like shell pipelines but are constrained (no redirects, substitutions, or conditionals) and support try/finally for cleanup.

# example.gui
# Fill a login form
ax query '/Application#Chrome { AXTextField#Username }' | ax click -
cg key --app Chrome 'user@example.com'
cg key --app Chrome tab
cg key --app Chrome 'password123'
cg key --app Chrome Return

# Play a script
gui script play --file example.gui

# Resume from step 5
gui script play --file example.gui --offset 5

# Dry run (validate without executing)
gui script play --file example.gui --dry-run
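Because a dry run validates without executing, it can act as a gate before the real run. A minimal sketch, assuming gui script play --dry-run exits non-zero when validation fails (this page does not document exit codes); play_checked is a hypothetical name.

```shell
# Hypothetical helper: validate a .gui script with --dry-run and
# play it only if validation succeeded.
#   $1 = path to the .gui file
play_checked() {
  # ASSUMPTION: a failed validation is signalled by a non-zero exit status.
  gui script play --file "$1" --dry-run || {
    echo "validation failed: $1" >&2
    return 1
  }
  gui script play --file "$1"
}
```

Usage would be play_checked example.gui; the second invocation is skipped entirely when the first one fails.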

Feature Grid

Capability          Command        Description
Query UI elements   ax query       Find elements by role, label, and attributes
Raw AX tree         ax tree        Dump the full accessibility tree
Poll for elements   ax poll        Wait for an element to appear
Apple Events        ae query       Read app-internal objects (slides, cells, files)
Keyboard input      cg key         Send keystrokes and combos to any app
Mouse click         cg click       Click at coordinates or piped AX match
Drag                cg drag        Drag between two points or elements
Scroll              cg scroll      Scroll at position with dx/dy
Type text           cg type        Focus an element and type a string
Screenshots         rec image      Capture rect, window, or AX element
Video capture       rec video      Record screen region as video
Filmstrip           rec filmstrip  Timed grid layout captures
Overlays            gfx            outline, spotlight, arrow, caption, scan, xray
Persistent actors   actor          pointer, canvas, spotlight, narrator, orb
Text-to-speech      voice speak    Native macOS or Cartesia cloud TTS
Speech-to-text      voice listen   On-device or cloud transcription
Record input        input rec      Capture mouse/keyboard to SQLite DB
Replay input        input play     Replay recorded actions live
Export to script    input script   Convert recordings to .gui files
Window management   window         Focus, drag, resize windows
Skill store         skill          Persistent hierarchical skill storage

ax — Accessibility Queries

The primary way to inspect and interact with macOS UIs. Query by role, label, attributes, and position in the element hierarchy.

# Find all buttons in an app
gui ax query --guiml '/Application#Safari { AXButton }'

# Press the first match
gui ax query '/Application#Safari { AXButton#Back }' | gui ax press -

# Set a text field value
gui ax query '/Application#TextEdit { AXTextArea }' | gui ax set 'Hello world' -

# Wait for an element to appear (e.g., after page load)
gui ax poll --timeout 10 '/Application#Chrome { AXButton#"Sign In" }'
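Polling and pressing are often needed together, for instance when a button only appears after a page load. The sketch below chains the two commands shown above. wait_and_press is a hypothetical helper, and it assumes gui ax poll exits non-zero when the timeout elapses without a match; the timeout value is passed through unchanged, in whatever unit --timeout expects.

```shell
# Hypothetical helper: wait for an element to appear, then press it.
#   $1 = GUIQL query
#   $2 = value for --timeout (defaults to 10, as in the example above)
wait_and_press() {
  # ASSUMPTION: ax poll returns a non-zero status if nothing appears in time.
  gui ax poll --timeout "${2:-10}" "$1" > /dev/null || return 1

  # The element exists now; query it again for a payload and press it.
  gui ax query "$1" | gui ax press -
}
```

A call such as wait_and_press '/Application#Chrome { AXButton#"Sign In" }' would give up without pressing anything if the button never shows.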

cg — Raw Input

Send keyboard and mouse events via CoreGraphics. Target specific apps with --app or --pid.

# Keystrokes
gui cg key --app Chrome cmd+l        # Focus address bar
gui cg key --app TextEdit cmd+shift+s # Save as

# Mouse
gui cg click 500 300 --app Chrome     # Click at coordinates
gui cg drag 100 200 400 200 --app Finder  # Drag
gui cg scroll 500 300 --dx 0 --dy -5 --app Chrome  # Scroll down

# Paste (no --app flag for paste events)
gui cg key cmd+v

ae — Apple Events

Query scriptable app objects that aren't in the accessibility tree — Keynote slide content, Finder metadata, spreadsheet cells.

# Keynote slide text
gui ae query --app Keynote --guiml 'document / slide:0 / textItem:0'

# Finder file info
gui ae query --app Finder --guiml 'folder#"Downloads" / file#"notes.txt"'

rec — Capture

Screenshot and video capture from rectangles, windows, or piped AX matches.

# Screenshot a rect
gui rec image --rect 0,0,1920,1080 --width 800 --out screenshot.png

# Screenshot a specific window
gui rec image --window 12345 --out window.png

# Screenshot an AX match
gui ax query '/Application#Slack { AXWindow }' | gui rec image - --out slack.png

# Filmstrip (timed grid)
gui rec filmstrip --rect 0,0,800,600 --grid 4x3 --duration 12 --out strip.png

gfx — Visual Overlays

One-shot visual annotations for highlighting, spotlighting, and annotating UI elements.

# Outline an element
gui ax query '/Application#Safari { AXButton#Submit }' | gui gfx outline -

# Spotlight (dim everything except target)
gui ax query '/Application#Chrome { AXTextField }' | gui gfx spotlight -

# Arrow pointing to element
gui ax query '/Application#Slack { AXButton }' | gui gfx arrow - --from 200 100

# Caption on screen
gui gfx caption 'Step 1: Click here' --duration 3

actor — Persistent Overlays

Daemon-owned actors that persist across commands. Spawn once, reuse, then kill.

# Spawn a pointer actor
gui actor spawn pointer myptr

# Move, click, narrate
gui actor run myptr move --to 500 300
gui actor run myptr click
gui actor run myptr narrate --text 'Clicking the submit button'

# Spawn a spotlight
gui actor spawn spotlight spot
gui actor run spot rect --at 400 200 --size 200 100

# Clean up
gui actor kill myptr
gui actor kill spot

voice — Speech

Text-to-speech and speech-to-text with native macOS or cloud providers.

# Speak text (native macOS)
gui voice speak 'Hello, I am GhostUI'

# Speak with Cartesia cloud voice
gui voice speak --plugin cartesia --voice  'Hello world'

# Listen for speech
gui voice listen --timeout 5000

# Speak then capture response
gui voice speak --listen 'What would you like me to do?'

input — Recording

Capture, inspect, and replay human mouse/keyboard interactions.

# Record 20 actions
gui input rec --limit 20

# Enable long-running capture (daemon-owned)
gui input enable

# Query recorded actions
gui input recent -n 10
gui input query --kind click --app Chrome

# Export to .gui script
gui input script --path > workflow.gui

# Replay
gui input query --json | gui input play -

Comparison

Feature              GhostUI        AppleScript     Keyboard Maestro  Hammerspoon
Semantic UI queries  GUIML          Partial         No                Partial
Structured payloads  Native         No              No                Manual
AI-agent ready       --skill pages  No              No                No
Visual overlays      Built-in       No              Basic             Manual
Record + replay      DB + export    No              Yes               No
Voice                TTS + STT      Say only        No                No
Script format        .gui (text)    .scpt (binary)  .kmmacros         .lua
CLI-first            Yes            Wrapper         GUI-first         REPL

--skill Pages

Every gui command has a --skill flag that prints agent-facing documentation. This is how AI agents learn to use GhostUI without reading manuals.

# Get the skill page for any command
gui --skill                    # Top-level overview
gui ax query --skill           # Query language deep-dive
gui cg click --skill           # Click command specifics
gui script --skill             # Script system reference

Skill pages include syntax, examples, edge cases, and best practices — written specifically for LLM consumption.
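An agent harness might collect a few skill pages into files before a session starts, so they can be loaded as context. This is a sketch under that assumption; dump_skills and the output file names are hypothetical, and only the four --skill invocations listed above are used.

```shell
# Hypothetical helper: save selected skill pages as text files.
#   $1 = output directory (created if missing)
dump_skills() {
  _skill_dir=$1
  mkdir -p "$_skill_dir" || return 1

  # One file per command; the file names are an illustrative convention.
  gui --skill          > "$_skill_dir/gui.txt"      || return 1
  gui ax query --skill > "$_skill_dir/ax-query.txt" || return 1
  gui cg click --skill > "$_skill_dir/cg-click.txt" || return 1
  gui script --skill   > "$_skill_dir/script.txt"   || return 1
}
```

Running dump_skills ./skills would leave four text files that a prompt builder can read without invoking gui again.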