MCP Tools Reference

Haptix exposes 22 tools through the MCP protocol. Every tool is called by your AI agent — you don't call them directly. Everything available via MCP is also available through the haptix CLI.


Tool index

Tool Category Description
start_session Session Begin an agent session (auto-discovers devices)
end_session Session End the active session
device_status Device Check connection state
device_info Device Get device model, OS, screen size
list_devices Device List all visible devices
select_device Device Bind session to a specific device
screenshot Screenshot Capture the screen
screenshot_stream_start Screenshot Start continuous streaming (USB only)
screenshot_stream_stop Screenshot Stop streaming
accessibility_tree Accessibility Get UI element hierarchy
tap Gesture Single tap
double_tap Gesture Double tap
long_press Gesture Long press (context menus)
swipe Gesture Swipe between two points
drag Gesture Drag between two points
scroll Gesture Scroll page content
pinch Gesture Pinch gesture
rotate Gesture Rotation gesture
type_text Text Input Type into focused field
clear_text Text Input Clear focused field
get_console_output Console Read app console output
set_activity Activity Show/hide activity halo

Session

start_session

Begin an agent session. This is the entry point — call it before any other tool.

start_session auto-discovers connected USB devices and auto-selects if only one is found. It returns the device name, model, screen size, and connection state. You don't need to call list_devicesselect_devicedevice_info separately — start_session handles discovery for you.

Parameter Type Required Description
name string No Session name or purpose description

Returns a session code, device info (if a device is connected), and confirmation.

end_session

End the active session. Removes the activity halo, stops recording, and returns a summary with duration and tool call count. Always call when done.

No parameters.


Device

device_status

Check connection state. Returns connection and session status.

No parameters.

device_info

Get device details: model, screen size, OS version, orientation, battery level, app bundle ID, debugger state, and UI automation status.

No parameters.

Returns JSON with: model, osVersion, screenWidth, screenHeight, scaleFactor, orientation, batteryLevel, isCharging, bundleIdentifier, appName, debuggerAttached, uiAutomationEnabled.

list_devices

List all connected USB devices. Returns device name, app name, bundle ID, model, and connection status for each.

No parameters.

Most agents don't need this — start_session auto-discovers and auto-selects devices. Use list_devices only when you have multiple devices and need to choose.

select_device

Bind this session to a specific device. All subsequent commands target this device. If not called, commands target the first available device.

Parameter Type Required Description
device string Yes Device identifier — matches service name, bundle ID, or app name (case-insensitive, partial match)

Screenshot & Observation

screenshot

Capture the screen. The primary tool for visual discovery.

Parameter Type Required Description
mode string No full (with status bar), app (default), or region
format string No png (default) or jpeg
annotated boolean No Overlay bounding boxes with labels, identifiers, and coordinates
filter string No Filter annotations: interactive, adjustable, button, text, input, image
highlight string No Spotlight one element by accessibility identifier (implies annotated)

Annotated screenshots show green boxes for interactive elements and blue boxes for static elements.

screenshot_stream_start

Start continuous screenshot streaming at a specified frame rate. Requires a USB connection.

Parameter Type Required Description
fps integer No Frames per second, 1–10. Default: 2

screenshot_stream_stop

Stop an active screenshot stream.

No parameters.

accessibility_tree

Get UI elements with labels, identifiers, values, traits, and coordinates.

Parameter Type Required Description
mode string No compact (flat list, ~2K tokens, default) or full (nested hierarchy, ~20K tokens)

Compact mode returns a flat list of meaningful elements. Full mode returns the complete nested view hierarchy. Prefer compact — it's far more token-efficient.


Gestures

All gesture tools return a multi-content response: confirmation text, a follow-up screenshot of the resulting UI state, and any console output captured during the action. No need to call screenshot after every gesture.

tap

Single tap. Prefer identifier or label over coordinates — they're more reliable and survive layout changes.

Parameter Type Required Description
identifier string No Accessibility identifier (preferred)
label string No Accessibility label (fallback)
x number No X coordinate in points (last resort)
y number No Y coordinate in points (last resort)

Includes hit feedback showing which element was tapped: [Hit] "Submit" [button] identifier: "submitButton".

double_tap

Double tap. Same parameters as tap.

long_press

Long press. Triggers context menus. Specify duration for custom hold time.

Parameter Type Required Description
identifier string No Accessibility identifier
label string No Accessibility label
x number No X coordinate
y number No Y coordinate
duration number No Hold duration in seconds. Default: 1.0

swipe

Swipe between two points. Use on background areas to scroll pages. Use on picker wheels to change values. Use on sliders to move the thumb.

Parameter Type Required Description
startX number Yes Start X coordinate
startY number Yes Start Y coordinate
endX number Yes End X coordinate
endY number Yes End Y coordinate
duration number No Duration in seconds. Default: 0.3

drag

Drag between two points (slower than swipe, with press-and-hold). Use for reordering lists, moving items, or dragging handles. For sliders, prefer swipe.

Parameter Type Required Description
startX number Yes Start X coordinate
startY number Yes Start Y coordinate
endX number Yes End X coordinate
endY number Yes End Y coordinate
holdDuration number No Dwell time at start before moving. Default: 0.15s
duration number No Movement duration. Default: 1.0s

scroll

Scroll page content in a direction. Simpler than swipe for basic navigation — no coordinates needed.

Parameter Type Required Description
direction string Yes up, down, left, or right
amount string No small (25%), medium (50%), large (75%), full_page (100%). Default: medium
identifier string No Target a specific scrollable element by identifier

pinch

Pinch gesture at a center point. scale less than 1 pinches in (zoom out), greater than 1 pinches out (zoom in).

Parameter Type Required Description
centerX number Yes Center X coordinate
centerY number Yes Center Y coordinate
scale number Yes Scale factor
duration number No Duration. Default: 0.5s

Known issue: Pinch is currently broken. Gesture recognizers reject synthetic multi-touch events. See Compatibility.

rotate

Two-finger rotation at a center point.

Parameter Type Required Description
centerX number Yes Center X coordinate
centerY number Yes Center Y coordinate
angle number Yes Rotation angle in degrees. Positive is clockwise
duration number No Duration. Default: 0.5s

Known issue: Rotate is currently broken. Same issue as pinch. See Compatibility.


Text Input

type_text

Type text into the focused input field. Tap the field first to focus it.

Parameter Type Required Description
text string Yes The text to type

clear_text

Clear all text in the focused input field.

No parameters.


Console

get_console_output

Get app console output including print(), NSLog(), os_log(), errors, and warnings. Console output is also auto-included in every gesture response, so you rarely need to call this directly.

Parameter Type Required Description
since string No ISO 8601 timestamp — only return entries after this time
limit integer No Maximum entries to return. Default: 100
contains string No Substring filter (case-insensitive)
source string No Filter by source: stdout, stderr, or os_log
level string No Filter by level: debug, info, notice, error, fault

Activity

set_activity

Show or hide the activity halo on the device. The halo auto-activates on tool use and auto-deactivates after ~10 seconds of inactivity, so you typically only need to explicitly dismiss it.

Parameter Type Required Description
active boolean Yes true to show, false to hide

During an active session (start_sessionend_session), the halo stays on and cannot be dismissed.


Key behaviors

  • Auto-screenshot: Every gesture tool returns a screenshot of the resulting UI state. No need to call screenshot after each action.
  • Hit feedback: Tap tools include which element was hit: [Hit] "Submit" [button] identifier: "submitButton" -- base layer.
  • Auto-console: Console output captured during the gesture is included in every response.
  • Coordinates: All coordinates are in screen points (not pixels), origin (0, 0) at top-left. Read them from annotated screenshots — don't estimate from image pixels.
  • Settle delay: After each gesture, the server waits 500ms for the UI to settle before capturing the follow-up screenshot.
  • Auto-discovery: start_session discovers USB devices automatically. No need for list_devicesselect_device unless you have multiple devices.