MCP Tools Reference

Haptix exposes a stable MCP tool surface through the protocol. Every tool is called by your AI agent — you don't call them directly. Everything available via MCP is also available through the haptix CLI.

Tool index

Tool	Category	Description
`start_session`	Session	Begin an agent session (auto-discovers devices)
`end_session`	Session	End the active session
`device_status`	Device	Check connection state
`device_info`	Device	Get device model, OS, screen size
`list_devices`	Device	List all visible devices
`select_device`	Device	Bind session to a specific device
`screenshot`	Screenshot	Capture the screen
`screenshot_stream_start`	Screenshot	Start continuous streaming (USB only)
`screenshot_stream_stop`	Screenshot	Stop streaming
`accessibility_tree`	Accessibility	Get UI element hierarchy
`tap`	Gesture	Single tap
`double_tap`	Gesture	Double tap
`long_press`	Gesture	Long press (context menus)
`swipe`	Gesture	Swipe between two points
`drag`	Gesture	Drag between two points
`draw_path`	Gesture	One continuous stroke through many points
`scroll`	Gesture	Scroll page content
`pinch`	Gesture	Pinch gesture
`rotate`	Gesture	Rotation gesture
`type_text`	Text Input	Type into focused field
`clear_text`	Text Input	Clear focused field
`actions`	Batch	Run a sequence of taps and text in one call
`get_console_output`	Console	Read app console output
`set_activity`	Activity	Show/hide activity halo

Session

`start_session`

Begin an agent session. This is the entry point — call it before any other tool.

start_session auto-discovers connected USB devices and auto-selects if only one is found. It returns the device name, model, screen size, and connection state. You don't need to call list_devices → select_device → device_info separately — start_session handles discovery for you.

Parameter	Type	Required	Description
`name`	string	No	Session name or purpose description

Returns a session code, device info (if a device is connected), and confirmation.

`end_session`

End the active session. Removes the activity halo, stops recording, and returns a summary with duration and tool call count. Always call when done.

No parameters.

Device

`device_status`

Check connection state. Returns connection and session status.

No parameters.

`device_info`

Get device details: model, screen size, OS version, orientation, battery level, app bundle ID, debugger state, and UI automation status.

No parameters.

Returns JSON with: model, osVersion, screenWidth, screenHeight, scaleFactor, orientation, batteryLevel, isCharging, bundleIdentifier, appName, debuggerAttached, uiAutomationEnabled.

`list_devices`

List all connected USB devices. Returns device name, app name, bundle ID, model, and connection status for each.

No parameters.

Most agents don't need this — start_session auto-discovers and auto-selects devices. Use list_devices only when you have multiple devices and need to choose.

`select_device`

Bind this session to a specific device. All subsequent commands target this device. If not called, commands target the first available device.

Parameter	Type	Required	Description
`device`	string	Yes	Device identifier — matches service name, bundle ID, or app name (case-insensitive, partial match)

Screenshot & Observation

`screenshot`

Capture the screen. The primary tool for visual discovery.

Screenshots are captured Mac-side from the device's video feed. Haptix waits for a fresh frame from the selected device and fails clearly if it cannot get one. There is no device-side screenshot fallback, and Haptix refuses to guess across attached phones if the capture source drifts.

Parameter	Type	Required	Description
`mode`	string	No	`full` (with status bar), `app` (default), or `region`
`format`	string	No	`png` (default) or `jpeg`
`annotated`	boolean	No	Overlay bounding boxes with labels, identifiers, and coordinates
`filter`	string	No	Filter annotations: `interactive`, `adjustable`, `button`, `text`, `input`, `image`
`highlight`	string	No	Spotlight one element by accessibility identifier (implies annotated)

Annotated screenshots show green boxes for interactive elements and blue boxes for static elements. The overlay is composed Mac-side on the captured frame, so annotated inspection works on any iOS app — including apps the developer doesn't own. No SDK embed required. (The live, on-device annotate tool further down is a separate path that does still require HaptixKit.)

`screenshot_stream_start`

Start continuous screenshot streaming at a specified frame rate. Requires a USB connection.

Parameter	Type	Required	Description
`fps`	integer	No	Frames per second, 1–10. Default: 2

`screenshot_stream_stop`

Stop an active screenshot stream.

No parameters.

`accessibility_tree`

Get UI elements with labels, identifiers, values, traits, and coordinates.

Parameter	Type	Required	Description
`mode`	string	No	`compact` (flat list, ~2K tokens, default) or `full` (nested hierarchy, ~20K tokens)

Compact mode returns a flat list of meaningful elements. Full mode returns the complete nested view hierarchy. Prefer compact — it's far more token-efficient.

Gestures

All gesture tools return a multi-content response: a fresh follow-up screenshot of the resulting UI state, confirmation text, and any console output captured during the action. No need to call screenshot after every gesture unless you want a different format or a higher-resolution capture.

`tap`

Single tap. Prefer identifier or label over coordinates — they're more reliable and survive layout changes.

Parameter	Type	Required	Description
`identifier`	string	No	Accessibility identifier (preferred)
`label`	string	No	Accessibility label (fallback)
`x`	number	No	X coordinate in points (last resort)
`y`	number	No	Y coordinate in points (last resort)

Includes hit feedback showing which element was tapped: [Hit] "Submit" [button] identifier: "submitButton".

`double_tap`

Double tap. Same parameters as tap.

`long_press`

Long press. Triggers context menus. Specify duration for custom hold time.

Parameter	Type	Required	Description
`identifier`	string	No	Accessibility identifier
`label`	string	No	Accessibility label
`x`	number	No	X coordinate
`y`	number	No	Y coordinate
`duration`	number	No	Hold duration in seconds. Default: 1.0

`swipe`

Swipe between two points. Use on background areas to scroll pages. Use on picker wheels to change values. Use on sliders to move the thumb.

Parameter	Type	Required	Description
`startX`	number	Yes	Start X coordinate
`startY`	number	Yes	Start Y coordinate
`endX`	number	Yes	End X coordinate
`endY`	number	Yes	End Y coordinate
`duration`	number	No	Duration in seconds. Default: 0.3

`drag`

Drag between two points (slower than swipe, with press-and-hold). Use for reordering lists, moving items, or dragging handles. For sliders, prefer swipe.

Parameter	Type	Required	Description
`startX`	number	Yes	Start X coordinate
`startY`	number	Yes	Start Y coordinate
`endX`	number	Yes	End X coordinate
`endY`	number	Yes	End Y coordinate
`holdDuration`	number	No	Dwell time at start before moving. Default: 0.15s
`duration`	number	No	Movement duration. Default: 1.0s

`draw_path`

Draw one continuous one-finger stroke through an ordered list of waypoints. Use it for Freeform drawing, signatures, circles, lasso paths, and organic drags where breaking the motion into separate straight-line calls would incorrectly lift the finger.

Parameter	Type	Required	Description
`points`	array	Yes	Ordered screen-point waypoints as `[{x, y}, ...]`. The first point is touch-down and the last point is lift.
`duration`	number	No	Total stroke duration in seconds. Default: 1.0s
`holdDuration`	number	No	Pause after touch-down before moving. Default: 0
`closed`	boolean	No	Append the first point at the end if needed. Useful for circles. Default: `false`

Separate draw_path calls imply lifting the finger on purpose.

Long paths (many waypoints, multi-second strokes) emit MCP notifications/progress while the stroke runs. Clients that subscribe to progress see in-flight state — useful for keeping users oriented during a slow draw. Clients that don't subscribe simply wait for the final response; the tool still completes correctly.

`scroll`

Scroll page content in a direction. Simpler than swipe for basic navigation — no coordinates needed.

Parameter	Type	Required	Description
`direction`	string	Yes	`up`, `down`, `left`, or `right`
`amount`	string	No	`small` (25%), `medium` (50%), `large` (75%), `full_page` (100%). Default: `medium`
`identifier`	string	No	Target a specific scrollable element by identifier

`pinch`

Pinch gesture at a center point. scale less than 1 pinches in (zoom out), greater than 1 pinches out (zoom in).

Parameter	Type	Required	Description
`centerX`	number	Yes	Center X coordinate
`centerY`	number	Yes	Center Y coordinate
`scale`	number	Yes	Scale factor
`duration`	number	No	Duration. Default: 0.5s

Known issue: Pinch is currently broken. Gesture recognizers reject synthetic multi-touch events. See Compatibility.

`rotate`

Two-finger rotation at a center point.

Parameter	Type	Required	Description
`centerX`	number	Yes	Center X coordinate
`centerY`	number	Yes	Center Y coordinate
`angle`	number	Yes	Rotation angle in degrees. Positive is clockwise
`duration`	number	No	Duration. Default: 0.5s

Known issue: Rotate is currently broken. Same issue as pinch. See Compatibility.

Text Input

`type_text`

Type text into the focused input field. Tap the field first to focus it.

Parameter	Type	Required	Description
`text`	string	Yes	The text to type

`clear_text`

Clear all text in the focused input field.

No parameters.

Batch sequences

`actions`

Run a pre-composed sequence of taps and text input in a single tool call. One MCP round-trip, one final screenshot returned — instead of N screenshots returned to the model. Useful for stable layouts where the whole sequence can be composed from one screenshot: calculator expressions, dialer / phone-number entry, PIN and OTP fields, custom keypads.

When not to use it: any flow where the next action depends on the result of the prior one (a reveal, a navigation, a layout change). Use individual tap / type_text there — actions is for batchable sequences, not reactive ones.

Parameter	Type	Required	Description
`actions`	array	Yes	Ordered list of action objects. See per-type shapes below.

Each action object has a type plus the fields for that type:

`type`	Required fields	Optional fields
`tap`	`x`, `y`	`delay_after_ms`
`double_tap`	`x`, `y`	`delay_after_ms`
`long_press`	`x`, `y`	`duration` (seconds, 0.1–10.0, default 1.0), `delay_after_ms`
`type_text`	`text` (≤1024 chars)	`delay_after_ms`

delay_after_ms (default 30, max 5000) is the pause inserted between this action and the next. Skipped after the last action — the final screenshot capture itself gives the UI time to settle.

Limits: maximum 32 actions per batch, maximum ~25 seconds of total estimated wall-time. Oversized batches are rejected at parse time with a clear error.

Visibility lock: every coordinate-based action must fall inside the screen bounds the device reports at batch entry. The principle from the agent guide applies here as a hard constraint: if you didn't see it on screen when you composed the batch, don't try to act on it here. Off-bounds batches are rejected before any action runs.

Stop on first failure: the batch is sequential. If action N fails, actions N+1…end are not attempted; the response reports how many completed, which one failed, the device error, and a final screenshot of the resulting state.

Excluded action types: swipe, drag, scroll, pinch, rotate, draw_path. Those either already accept a list of points (draw_path) or change the layout in ways that would violate the visibility rule for later actions in the same batch. Call them individually.

Example — Calculator 1 + 2 = in one call:

{
  "actions": [
    { "type": "tap", "x": 80,  "y": 600 },
    { "type": "tap", "x": 240, "y": 600 },
    { "type": "tap", "x": 160, "y": 600 },
    { "type": "tap", "x": 320, "y": 700 }
  ]
}

Returns one screenshot of the final result instead of four. For a 10-digit phone number, that's 10× fewer agent-loop iterations and 10× fewer screenshots flowing back to the model — same on-device wall-time per action, but a noticeably smaller token bill.

Console

`get_console_output`

Get app console output including print(), NSLog(), os_log(), errors, and warnings. Console output is also auto-included in every gesture response, so you rarely need to call this directly.

Parameter	Type	Required	Description
`since`	string	No	ISO 8601 timestamp — only return entries after this time
`limit`	integer	No	Maximum entries to return. Default: 100
`contains`	string	No	Substring filter (case-insensitive)
`source`	string	No	Filter by source: `stdout`, `stderr`, or `os_log`
`level`	string	No	Filter by level: `debug`, `info`, `notice`, `error`, `fault`

Activity

`set_activity`

Show or hide the activity halo on the device. The halo auto-activates on tool use and auto-deactivates after ~10 seconds of inactivity, so you typically only need to explicitly dismiss it.

Parameter	Type	Required	Description
`active`	boolean	Yes	`true` to show, `false` to hide

During an active session (start_session → end_session), the halo stays on and cannot be dismissed.

Key behaviors

Auto-screenshot: Every gesture tool returns a screenshot of the resulting UI state. No need to call screenshot after each action.
Fresh-or-fail screenshots: Haptix waits for a fresh Mac-side frame from the selected device. If the capture source drifts across attached phones, it returns a clear error instead of another device's pixels.
Hit feedback: Tap tools include which element was hit: [Hit] "Submit" [button] identifier: "submitButton" -- base layer.
Auto-console: Console output captured during the gesture is included in every response.
Coordinates: All coordinates are in screen points (not pixels), origin (0, 0) at top-left. Read them from annotated screenshots — don't estimate from image pixels.
Settle delay: After each gesture, the server waits 500ms for the UI to settle before capturing the follow-up screenshot.
Auto-discovery: start_session discovers USB devices automatically. No need for list_devices → select_device unless you have multiple devices.