MCP Tools Reference
Haptix exposes 22 tools through the MCP protocol. Every tool is called by your AI agent — you don't call them directly. Everything available via MCP is also available through the haptix CLI.
Tool index
| Tool | Category | Description |
|---|---|---|
start_session |
Session | Begin an agent session (auto-discovers devices) |
end_session |
Session | End the active session |
device_status |
Device | Check connection state |
device_info |
Device | Get device model, OS, screen size |
list_devices |
Device | List all visible devices |
select_device |
Device | Bind session to a specific device |
screenshot |
Screenshot | Capture the screen |
screenshot_stream_start |
Screenshot | Start continuous streaming (USB only) |
screenshot_stream_stop |
Screenshot | Stop streaming |
accessibility_tree |
Accessibility | Get UI element hierarchy |
tap |
Gesture | Single tap |
double_tap |
Gesture | Double tap |
long_press |
Gesture | Long press (context menus) |
swipe |
Gesture | Swipe between two points |
drag |
Gesture | Drag between two points |
scroll |
Gesture | Scroll page content |
pinch |
Gesture | Pinch gesture |
rotate |
Gesture | Rotation gesture |
type_text |
Text Input | Type into focused field |
clear_text |
Text Input | Clear focused field |
get_console_output |
Console | Read app console output |
set_activity |
Activity | Show/hide activity halo |
Session
start_session
Begin an agent session. This is the entry point — call it before any other tool.
start_session auto-discovers connected USB devices and auto-selects if only one is found. It returns the device name, model, screen size, and connection state. You don't need to call list_devices → select_device → device_info separately — start_session handles discovery for you.
| Parameter | Type | Required | Description |
|---|---|---|---|
name |
string | No | Session name or purpose description |
Returns a session code, device info (if a device is connected), and confirmation.
end_session
End the active session. Removes the activity halo, stops recording, and returns a summary with duration and tool call count. Always call when done.
No parameters.
Device
device_status
Check connection state. Returns connection and session status.
No parameters.
device_info
Get device details: model, screen size, OS version, orientation, battery level, app bundle ID, debugger state, and UI automation status.
No parameters.
Returns JSON with: model, osVersion, screenWidth, screenHeight, scaleFactor, orientation, batteryLevel, isCharging, bundleIdentifier, appName, debuggerAttached, uiAutomationEnabled.
list_devices
List all connected USB devices. Returns device name, app name, bundle ID, model, and connection status for each.
No parameters.
Most agents don't need this —
start_sessionauto-discovers and auto-selects devices. Uselist_devicesonly when you have multiple devices and need to choose.
select_device
Bind this session to a specific device. All subsequent commands target this device. If not called, commands target the first available device.
| Parameter | Type | Required | Description |
|---|---|---|---|
device |
string | Yes | Device identifier — matches service name, bundle ID, or app name (case-insensitive, partial match) |
Screenshot & Observation
screenshot
Capture the screen. The primary tool for visual discovery.
| Parameter | Type | Required | Description |
|---|---|---|---|
mode |
string | No | full (with status bar), app (default), or region |
format |
string | No | png (default) or jpeg |
annotated |
boolean | No | Overlay bounding boxes with labels, identifiers, and coordinates |
filter |
string | No | Filter annotations: interactive, adjustable, button, text, input, image |
highlight |
string | No | Spotlight one element by accessibility identifier (implies annotated) |
Annotated screenshots show green boxes for interactive elements and blue boxes for static elements.
screenshot_stream_start
Start continuous screenshot streaming at a specified frame rate. Requires a USB connection.
| Parameter | Type | Required | Description |
|---|---|---|---|
fps |
integer | No | Frames per second, 1–10. Default: 2 |
screenshot_stream_stop
Stop an active screenshot stream.
No parameters.
accessibility_tree
Get UI elements with labels, identifiers, values, traits, and coordinates.
| Parameter | Type | Required | Description |
|---|---|---|---|
mode |
string | No | compact (flat list, ~2K tokens, default) or full (nested hierarchy, ~20K tokens) |
Compact mode returns a flat list of meaningful elements. Full mode returns the complete nested view hierarchy. Prefer compact — it's far more token-efficient.
Gestures
All gesture tools return a multi-content response: confirmation text, a follow-up screenshot of the resulting UI state, and any console output captured during the action. No need to call screenshot after every gesture.
tap
Single tap. Prefer identifier or label over coordinates — they're more reliable and survive layout changes.
| Parameter | Type | Required | Description |
|---|---|---|---|
identifier |
string | No | Accessibility identifier (preferred) |
label |
string | No | Accessibility label (fallback) |
x |
number | No | X coordinate in points (last resort) |
y |
number | No | Y coordinate in points (last resort) |
Includes hit feedback showing which element was tapped: [Hit] "Submit" [button] identifier: "submitButton".
double_tap
Double tap. Same parameters as tap.
long_press
Long press. Triggers context menus. Specify duration for custom hold time.
| Parameter | Type | Required | Description |
|---|---|---|---|
identifier |
string | No | Accessibility identifier |
label |
string | No | Accessibility label |
x |
number | No | X coordinate |
y |
number | No | Y coordinate |
duration |
number | No | Hold duration in seconds. Default: 1.0 |
swipe
Swipe between two points. Use on background areas to scroll pages. Use on picker wheels to change values. Use on sliders to move the thumb.
| Parameter | Type | Required | Description |
|---|---|---|---|
startX |
number | Yes | Start X coordinate |
startY |
number | Yes | Start Y coordinate |
endX |
number | Yes | End X coordinate |
endY |
number | Yes | End Y coordinate |
duration |
number | No | Duration in seconds. Default: 0.3 |
drag
Drag between two points (slower than swipe, with press-and-hold). Use for reordering lists, moving items, or dragging handles. For sliders, prefer swipe.
| Parameter | Type | Required | Description |
|---|---|---|---|
startX |
number | Yes | Start X coordinate |
startY |
number | Yes | Start Y coordinate |
endX |
number | Yes | End X coordinate |
endY |
number | Yes | End Y coordinate |
holdDuration |
number | No | Dwell time at start before moving. Default: 0.15s |
duration |
number | No | Movement duration. Default: 1.0s |
scroll
Scroll page content in a direction. Simpler than swipe for basic navigation — no coordinates needed.
| Parameter | Type | Required | Description |
|---|---|---|---|
direction |
string | Yes | up, down, left, or right |
amount |
string | No | small (25%), medium (50%), large (75%), full_page (100%). Default: medium |
identifier |
string | No | Target a specific scrollable element by identifier |
pinch
Pinch gesture at a center point. scale less than 1 pinches in (zoom out), greater than 1 pinches out (zoom in).
| Parameter | Type | Required | Description |
|---|---|---|---|
centerX |
number | Yes | Center X coordinate |
centerY |
number | Yes | Center Y coordinate |
scale |
number | Yes | Scale factor |
duration |
number | No | Duration. Default: 0.5s |
Known issue: Pinch is currently broken. Gesture recognizers reject synthetic multi-touch events. See Compatibility.
rotate
Two-finger rotation at a center point.
| Parameter | Type | Required | Description |
|---|---|---|---|
centerX |
number | Yes | Center X coordinate |
centerY |
number | Yes | Center Y coordinate |
angle |
number | Yes | Rotation angle in degrees. Positive is clockwise |
duration |
number | No | Duration. Default: 0.5s |
Known issue: Rotate is currently broken. Same issue as pinch. See Compatibility.
Text Input
type_text
Type text into the focused input field. Tap the field first to focus it.
| Parameter | Type | Required | Description |
|---|---|---|---|
text |
string | Yes | The text to type |
clear_text
Clear all text in the focused input field.
No parameters.
Console
get_console_output
Get app console output including print(), NSLog(), os_log(), errors, and warnings. Console output is also auto-included in every gesture response, so you rarely need to call this directly.
| Parameter | Type | Required | Description |
|---|---|---|---|
since |
string | No | ISO 8601 timestamp — only return entries after this time |
limit |
integer | No | Maximum entries to return. Default: 100 |
contains |
string | No | Substring filter (case-insensitive) |
source |
string | No | Filter by source: stdout, stderr, or os_log |
level |
string | No | Filter by level: debug, info, notice, error, fault |
Activity
set_activity
Show or hide the activity halo on the device. The halo auto-activates on tool use and auto-deactivates after ~10 seconds of inactivity, so you typically only need to explicitly dismiss it.
| Parameter | Type | Required | Description |
|---|---|---|---|
active |
boolean | Yes | true to show, false to hide |
During an active session (start_session → end_session), the halo stays on and cannot be dismissed.
Key behaviors
- Auto-screenshot: Every gesture tool returns a screenshot of the resulting UI state. No need to call
screenshotafter each action. - Hit feedback: Tap tools include which element was hit:
[Hit] "Submit" [button] identifier: "submitButton" -- base layer. - Auto-console: Console output captured during the gesture is included in every response.
- Coordinates: All coordinates are in screen points (not pixels), origin
(0, 0)at top-left. Read them from annotated screenshots — don't estimate from image pixels. - Settle delay: After each gesture, the server waits 500ms for the UI to settle before capturing the follow-up screenshot.
- Auto-discovery:
start_sessiondiscovers USB devices automatically. No need forlist_devices→select_deviceunless you have multiple devices.