Agent Guide

You're an AI agent that just connected to a real iOS device. This page is for you.

You now have eyes and hands on a running app. You can see every pixel, tap any button, type into any field, scroll through any list, and read the console output — including print() statements you wrote yourself. This is the full feedback loop: write code, run it, see it, interact with it, debug it, fix it.

This guide teaches you how to be effective.


What you have

You have 31 MCP tools. You don't need to memorize them — but you need to understand the shapes.

Shape      What it does                                  Key tools
See        Look at the screen, read the UI structure     screenshot, accessibility_tree
Touch      Tap, swipe, scroll, drag, long press          tap, swipe, scroll, drag, long_press
Type       Enter and clear text in focused fields        type_text, clear_text
Navigate   Move between screens, dismiss things          scroll, tap (back buttons, tabs)
Debug      Read runtime output from the app              get_console_output
Remember   Store and recall knowledge across sessions    store_note, recall_notes, consolidate_notes

Every gesture you perform automatically returns a screenshot of the result. You always see what happened. You never need to call screenshot after a gesture.


Why this matters

Before Haptix, you wrote SwiftUI code and hoped it looked right. You asked the developer to describe what they saw. You couldn't verify layout, test interactions, or see runtime behavior.

Now you can:

  • Build → run → see — Verify your changes actually look correct on a real device
  • Tap through flows — Test navigation, form submissions, edge cases
  • Read the console — See print() output, errors, warnings, crash reasons
  • Write your own debug prints — Add print("DEBUG: \(someValue)") to the code, build, trigger the code path, and read the output back
  • Catch bugs visually — Misaligned views, missing data, broken layouts — you can see them yourself
  • Prove your work — The developer trusts you more because you can show that your changes work

You're not guessing anymore. You have the full picture.


The core loop

Everything you do follows one pattern:

Look → Act → Look

  1. Take a screenshot or read the accessibility tree to understand the current state
  2. Perform a gesture (tap, scroll, type)
  3. The response includes a new screenshot — check it

If something unexpected happens, read the console. Errors often explain why.
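As a tool-call sketch, one iteration of the loop might look like this (the identifier is illustrative, not from a real app):

```
screenshot(annotated: true, filter: "interactive")   ← Look: see what's tappable
tap(identifier: "settingsButton")                    ← Act: the response includes a fresh screenshot
get_console_output(contains: "ERROR")                ← Look deeper if the result surprises you
```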


How to see the screen

Screenshots

screenshot is your primary tool. Use the modes:

  • Default — captures the app content
  • annotated: true — overlays bounding boxes with labels, identifiers, and coordinates on every UI element. Green boxes are interactive, blue are static
  • filter: "interactive" — only shows tappable elements. Massive noise reduction on complex screens
  • highlight: "elementId" — spotlights a single element with a yellow box and crosshair

When reviewing a screen for the first time, use annotated: true with filter: "interactive" to see what you can tap. Use highlight when you know the identifier but need to see where it sits.

Accessibility tree

accessibility_tree(mode: "compact") returns a flat list of meaningful UI elements with labels, identifiers, values, traits, and frame coordinates. It costs ~2,000 tokens.

accessibility_tree(mode: "full") returns the complete nested view hierarchy. It costs ~20,000 tokens. Only use this when you need to understand parent-child relationships.

Prefer compact. It's 10x cheaper and usually gives you everything you need.


How to tap things

Always prefer identifier over label over coordinates.

tap(identifier: "submitButton")    ← best: survives layout changes
tap(label: "Submit")               ← good: works if the label is unique
tap(x: 210, y: 500)               ← last resort: breaks when layout changes

Identifiers resolve to the exact semantic element via accessibility APIs. Coordinates often land on generic container views like _UIMoreListTableView or UpdateCoalescingCollectionView — the tap technically "hits" but nothing happens.

Tab bars are especially problematic with coordinates. Always use tap(label: "Inbox"), tap(label: "Settings"), never coordinates. Tab bar buttons are not reliably hittable by position.

Read the hit feedback

Every tap response includes hit feedback:

[Hit] "Submit" [button] identifier: "submitButton" -- base layer

Read it. If it says something unexpected, you tapped the wrong thing. Re-assess before tapping again.

Parameter types

All coordinate parameters (x, y, startX, startY, endX, endY) must be numbers, not strings.

tap(x: 210, y: 500)      ← correct
tap(x: "210", y: "500")  ← fails

How to scroll

The scroll tool

For page navigation, use scroll. No coordinates needed.

scroll(direction: "down", amount: "medium")

Amounts: small (25%), medium (50%), large (75%), full_page (100%).

Scroll in medium steps. Check what's visible after each scroll. Scroll again if needed. Don't try to jump to the bottom in one go — you'll overshoot and miss things.

For nested scroll views (a list inside a tab, a form inside a sheet), use the identifier parameter to target the right one:

scroll(direction: "down", amount: "medium", identifier: "messageList")

When to use swipe instead

swipe takes start and end coordinates. Use it for:

  • Back navigation — swipe from the left edge: swipe(startX: 0, startY: 400, endX: 300, endY: 400)
  • Swipe-to-delete — horizontal swipe on a list row
  • Picker wheels — small vertical swipes directly on the wheel column
  • Custom gesture controls — anything that needs precise start/end positions

When to use drag

drag is like swipe but with a dwell phase — a brief pause at the start point before moving. This tells iOS "I want to move this thing, not scroll past it." Use it for:

  • Dismissing keyboards — drag down from the content area past the bottom of the screen (more on this below)
  • Slider manipulation
  • Reordering items (when it works — currently has limitations)

How to dismiss the keyboard

The keyboard doesn't go away on its own after type_text. Here are your options, in order of reliability:

1. Drag it away (most reliable)

Use drag to touch the scrollable content area above the keyboard and drag downward past the screen bottom. This triggers iOS scroll-to-dismiss behavior.

drag(startX: 200, startY: 400, endX: 200, endY: 900)

This works on views that use Form, List, or a ScrollView with .scrollDismissesKeyboard(.interactively) — which covers most well-built apps.
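If the drag has no effect, the app side may not opt into interactive dismissal. A minimal SwiftUI sketch (iOS 16+) of a form that supports it — the view and field names are invented for illustration:

```swift
import SwiftUI

struct SignUpForm: View {
    @State private var email = ""   // illustrative field

    var body: some View {
        Form {
            TextField("Email", text: $email)
        }
        // Lets a downward drag on the form content pull the keyboard offscreen
        .scrollDismissesKeyboard(.interactively)
    }
}
```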

2. Tap "Done" or "Return"

If the keyboard toolbar has a "Done" button, or the return key type is set to submit, tap it.

3. Tap the next field

If you're moving to another text field, just tap it. The keyboard stays but focus moves. No need to dismiss between fields.

What doesn't work

scroll does not dismiss the keyboard. It scrolls the content behind the keyboard but doesn't trigger the dismiss gesture. Don't waste time trying it.


How to navigate

Going back

Tap the back button — usually top-left. Find it via the accessibility tree if you're not sure what it's labeled.

Or swipe from the left edge of the screen:

swipe(startX: 0, startY: 400, endX: 300, endY: 400)

Tab bars

Always tap by label:

tap(label: "Inbox")
tap(label: "Settings")
tap(label: "Profile")

Never use coordinates for tab bar buttons.

Dismissing sheets and modals

Presented layers (sheets, alerts, popovers, confirmation dialogs) sit on top of everything. Once one appears, it intercepts all taps — even the tab bar underneath.

Always dismiss the current layer before trying to interact with content behind it.

  • Look for close buttons: "X", "Done", "Cancel", "Dismiss"
  • Swipe down on the sheet handle (if the sheet is dismissible)
  • For alerts: tap the action button ("OK", "Allow", "Delete")

If an alert or sheet gets stuck and you can't dismiss it, the entire app becomes untestable. Check the console for clues.


How to type

  1. Tap the field — always first. type_text does nothing without a focused field
  2. type_text — appends to existing text
  3. clear_text then type_text — replaces all text

For multi-field forms, tap and type each field in sequence. The keyboard persists between fields — no need to dismiss and re-summon it.

Secure text fields (passwords) work the same way. type_text works even though the display shows dots.
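For example, replacing a value you typed earlier (the identifier is illustrative, and exact parameters follow the tool schemas):

```
tap(identifier: "emailField")      ← focus the field first
clear_text()                       ← wipe the existing value
type_text("jane@example.com")      ← type the replacement
```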


How to debug

You have full access to the app's console output: print(), NSLog(), os_log(), errors, warnings, and crash messages.

Console output is already in your responses

Every gesture response auto-includes any console output captured during the action. You're already seeing it — just read the response.

Writing your own debug prints

This is your superpower. You can instrument the app yourself:

  1. Add print("DEBUG: count = \(items.count)") to suspicious code
  2. Build and run
  3. Tap through the app to trigger that code path
  4. Read the output in the gesture response or via get_console_output
  5. Find the bug, fix it, remove the print

You can also dump(someObject) for full structure inspection — every property, nested value, and type.
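As a sketch, instrumenting a suspicious code path might look like this in plain Swift (the CartItem type is invented for illustration):

```swift
// Hypothetical model used only for this example
struct CartItem {
    let name: String
    let quantity: Int
}

let items = [CartItem(name: "Apple", quantity: 2),
             CartItem(name: "Pear", quantity: 0)]

// One-line instrumentation: shows up in the console output you can read
print("DEBUG: count = \(items.count)")

// Full structural dump: every property and nested value, with types
dump(items)
```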

Filtering console output

When you need to cut through noise:

get_console_output(contains: "ERROR")          — keyword search
get_console_output(level: "error")             — only errors and faults
get_console_output(since: "2026-02-22T10:00:00Z")  — only recent output
get_console_output(source: "stderr")           — framework warnings

If the app crashes

The console often captures the crash reason before the connection drops. After a crash, check the console output from your last interaction — the fatal error or assertion failure message is usually there.


The build-install-verify cycle

When you're iterating on code — writing a fix, building, checking on device — rebuilding the app kills the SDK connection. The Haptix session dies because the app process is replaced.

Follow this pattern:

  1. end_session before building (clean teardown)
  2. Build and install the updated app
  3. Wait 2–3 seconds for the SDK to boot and connect over USB
  4. start_session (new session — device auto-connects)
  5. Take a screenshot to verify the new state

Do not ask the developer to reconnect MCP when this happens. The MCP connection (your agent to the Haptix Mac app) is fine. It's the Haptix session (Mac app to the device) that needs restarting. These are different failures.
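The cycle as a tool-call sketch (the build step happens in your normal build tooling, outside Haptix):

```
end_session()                 ← clean teardown before the rebuild
(build and install the app)   ← xcodebuild, Xcode, your usual workflow
(wait 2–3 seconds)            ← let the SDK boot and reconnect over USB
start_session()               ← new session; the device auto-connects
screenshot()                  ← verify you're looking at the new build
```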

Session errors and what they mean

Error                      What happened                                   What to do
"Session already active"   Previous session wasn't ended                   Call end_session, then start_session
"No matching session"      MCP transport died                              The developer needs to reconnect MCP in their agent
"Device not found"         App was reinstalled; device identity changed    Call select_device to rebind

How to remember

Notes persist across sessions, tied to specific apps. They're your operational memory — things you figured out that you'll need again next time.

Recall first

Call recall_notes at the start of every session. If you've worked with this app before, your past notes tell you what you already know: which identifiers work, where things are, what's broken, what workarounds you found.

Write terse notes

Notes are not prose. They're short, factual, scannable. Write them like you'd write a comment in code — the minimum needed to jog your memory.

Good notes:

store_note(content: "Settings tab label: 'Preferences'", scope: "app")
store_note(content: "Login button identifier: 'auth_submit'", scope: "app")
store_note(content: "Keyboard dismiss: drag y:400→y:900 on Form", scope: "universal")
store_note(content: "Profile image tap opens sheet, not navigation push", scope: "app")

Bad notes:

store_note(content: "I discovered that the Settings tab is actually labeled Preferences, which was surprising because most apps call it Settings. I found this by using the accessibility tree.", scope: "app")

Scopes

  • app scope — tied to the current app's bundle ID. Facts about this specific app.
  • universal scope — applies to all apps. General iOS patterns and workarounds.

Consolidate at the end

Call consolidate_notes at the end of a session. This replaces scattered observations with one clean summary. Keep it tight — future you will thank present you.


What doesn't work yet

Don't waste time retrying these — they're known platform limitations, not your mistakes.

What                     Status    Detail
Context menu items       Broken    Long press opens the menu, but tapping items inside does nothing
Menu-style pickers       Broken    Default SwiftUI .menu picker style — can't select options
Share sheets             Broken    System-presented, rejects synthetic touches
Drag and drop            Broken    Missing dwell phase for long-press-then-drag patterns
Pinch / rotate           Broken    Gesture recognizers reject synthetic multi-touch events
Alerts / action sheets   Partial   Sometimes respond to tap(label:), inconsistent

See Compatibility for the full matrix and root cause details.


Efficient token usage

  • Don't screenshot after gestures — every gesture already returns one
  • Use compact accessibility tree (~2K tokens) not full (~20K) unless you need hierarchy
  • Filter annotations — filter: "interactive" cuts out static labels and decorative elements
  • Use highlight to spotlight one element instead of annotating everything
  • Use recall_notes — don't re-discover what you already know from previous sessions
  • Use scroll over repeated swipe — one scroll(direction: "down") replaces a calculated swipe(startX:startY:endX:endY:)

Common recipes

Verify a UI change

Make the code change → build → end_session / start_session → screenshot → confirm it looks right.

Fill a form

tap(label: "Name") → type_text("Jane Doe")
tap(label: "Email") → type_text("jane@example.com")
tap(label: "Password") → type_text("secretpass")
drag(startX: 200, startY: 400, endX: 200, endY: 900)   ← dismiss keyboard
tap(label: "Sign Up")

Find an item in a long list

screenshot() → not visible
scroll(direction: "down", amount: "medium") → screenshot shows items 10-20, not here
scroll(direction: "down", amount: "medium") → screenshot shows items 20-30, found it
tap(label: "Target Item")

Test a multi-screen flow

Navigate step by step. Screenshot at each stage. Read the console for errors between steps. If something breaks, the last screenshot + console output tells you where.

Debug a visual bug

screenshot(annotated: true) → identify the misaligned element → read its accessibility properties → check the code → fix → rebuild → verify.