Agent Guide
You're an AI agent that just connected to a real iOS device. This page is for you.
You now have eyes and hands on a running app. You can see every pixel, tap any button, type into any field, scroll through any list, and read the console output — including print() statements you wrote yourself. This is the full feedback loop: write code, run it, see it, interact with it, debug it, fix it.
This guide teaches you how to be effective.
What you have
You have 31 MCP tools. You don't need to memorize them — but you need to understand the shapes.
| Shape | What it does | Key tools |
|---|---|---|
| See | Look at the screen, read the UI structure | screenshot, accessibility_tree |
| Touch | Tap, swipe, scroll, drag, long press | tap, swipe, scroll, drag, long_press |
| Type | Enter and clear text in focused fields | type_text, clear_text |
| Navigate | Move between screens, dismiss things | scroll, tap (back buttons, tabs) |
| Debug | Read runtime output from the app | get_console_output |
| Remember | Store and recall knowledge across sessions | store_note, recall_notes, consolidate_notes |
Every gesture you perform automatically returns a screenshot of the result. You always see what happened. You never need to call screenshot after a gesture.
Why this matters
Before Haptix, you wrote SwiftUI code and hoped it looked right. You asked the developer to describe what they saw. You couldn't verify layout, test interactions, or see runtime behavior.
Now you can:
- Build → run → see — Verify your changes actually look correct on a real device
- Tap through flows — Test navigation, form submissions, edge cases
- Read the console — See print() output, errors, warnings, crash reasons
- Write your own debug prints — Add print("DEBUG: \(someValue)") to the code, build, trigger the code path, and read the output back
- Catch bugs visually — Misaligned views, missing data, broken layouts — you can see them yourself
- Prove your work — The developer trusts you more because you can show that your changes work
You're not guessing anymore. You have the full picture.
The core loop
Everything you do follows one pattern:
Look → Act → Look
- Take a screenshot or read the accessibility tree to understand the current state
- Perform a gesture (tap, scroll, type)
- The response includes a new screenshot — check it
If something unexpected happens, read the console. Errors often explain why.
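The loop above can be sketched as code. The tool functions below are hypothetical stubs standing in for real MCP calls; a real screenshot returns an image, not a state label.

```python
# Minimal sketch of the Look → Act → Look loop.
# screenshot/tap are hypothetical stand-ins for the real MCP tools.

def screenshot():
    # Stand-in: a real call returns an image; here, a state label.
    return "login screen"

def tap(label):
    # Stand-in: every gesture response includes a fresh screenshot.
    return {"hit": label, "screenshot": "home screen"}

def core_loop():
    before = screenshot()            # 1. Look: understand current state
    result = tap(label="Sign In")    # 2. Act: perform a gesture
    after = result["screenshot"]     # 3. Look: check the returned screenshot
    return before, result["hit"], after

print(core_loop())
```

The key point the sketch encodes: step 3 reads the screenshot already in the gesture response, rather than making a second screenshot call.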
How to see the screen
Screenshots
screenshot is your primary tool. Use the modes:
- Default — captures the app content
- annotated: true — overlays bounding boxes with labels, identifiers, and coordinates on every UI element. Green boxes are interactive, blue are static
- filter: "interactive" — only shows tappable elements. Massive noise reduction on complex screens
- highlight: "elementId" — spotlights a single element with a yellow box and crosshair
When reviewing a screen for the first time, use annotated with filter: "interactive" to see what you can tap. Use highlight when you know the identifier but need to see where it sits.
Accessibility tree
accessibility_tree(mode: "compact") returns a flat list of meaningful UI elements with labels, identifiers, values, traits, and frame coordinates. It costs ~2,000 tokens.
accessibility_tree(mode: "full") returns the complete nested view hierarchy. It costs ~20,000 tokens. Only use this when you need to understand parent-child relationships.
Prefer compact. It's 10x cheaper and usually gives you everything you need.
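Because the compact tree is a flat list, scanning it is a simple linear pass. A sketch, assuming an illustrative element shape (the real Haptix wire format may differ):

```python
# Sketch: scanning a compact (flat) accessibility tree for a target
# identifier. The element dicts below are illustrative, not the exact
# format the tool returns.

compact_tree = [
    {"label": "Inbox", "identifier": "tab_inbox", "traits": ["button"]},
    {"label": "Submit", "identifier": "submitButton", "traits": ["button"]},
    {"label": "Welcome", "identifier": None, "traits": ["staticText"]},
]

def find_element(tree, identifier):
    # Flat list: a linear scan suffices, no recursion into children
    # as the full nested hierarchy would require.
    for element in tree:
        if element["identifier"] == identifier:
            return element
    return None

print(find_element(compact_tree, "submitButton")["label"])
```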
How to tap things
Always prefer identifier over label over coordinates.
tap(identifier: "submitButton") ← best: survives layout changes
tap(label: "Submit") ← good: works if the label is unique
tap(x: 210, y: 500) ← last resort: breaks when layout changes
Identifiers resolve to the exact semantic element via accessibility APIs. Coordinates often land on generic container views like _UIMoreListTableView or UpdateCoalescingCollectionView — the tap technically "hits" but nothing happens.
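The preference order can be encoded as a small helper. Everything here is a hypothetical sketch: tap is a stub, and tap_best assumes the element record carries identifier, label, and a center point.

```python
# Sketch of the targeting priority: identifier, then label, then
# coordinates. tap() is a hypothetical stub for the real MCP call.

def tap(identifier=None, label=None, x=None, y=None):
    if identifier:
        return f"tapped by identifier: {identifier}"
    if label:
        return f"tapped by label: {label}"
    return f"tapped at ({x}, {y})"

def tap_best(element):
    # Prefer the most layout-resilient handle the element offers.
    if element.get("identifier"):
        return tap(identifier=element["identifier"])
    if element.get("label"):
        return tap(label=element["label"])
    cx, cy = element["center"]        # last resort: raw coordinates
    return tap(x=cx, y=cy)

print(tap_best({"identifier": "submitButton", "label": "Submit", "center": (210, 500)}))
```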
Tab bars are especially problematic with coordinates. Always use tap(label: "Inbox"), tap(label: "Settings"), never coordinates. Tab bar buttons are not reliably hittable by position.
Read the hit feedback
Every tap response includes hit feedback:
[Hit] "Submit" [button] identifier: "submitButton" -- base layer
Read it. If it says something unexpected, you tapped the wrong thing. Re-assess before tapping again.
Parameter types
All coordinate parameters (x, y, startX, startY, endX, endY) must be numbers, not strings.
tap(x: 210, y: 500) ← correct
tap(x: "210", y: "500") ← fails
How to scroll
The scroll tool
For page navigation, use scroll. No coordinates needed.
scroll(direction: "down", amount: "medium")
Amounts: small (25%), medium (50%), large (75%), full_page (100%).
Scroll in medium steps. Check what's visible after each scroll. Scroll again if needed. Don't try to jump to the bottom in one go — you'll overshoot and miss things.
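The scroll-check-repeat pattern looks like this in sketch form, with a simulated 30-item list standing in for the real screen and a cap so the loop cannot run forever:

```python
# Sketch: scroll in medium steps until a target appears. The list and
# scroll stub simulate a screen showing 10 of 30 items at a time.

items = [f"Item {i}" for i in range(30)]
visible_start = 0                     # index of the first visible item

def visible():
    return items[visible_start:visible_start + 10]

def scroll_down_medium():
    # A medium scroll advances about half a page in this simulation.
    global visible_start
    visible_start = min(visible_start + 5, len(items) - 10)

def scroll_until_visible(target, max_scrolls=8):
    for _ in range(max_scrolls):
        if target in visible():       # check after every step
            return True
        scroll_down_medium()
    return target in visible()

print(scroll_until_visible("Item 22"))
```

Medium steps with a check after each one are what keep you from overshooting; the max_scrolls cap is what keeps a missing item from looping forever.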
For nested scroll views (a list inside a tab, a form inside a sheet), use the identifier parameter to target the right one:
scroll(direction: "down", amount: "medium", identifier: "messageList")
When to use swipe instead
swipe takes start and end coordinates. Use it for:
- Back navigation — swipe from the left edge: swipe(startX: 0, startY: 400, endX: 300, endY: 400)
- Swipe-to-delete — horizontal swipe on a list row
- Picker wheels — small vertical swipes directly on the wheel column
- Custom gesture controls — anything that needs precise start/end positions
When to use drag
drag is like swipe but with a dwell phase — a brief pause at the start point before moving. This tells iOS "I want to move this thing, not scroll past it." Use it for:
- Dismissing keyboards — drag down from the content area past the bottom of the screen (more on this below)
- Slider manipulation
- Reordering items (when it works — currently has limitations)
How to dismiss the keyboard
The keyboard doesn't go away on its own after type_text. Here are your options, in order of reliability:
1. Drag it away (most reliable)
Use drag to touch the scrollable content area above the keyboard and drag downward past the screen bottom. This triggers iOS scroll-to-dismiss behavior.
drag(startX: 200, startY: 400, endX: 200, endY: 900)
This works on views that use Form, List, or a ScrollView with .scrollDismissesKeyboard(.interactively) — which covers most well-built apps.
2. Tap "Done" or "Return"
If the keyboard toolbar has a "Done" button, or the return key type is set to submit, tap it.
3. Tap the next field
If you're moving to another text field, just tap it. The keyboard stays but focus moves. No need to dismiss between fields.
What doesn't work
scroll does not dismiss the keyboard. It scrolls the content behind the keyboard but doesn't trigger the dismiss gesture. Don't waste time trying it.
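The fallback order can be sketched as a helper that tries each option in turn. The drag and tap stubs are hypothetical; the drag_works flag simulates views that lack the scroll-to-dismiss modifier.

```python
# Sketch of keyboard dismissal in reliability order: drag first, then
# a Done button. All tool calls are hypothetical stubs.

def drag(startX, startY, endX, endY, works=True):
    # Simulate scroll-to-dismiss; it fails on views without the
    # .scrollDismissesKeyboard(.interactively) behavior.
    return {"keyboard_visible": not works}

def tap(label):
    return {"keyboard_visible": False}

def dismiss_keyboard(drag_works=True, has_done_button=False):
    # 1. Most reliable: drag the content area down past the screen bottom.
    result = drag(200, 400, 200, 900, works=drag_works)
    if not result["keyboard_visible"]:
        return "dismissed by drag"
    # 2. Fall back to a Done button if the toolbar has one.
    if has_done_button:
        tap(label="Done")
        return "dismissed by Done"
    return "still visible"

print(dismiss_keyboard())
```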
How to navigate
Going back
Tap the back button — usually top-left. Find it via the accessibility tree if you're not sure what it's labeled.
Or swipe from the left edge of the screen:
swipe(startX: 0, startY: 400, endX: 300, endY: 400)
Tab bars
Always tap by label:
tap(label: "Inbox")
tap(label: "Settings")
tap(label: "Profile")
Never use coordinates for tab bar buttons.
Dismissing sheets and modals
Presented layers (sheets, alerts, popovers, confirmation dialogs) sit on top of everything. Once one appears, it intercepts all taps — even the tab bar underneath.
Always dismiss the current layer before trying to interact with content behind it.
- Look for close buttons: "X", "Done", "Cancel", "Dismiss"
- Swipe down on the sheet handle (if the sheet is dismissible)
- For alerts: tap the action button ("OK", "Allow", "Delete")
If an alert or sheet gets stuck and you can't dismiss it, the entire app becomes untestable. Check the console for clues.
How to type
- Tap the field — always first. type_text does nothing without a focused field
- type_text — appends to existing text
- clear_text then type_text — replaces all text
For multi-field forms, tap and type each field in sequence. The keyboard persists between fields — no need to dismiss and re-summon it.
Secure text fields (passwords) work the same way. type_text works even though the display shows dots.
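The tap-then-type sequence for a multi-field form can be sketched as a loop. The stubs below just record the call order; the point is that every type_text is preceded by a tap, with no dismissal in between.

```python
# Sketch: fill a multi-field form by tapping, then typing, each field.
# The keyboard persists between fields, so nothing is dismissed mid-form.
# tap/type_text are hypothetical stubs recording the call sequence.

calls = []

def tap(label):
    calls.append(("tap", label))      # focus the field first

def type_text(text):
    calls.append(("type", text))      # types into the focused field

def fill_form(fields):
    for label, value in fields:
        tap(label)                    # always tap before typing
        type_text(value)

fill_form([("Name", "Jane Doe"), ("Email", "jane@example.com")])
print(calls)
```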
How to debug
You have full access to the app's console output: print(), NSLog(), os_log(), errors, warnings, and crash messages.
Console output is already in your responses
Every gesture response auto-includes any console output captured during the action. You're already seeing it — just read the response.
Writing your own debug prints
This is your superpower. You can instrument the app yourself:
- Add print("DEBUG: count = \(items.count)") to suspicious code
- Build and run
- Tap through the app to trigger that code path
- Read the output in the gesture response or via get_console_output
- Find the bug, fix it, remove the print
You can also dump(someObject) for full structure inspection — every property, nested value, and type.
Filtering console output
When you need to cut through noise:
get_console_output(contains: "ERROR") — keyword search
get_console_output(level: "error") — only errors and faults
get_console_output(since: "2026-02-22T10:00:00Z") — only recent output
get_console_output(source: "stderr") — framework warnings
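To make the filter semantics concrete, here is a local sketch of how those parameters narrow a captured log. The log lines and line shape are illustrative, not the tool's actual output format.

```python
# Sketch: applying get_console_output-style filters locally to a
# captured log. The log entries below are illustrative.

log = [
    {"level": "info",  "source": "stdout", "text": "DEBUG: count = 3"},
    {"level": "error", "source": "stderr", "text": "ERROR: decode failed"},
    {"level": "fault", "source": "stderr", "text": "Fatal error: unwrap nil"},
]

def filter_console(lines, contains=None, level=None, source=None):
    out = lines
    if contains:                      # keyword search
        out = [l for l in out if contains in l["text"]]
    if level:                         # severity filter
        out = [l for l in out if l["level"] == level]
    if source:                        # stdout vs stderr
        out = [l for l in out if l["source"] == source]
    return out

print(len(filter_console(log, contains="ERROR")))  # prints 1
```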
If the app crashes
The console often captures the crash reason before the connection drops. After a crash, check the console output from your last interaction — the fatal error or assertion failure message is usually there.
The build-install-verify cycle
When you're iterating on code — writing a fix, building, checking on device — rebuilding the app kills the SDK connection. The Haptix session dies because the app process is replaced.
Follow this pattern:
- end_session before building (clean teardown)
- Build and install the updated app
- Wait 2–3 seconds for the SDK to boot and connect over USB
- start_session (new session — device auto-connects)
- Take a screenshot to verify the new state
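The cycle is strictly ordered, which a sketch makes explicit. The session tools are hypothetical stubs recording call order; build_and_install and the shortened sleep are placeholders for the real build step and the 2–3 second SDK boot wait.

```python
# Sketch of the rebuild cycle as an ordered sequence of calls.
# All functions are stubs; the order is what matters.

import time

steps = []

def end_session():
    steps.append("end_session")       # clean teardown before the build

def build_and_install():
    steps.append("build")             # placeholder for the real build step

def start_session():
    steps.append("start_session")     # new session; device auto-connects

def screenshot():
    steps.append("screenshot")        # verify the new state

def rebuild_cycle():
    end_session()
    build_and_install()
    time.sleep(0.01)                  # stand-in for the 2-3 s SDK boot wait
    start_session()
    screenshot()
    return steps

print(rebuild_cycle())
```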
Do not ask the developer to reconnect MCP when this happens. The MCP connection (your agent to the Haptix Mac app) is fine. It's the Haptix session (Mac app to the device) that needs restarting. These are different failures.
Session errors and what they mean
| Error | What happened | What to do |
|---|---|---|
| "Session already active" | Previous session wasn't ended | Call end_session, then start_session |
| "No matching session" | MCP transport died | The developer needs to reconnect MCP in their agent |
| "Device not found" | App was reinstalled, device identity changed | Call select_device to rebind |
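The table above can be turned into a simple dispatch. Matching on message substrings is an assumption for the sketch; read the actual error text you receive.

```python
# Sketch: mapping session error messages to recovery actions,
# mirroring the table above. Substring matching is an assumption.

def recovery_action(error_message):
    if "already active" in error_message:
        return ["end_session", "start_session"]
    if "No matching session" in error_message:
        return ["ask developer to reconnect MCP"]
    if "Device not found" in error_message:
        return ["select_device"]
    return ["read console, re-assess"]      # unknown error: investigate

print(recovery_action("Session already active"))
```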
How to remember
Notes persist across sessions, tied to specific apps. They're your operational memory — things you figured out that you'll need again next time.
Recall first
Call recall_notes at the start of every session. If you've worked with this app before, your past notes tell you what you already know: which identifiers work, where things are, what's broken, what workarounds you found.
Write terse notes
Notes are not prose. They're short, factual, scannable. Write them like you'd write a comment in code — the minimum needed to jog your memory.
Good notes:
store_note(content: "Settings tab label: 'Preferences'", scope: "app")
store_note(content: "Login button identifier: 'auth_submit'", scope: "app")
store_note(content: "Keyboard dismiss: drag y:400→y:900 on Form", scope: "universal")
store_note(content: "Profile image tap opens sheet, not navigation push", scope: "app")
Bad notes:
store_note(content: "I discovered that the Settings tab is actually labeled Preferences, which was surprising because most apps call it Settings. I found this by using the accessibility tree.", scope: "app")
- app scope — tied to the current app's bundle ID. Facts about this specific app.
- universal scope — applies to all apps. General iOS patterns and workarounds.
Consolidate at the end
Call consolidate_notes at the end of a session. This replaces scattered observations with one clean summary. Keep it tight — future you will thank present you.
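The whole memory pattern fits in one sketch: recall first, store terse scoped facts during the session, consolidate at the end. The in-memory dict is a stand-in for the real note store, and the join-based consolidation is an illustrative simplification.

```python
# Sketch of the session memory pattern. The dict is a hypothetical
# stand-in for the real persistent note store.

notes = {"app": [], "universal": []}

def recall_notes():
    # Call at session start: surfaces everything already known.
    return notes["app"] + notes["universal"]

def store_note(content, scope):
    notes[scope].append(content)

def consolidate_notes():
    # Replace scattered observations with one clean summary
    # (illustrative: a real consolidation rewrites, not just joins).
    merged = "; ".join(notes["app"])
    notes["app"] = [merged]
    return merged

store_note("Settings tab label: 'Preferences'", scope="app")
store_note("Login button identifier: 'auth_submit'", scope="app")
print(consolidate_notes())
```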
What doesn't work yet
Don't waste time retrying these — they're known platform limitations, not your mistakes.
| What | Status | Detail |
|---|---|---|
| Context menu items | Broken | Long press opens the menu, but tapping items inside does nothing |
| Menu-style pickers | Broken | Default SwiftUI .menu picker style — can't select options |
| Share sheets | Broken | System-presented, rejects synthetic touches |
| Drag and drop | Broken | Missing dwell phase for long-press-then-drag patterns |
| Pinch / rotate | Broken | Gesture recognizers reject synthetic multi-touch events |
| Alerts / action sheets | Partial | Sometimes respond to tap(label:); behavior is inconsistent |
See Compatibility for the full matrix and root cause details.
Efficient token usage
- Don't screenshot after gestures — every gesture already returns one
- Use compact accessibility tree (~2K tokens) not full (~20K) unless you need hierarchy
- Filter annotations — filter: "interactive" cuts out static labels and decorative elements
- Use highlight to spotlight one element instead of annotating everything
- Use recall_notes — don't re-discover what you already know from previous sessions
- Use scroll over repeated swipe — one scroll(direction: "down") replaces a calculated swipe(startX:startY:endX:endY:)
Common recipes
Verify a UI change
Make the code change → build → end_session / start_session → screenshot → confirm it looks right.
Fill a form
tap(label: "Name") → type_text("Jane Doe")
tap(label: "Email") → type_text("jane@example.com")
tap(label: "Password") → type_text("secretpass")
drag(startX: 200, startY: 400, endX: 200, endY: 900) ← dismiss keyboard
tap(label: "Sign Up")
Find an item in a long list
screenshot() → not visible
scroll(direction: "down", amount: "medium") → screenshot shows items 10-20, not here
scroll(direction: "down", amount: "medium") → screenshot shows items 20-30, found it
tap(label: "Target Item")
Test a multi-screen flow
Navigate step by step. Screenshot at each stage. Read the console for errors between steps. If something breaks, the last screenshot + console output tells you where.
Debug a visual bug
screenshot(annotated: true) → identify the misaligned element → read its accessibility properties → check the code → fix → rebuild → verify.