API reference (Android)
Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
Action Space
AndroidDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:
- `Tap`: Tap an element.
- `DoubleClick`: Double-tap an element.
- `Input`: Enter text with `replace` / `append` / `clear` modes and optional `autoDismissKeyboard`.
- `Scroll`: Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
- `DragAndDrop`: Drag from one element to another.
- `KeyboardPress`: Press a specified key.
- `LongPress`: Long-press a target element with optional duration.
- `PullGesture`: Pull up or down (e.g., to refresh) with optional distance and duration.
- `ClearInput`: Clear the contents of an input field.
- `Launch`: Open a web URL or `package/.Activity` string.
- `RunAdbShell`: Execute raw `adb shell` commands.
- `AndroidBackButton`: Trigger the system back action.
- `AndroidHomeButton`: Return to the home screen.
- `AndroidRecentAppsButton`: Open the multitasking/recent apps view.
AndroidDevice
Create a connection to an adb-managed device that an AndroidAgent can drive.
Import
Constructor
Device options
- `deviceId: string`: Value returned by `adb devices` or `getConnectedDevices()`.
- `autoDismissKeyboard?: boolean`: Automatically hide the keyboard after input. Default `true`.
- `keyboardDismissStrategy?: 'esc-first' | 'back-first'`: Order for dismissing keyboards. Default `'esc-first'`.
- `androidAdbPath?: string`: Custom path to the adb executable.
- `remoteAdbHost?: string` / `remoteAdbPort?: number`: Point to a remote adb server.
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'`: Choose when to invoke yadb for text input. Default `'yadb-for-non-ascii'`.
  - `'yadb-for-non-ascii'` (default): Uses yadb for non-ASCII text (including Latin characters such as ö, é, ñ, as well as Chinese and Japanese) and for text containing format specifiers (like %s, %d). All other pure ASCII text uses the faster native `adb input text`.
  - `'always-yadb'`: Uses yadb for all text input, providing maximum compatibility at the cost of slightly slower pure ASCII input.
- `displayId?: number`: Target a specific virtual display if the device mirrors multiple displays.
- `screenshotResizeScale?: number`: Downscale screenshots before sending them to the model. Defaults to `1 / devicePixelRatio`.
- `alwaysRefreshScreenInfo?: boolean`: Re-query rotation and screen size every step. Default `false`.
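The `imeStrategy` rule described above can be illustrated with a small predicate. This is only a sketch of the documented behavior, not Midscene's implementation: `shouldUseYadb` is a hypothetical helper, and treating a "format specifier" as `%` followed by a letter is an assumption made for the example.

```typescript
// Hypothetical helper for illustration; not part of the Midscene API.
type ImeStrategy = 'always-yadb' | 'yadb-for-non-ascii';

// Matches any character outside the 7-bit ASCII range (ö, é, ñ, CJK, ...).
const NON_ASCII = /[^\x00-\x7F]/;
// Assumption: a format specifier is '%' followed by a letter (e.g. %s, %d).
const FORMAT_SPECIFIER = /%[A-Za-z]/;

function shouldUseYadb(
  text: string,
  strategy: ImeStrategy = 'yadb-for-non-ascii',
): boolean {
  // 'always-yadb' routes every input through yadb.
  if (strategy === 'always-yadb') return true;
  // Otherwise only non-ASCII text or text with format specifiers uses yadb.
  return NON_ASCII.test(text) || FORMAT_SPECIFIER.test(text);
}
```

Under these assumptions, plain strings such as `hello world` would take the native input path, while `café` or `100%s` would go through yadb.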
Usage notes
- Discover devices with `getConnectedDevices()`; the `udid` matches `adb devices`.
- Supports remote adb via `remoteAdbHost` / `remoteAdbPort`; set `androidAdbPath` if adb is not on PATH.
- Use `screenshotResizeScale` to cut latency on high-DPI devices.
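To see why `screenshotResizeScale` matters on high-DPI devices, the arithmetic can be sketched as follows. The helper `scaledScreenshotSize` and the `Size` type are hypothetical, and the rounding step is an assumption; only the default of `1 / devicePixelRatio` comes from the option description.

```typescript
// Hypothetical types and helper for illustration; not part of the Midscene API.
interface Size {
  width: number;
  height: number;
}

function scaledScreenshotSize(
  physical: Size,
  devicePixelRatio: number,
  screenshotResizeScale?: number,
): Size {
  // Documented default: 1 / devicePixelRatio when no scale is given.
  const scale = screenshotResizeScale ?? 1 / devicePixelRatio;
  // Assumption: dimensions are rounded to whole pixels.
  return {
    width: Math.round(physical.width * scale),
    height: Math.round(physical.height * scale),
  };
}
```

For a 1080x2400 screen with a pixel ratio of 3, the default scale would yield a 360x800 image, roughly one ninth of the original pixel count.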
Examples
Quick start
Launch native packages
AndroidAgent
Wire Midscene's AI planner to an AndroidDevice for UI automation.
Import
Constructor
Android-specific options
- `customActions?: DeviceAction[]`: Extend planning with actions defined via `defineAction`.
- `appNameMapping?: Record<string, string>`: Map friendly app names to package names. When you pass an app name to `launch(target)`, the agent looks up the package name in this mapping. If no mapping is found, it attempts to launch `target` as-is. User-provided mappings take precedence over default mappings.
- All other fields match the constructor options in API reference (Common): `generateReport`, `reportFileName`, `aiActionContext`, `modelConfig`, `cacheId`, `createOpenAIClient`, `onTaskStartTip`, and more.
Usage notes
- Use one agent per device connection.
- Android-only helpers such as `launch` and `runAdbShell` are also exposed in YAML scripts. See Android platform-specific actions.
- For shared interaction methods, see API reference (Common).
Android-specific methods
agent.launch()
Launch a web URL or native Android activity/package.
- `target: string`: Can be a web URL, a string in `package/.Activity` format (e.g., `com.android.settings/.Settings`), an app package name, or an app name. If you pass an app name and it exists in `appNameMapping`, it is automatically resolved to the mapped package name; otherwise, `target` is launched as-is.
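The app-name lookup described above can be sketched as a merge followed by a fallback. This is an illustration of the documented precedence rule only: `resolveLaunchTarget` is a hypothetical helper, the mapping entries in the usage note are made up, and exact-key matching is an assumption.

```typescript
// Hypothetical helper for illustration; not part of the Midscene API.
type AppNameMapping = Record<string, string>;

function resolveLaunchTarget(
  target: string,
  userMapping: AppNameMapping = {},
  defaultMapping: AppNameMapping = {},
): string {
  // Later entries overwrite earlier ones, so user mappings take precedence.
  const merged = new Map<string, string>([
    ...Object.entries(defaultMapping),
    ...Object.entries(userMapping),
  ]);
  // No mapping found: the target is passed through unchanged.
  return merged.get(target) ?? target;
}
```

With a default entry `{ Settings: 'com.android.settings' }` and a user entry `{ Settings: 'com.example.settings' }`, the name `Settings` would resolve to the user's package, while a URL or a `package/.Activity` string with no entry would pass through as-is.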
agent.runAdbShell()
Run a raw adb shell command through the connected device.
- `command: string`: Command passed verbatim to `adb shell`.
Navigation helpers
- `agent.back(): Promise<void>`: Trigger the Android system Back action.
- `agent.home(): Promise<void>`: Return to the launcher.
- `agent.recentApps(): Promise<void>`: Open the Recents/Overview screen.
Helper utilities
agentFromAdbDevice()
Create an AndroidAgent from any connected adb device.
- `deviceId?: string`: Connect to a specific device; omitted means “first available”.
- `opts?: PageAgentOpt & AndroidDeviceOpt`: Combine agent options with AndroidDevice settings.
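The "first available" rule for `deviceId` can be sketched as below. This is an assumption-laden illustration rather than the library's code: `pickDevice` and the `ConnectedDevice` shape are hypothetical (only the `udid` field name comes from this page), and throwing when nothing matches is an assumed behavior.

```typescript
// Hypothetical types and helper for illustration; not part of the Midscene API.
interface ConnectedDevice {
  udid: string;
}

function pickDevice(
  devices: ConnectedDevice[],
  deviceId?: string,
): ConnectedDevice {
  // Omitted deviceId means "first available"; otherwise match on udid.
  const chosen =
    deviceId === undefined
      ? devices[0]
      : devices.find((d) => d.udid === deviceId);
  // Assumption: a missing device is reported as an error.
  if (!chosen) {
    throw new Error(
      deviceId === undefined
        ? 'No adb devices connected'
        : `Device ${deviceId} not found`,
    );
  }
  return chosen;
}
```

Passing an explicit `deviceId` is the safer choice when several devices or emulators are attached, since the order of the device list is not something this sketch controls.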
getConnectedDevices()
Enumerate adb devices Midscene can drive.
See also
- Android getting started for setup and scripting steps.

