API reference (Android)

Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).

Action Space

AndroidDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:

  • Tap — Tap an element.
  • DoubleClick — Double-tap an element.
  • Input — Enter text with replace/append/clear modes and optional autoDismissKeyboard.
  • Scroll — Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
  • DragAndDrop — Drag from one element to another.
  • KeyboardPress — Press a specified key.
  • LongPress — Long-press a target element with optional duration.
  • PullGesture — Pull up or down (e.g., to refresh) with optional distance and duration.
  • ClearInput — Clear the contents of an input field.
  • Launch — Open a web URL or package/.Activity string.
  • RunAdbShell — Execute raw adb shell commands.
  • AndroidBackButton — Trigger the system back action.
  • AndroidHomeButton — Return to the home screen.
  • AndroidRecentAppsButton — Open the multitasking/recent apps view.

AndroidDevice

Create a connection to an adb-managed device that an AndroidAgent can drive.

Import

import { AndroidDevice, getConnectedDevices } from '@midscene/android';

Constructor

const device = new AndroidDevice(deviceId, {
  // device options...
});

Device options

  • deviceId: string — First constructor argument. The device ID as listed by adb devices, or the udid field of an entry returned by getConnectedDevices().
  • autoDismissKeyboard?: boolean — Automatically hide the keyboard after input. Default true.
  • keyboardDismissStrategy?: 'esc-first' | 'back-first' — Order for dismissing keyboards. Default 'esc-first'.
  • androidAdbPath?: string — Custom path to the adb executable.
  • remoteAdbHost?: string / remoteAdbPort?: number — Point to a remote adb server.
  • imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii' — Choose when to invoke yadb for text input. Default 'yadb-for-non-ascii'.
    • 'yadb-for-non-ascii' (default) — Uses yadb for text that contains non-ASCII characters (accented Latin letters such as ö, é, ñ, as well as Chinese, Japanese, and other Unicode text) or format specifiers (such as %s, %d). Pure ASCII text uses the faster native adb input text.
    • 'always-yadb' — Uses yadb for all text input. This maximizes compatibility but is slightly slower for pure ASCII text.
  • displayId?: number — Target a specific virtual display if the device mirrors multiple displays.
  • screenshotResizeScale?: number — Downscale screenshots before sending them to the model. Defaults to 1 / devicePixelRatio.
  • alwaysRefreshScreenInfo?: boolean — Re-query rotation and screen size every step. Default false.
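The imeStrategy rule described above can be sketched as a small decision function. This is an illustration of the documented behavior only, not Midscene's actual implementation: the helper name shouldUseYadb and the regular expression used to detect format specifiers are assumptions.

```typescript
// Illustrative sketch only; not taken from the Midscene source.
type ImeStrategy = 'always-yadb' | 'yadb-for-non-ascii';

// Hypothetical helper: returns true when the text would be routed through yadb.
function shouldUseYadb(
  text: string,
  strategy: ImeStrategy = 'yadb-for-non-ascii',
): boolean {
  if (strategy === 'always-yadb') {
    return true;
  }
  // Any character outside the 7-bit ASCII range (ö, é, ñ, CJK, ...).
  const hasNonAscii = /[^\x00-\x7F]/.test(text);
  // Assumed pattern for format specifiers such as %s or %d.
  const hasFormatSpecifier = /%[a-zA-Z]/.test(text);
  return hasNonAscii || hasFormatSpecifier;
}

console.log(shouldUseYadb('hello world')); // false: pure ASCII
console.log(shouldUseYadb('café')); // true: contains é
console.log(shouldUseYadb('hello', 'always-yadb')); // true: strategy forces yadb
```

If input of plain ASCII text proves unreliable on a particular device, switching to 'always-yadb' trades a little speed for consistency.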

Usage notes

  • Discover devices with getConnectedDevices(); each entry's udid matches the ID shown by adb devices.
  • Supports remote adb via remoteAdbHost/remoteAdbPort; set androidAdbPath if adb is not on PATH.
  • Use screenshotResizeScale to cut latency on high-DPI devices.
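To see why screenshotResizeScale matters on high-DPI devices, the sketch below computes the screenshot size that results from a given scale, using the documented default of 1 / devicePixelRatio. The helper scaledScreenshotSize and the rounding it applies are assumptions for illustration; Midscene's internal resizing may differ.

```typescript
// Illustrative sketch only; helper name and rounding are assumptions.
interface Size {
  width: number;
  height: number;
}

function scaledScreenshotSize(
  physical: Size,
  devicePixelRatio: number,
  screenshotResizeScale?: number,
): Size {
  // Documented default: 1 / devicePixelRatio.
  const scale = screenshotResizeScale ?? 1 / devicePixelRatio;
  return {
    width: Math.round(physical.width * scale),
    height: Math.round(physical.height * scale),
  };
}

// A 1080x2400 screen with a device pixel ratio of 3.
console.log(scaledScreenshotSize({ width: 1080, height: 2400 }, 3));
// The same screen with an explicit scale of 0.5.
console.log(scaledScreenshotSize({ width: 1080, height: 2400 }, 3, 0.5));
```

A smaller scale means fewer pixels sent to the model per step, at the cost of fine detail in the screenshot.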

Examples

Quick start

import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';

const [first] = await getConnectedDevices();
const device = new AndroidDevice(first.udid);
await device.connect();

const agent = new AndroidAgent(device, {
  aiActionContext: 'If a permissions dialog appears, accept it.',
});

await agent.launch('https://www.ebay.com');
await agent.aiAct('search "Headphones" and wait for results');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], find item in list and corresponding price',
);
console.log(items);

Launch native packages

await agent.launch('com.android.settings/.Settings');
await agent.back();
await agent.home();

AndroidAgent

Wire Midscene's AI planner to an AndroidDevice for UI automation.

Import

import { AndroidAgent } from '@midscene/android';

Constructor

const agent = new AndroidAgent(device, {
  // common agent options...
});

Android-specific options

  • customActions?: DeviceAction[] — Extend planning with actions defined via defineAction.
  • appNameMapping?: Record<string, string> — Map friendly app names to package names. When you pass an app name to launch(target), the agent will look up the package name in this mapping. If no mapping is found, it will attempt to launch target as-is. User-provided mappings take precedence over default mappings.
  • All other fields match the constructor options in the common API reference: generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, and more.
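The appNameMapping lookup described above can be sketched as a pure function. The function name resolveLaunchTarget, the sample default mapping, the com.example.mail package, and the exact-match (case-sensitive) lookup are all assumptions for illustration; Midscene's built-in mapping and matching rules are not listed in this reference.

```typescript
// Illustrative sketch only; not taken from the Midscene source.
type AppNameMapping = Record<string, string>;

// Hypothetical default mapping used only for this example.
const exampleDefaults: AppNameMapping = {
  Settings: 'com.android.settings',
};

function resolveLaunchTarget(
  target: string,
  userMapping: AppNameMapping = {},
  defaults: AppNameMapping = exampleDefaults,
): string {
  // User-provided entries override default entries with the same key.
  const merged: AppNameMapping = { ...defaults, ...userMapping };
  // Fall back to the original target when no mapping exists.
  return Object.prototype.hasOwnProperty.call(merged, target)
    ? merged[target]
    : target;
}

console.log(resolveLaunchTarget('Settings'));
console.log(resolveLaunchTarget('Mail', { Mail: 'com.example.mail' }));
console.log(resolveLaunchTarget('https://www.ebay.com'));
```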

Android-specific methods

agent.launch()

Launch a web URL or native Android activity/package.

function launch(target: string): Promise<void>;
  • target: string — Can be a web URL, a string in package/.Activity format (e.g., com.android.settings/.Settings), an app package name, or an app name. If you pass an app name and it exists in appNameMapping, it will be automatically resolved to the mapped package name; otherwise, target will be launched as-is.
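The four accepted target formats can be told apart with simple patterns, as in the sketch below. It is a heuristic for illustration only: the function classifyLaunchTarget and its regular expressions are assumptions, and this reference does not describe how Midscene itself distinguishes the formats.

```typescript
// Illustrative heuristic only; not Midscene's parsing logic.
type LaunchTargetKind = 'url' | 'activity' | 'package' | 'app-name';

// Dotted identifier such as com.android.settings (at least two segments).
const PACKAGE_PATTERN = '[A-Za-z]\\w*(\\.[A-Za-z]\\w*)+';

function classifyLaunchTarget(target: string): LaunchTargetKind {
  if (/^https?:\/\//i.test(target)) {
    return 'url';
  }
  // package/.Activity, e.g. com.android.settings/.Settings
  if (new RegExp(`^${PACKAGE_PATTERN}/[\\w.$]+$`).test(target)) {
    return 'activity';
  }
  if (new RegExp(`^${PACKAGE_PATTERN}$`).test(target)) {
    return 'package';
  }
  // Anything else is treated as a friendly app name.
  return 'app-name';
}

console.log(classifyLaunchTarget('https://www.ebay.com'));
console.log(classifyLaunchTarget('com.android.settings/.Settings'));
console.log(classifyLaunchTarget('com.android.settings'));
console.log(classifyLaunchTarget('Settings'));
```

Note that a bare host name without a scheme (for example www.ebay.com) looks like a package name to this heuristic, so web targets should include the https:// prefix.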

agent.runAdbShell()

Run a raw adb shell command through the connected device.

function runAdbShell(command: string): Promise<string>;

  • command: string — Command passed verbatim to adb shell.

const result = await agent.runAdbShell('dumpsys battery');
console.log(result);
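Because runAdbShell resolves to the raw command output as a string, callers usually need to parse it. The sketch below extracts the battery level from dumpsys battery output. The helper parseBatteryLevel is hypothetical, and the sample text is an abbreviated illustration; real output contains more fields and varies by device.

```typescript
// Hypothetical helper: extract the "level" field from `dumpsys battery` output.
function parseBatteryLevel(dumpsysOutput: string): number | undefined {
  // Match a line of the form "  level: 85".
  const match = dumpsysOutput.match(/^\s*level:\s*(\d+)\s*$/m);
  return match ? Number(match[1]) : undefined;
}

// Abbreviated sample for illustration; not captured from a real device.
const sampleBatteryOutput = [
  'Current Battery Service state:',
  '  AC powered: false',
  '  USB powered: true',
  '  level: 85',
  '  scale: 100',
].join('\n');

console.log(parseBatteryLevel(sampleBatteryOutput));
```

With a connected agent, the same helper would be applied to the string returned by agent.runAdbShell('dumpsys battery').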

The agent also provides shortcuts for the system navigation buttons:

  • agent.back(): Promise<void> — Trigger the Android system Back action.
  • agent.home(): Promise<void> — Return to the launcher.
  • agent.recentApps(): Promise<void> — Open the Recents/Overview screen.

Helper utilities

agentFromAdbDevice()

Create an AndroidAgent from any connected adb device.

function agentFromAdbDevice(
  deviceId?: string,
  opts?: PageAgentOpt & AndroidDeviceOpt,
): Promise<AndroidAgent>;
  • deviceId?: string — Connect to a specific device. If omitted, the first available device is used.
  • opts?: PageAgentOpt & AndroidDeviceOpt — Combine agent options with AndroidDevice settings.

getConnectedDevices()

Enumerate adb devices Midscene can drive.

function getConnectedDevices(): Promise<Array<{
  udid: string;
  state: string;
  port?: number;
}>>;
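The returned list can be filtered before a device is handed to AndroidDevice or agentFromAdbDevice. The sketch below picks a device from a list shaped like the return type above. The helper pickDevice and the sample udids are hypothetical, and filtering on state === 'device' (the state adb reports for a ready device) is an assumption about what callers want, not a statement of what Midscene does internally.

```typescript
// Shape copied from the getConnectedDevices() return type above.
interface ConnectedDevice {
  udid: string;
  state: string;
  port?: number;
}

// Hypothetical helper: choose a ready device, optionally by udid.
function pickDevice(
  devices: ConnectedDevice[],
  deviceId?: string,
): ConnectedDevice {
  // adb lists ready devices as "device"; others include "offline" and "unauthorized".
  const ready = devices.filter((d) => d.state === 'device');
  const chosen = deviceId
    ? ready.find((d) => d.udid === deviceId)
    : ready[0];
  if (!chosen) {
    throw new Error(
      deviceId
        ? `Device ${deviceId} is not connected or not ready`
        : 'No ready adb device found',
    );
  }
  return chosen;
}

// Sample data for illustration only.
const sampleDevices: ConnectedDevice[] = [
  { udid: 'emulator-5554', state: 'offline' },
  { udid: 'emulator-5556', state: 'device' },
];

console.log(pickDevice(sampleDevices).udid);
```

Failing early with a clear message is often easier to debug than a connection error raised later during automation.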
