android-action-kernel is an open-source Python library that lets AI agents control and automate native Android applications running on real devices or emulators. It fills a gap in automation tooling by targeting mobile-first workflows where traditional browser or desktop automation does not apply, such as logistics, gig work, field operations, and other industries that rely on phones or tablets. The project uses Android's accessibility API to extract structured UI state (as XML) from the device. That XML is passed to a large language model (LLM), such as one of OpenAI's models, which decides what to do next, and the chosen action is executed through the Android Debug Bridge (ADB). Because the agent reads structured UI data rather than screenshots, it avoids expensive vision-based models, which makes automation faster and cheaper while still allowing fine-grained interactions (for example, tapping buttons, typing text, and navigating screens).
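The "structured UI state" step can be illustrated with a minimal sketch. It assumes the XML resembles the output of `adb shell uiautomator dump` (nested `node` elements with `clickable`, `text`, `content-desc`, and `bounds` attributes). The function name `extract_elements` and the sample XML are hypothetical and are not taken from the project's code.

```python
import re
import xml.etree.ElementTree as ET

# Bounds look like "[left,top][right,bottom]" in uiautomator-style dumps.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def extract_elements(xml_text):
    """Return clickable elements with a tap point at the center of each.

    Hypothetical helper: reduces the full hierarchy to the compact list
    an LLM would need in order to choose a target.
    """
    root = ET.fromstring(xml_text)
    elements = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes with missing or malformed bounds
        left, top, right, bottom = map(int, match.groups())
        elements.append({
            "text": node.get("text", ""),
            "description": node.get("content-desc", ""),
            "center": ((left + right) // 2, (top + bottom) // 2),
        })
    return elements


# Made-up sample in the uiautomator dump shape, for illustration only.
SAMPLE_XML = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" clickable="false" text=""
        content-desc="" bounds="[0,0][1080,1920]">
    <node class="android.widget.Button" clickable="true" text="Accept"
          content-desc="" bounds="[100,200][300,400]" />
  </node>
</hierarchy>"""

if __name__ == "__main__":
    for element in extract_elements(SAMPLE_XML):
        print(element)
```

A list like this is far smaller than a screenshot and can be serialized straight into an LLM prompt, which is the cost advantage the description refers to.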
Features
- Enables AI agents to control native Android apps via the Android accessibility API
- Uses structured UI data instead of screenshots or visual models for decision-making
- Integrates with large language models (LLMs) for intelligent task planning
- Executes actions through ADB commands (tap, type, navigate, etc.)
- Minimal core codebase that is easy to understand and extend
- Designed for real workflows (logistics, gig economy, mobile automation)
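As a rough illustration of the ADB execution step listed above, the sketch below maps an agent decision onto `adb shell input` commands (`tap`, `text`, `keyevent`). The action dictionary shape and the names `build_adb_command` and `run_action` are assumptions made for this example; the project's actual action format may differ.

```python
import shlex
import subprocess


def build_adb_command(action, serial=None):
    """Translate a hypothetical action dict into an adb argument list.

    Assumed shapes: {"type": "tap", "x": int, "y": int},
    {"type": "type", "text": str}, {"type": "back"}, {"type": "home"}.
    """
    base = ["adb"]
    if serial:
        base += ["-s", serial]  # target one device when several are attached
    base += ["shell", "input"]

    kind = action["type"]
    if kind == "tap":
        return base + ["tap", str(action["x"]), str(action["y"])]
    if kind == "type":
        # `input text` expects spaces encoded as %s; quote the result
        # because the device-side shell re-parses the argument.
        encoded = action["text"].replace(" ", "%s")
        return base + ["text", shlex.quote(encoded)]
    if kind == "back":
        return base + ["keyevent", "KEYCODE_BACK"]
    if kind == "home":
        return base + ["keyevent", "KEYCODE_HOME"]
    raise ValueError(f"unsupported action type: {kind!r}")


def run_action(action, serial=None):
    """Execute the action on a connected device (requires adb on PATH)."""
    subprocess.run(build_adb_command(action, serial), check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so no device is needed.
    print(build_adb_command({"type": "tap", "x": 200, "y": 300}))
    print(build_adb_command({"type": "type", "text": "hello world"}))
```

Separating command construction from execution keeps the mapping testable without a device attached; only `run_action` touches ADB.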