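Preface
droidrun, a project that has drawn a lot of attention lately, is now open source. Its selling point is using a large language model to control phone apps through natural language, roughly like this: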
Prompt: “Go to Amazon, search for headphones and write the top 3 products to my colleague on WhatsApp.”
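I previously wrote an article analyzing why a Python script can operate a phone. Today, let's use droidrun to see how a large language model (LLM) controls a phone.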
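The big picture
Completing a task is a loop of Reasoning and Acting.
Step 1. The user states the goal to the ReAct Agent in natural language (ReAct is short for Reasoning and Acting).
Step 2. The ReAct Agent passes the goal, the history, and the available tools to the LLM and asks it to reason about how to reach the goal.
Step 3. After reasoning, the LLM tells the ReAct Agent that to achieve goal A it must first take action B.
Step 4. The ReAct Agent executes action B and saves both the reasoning from that round and the result of action B to the history.
Step 5. The ReAct Agent sends the goal (unchanged), the history (updated), and the available tools (unchanged) to the LLM again, and the LLM gives its next instruction based on them.
The loop repeats until, after some action finishes, the LLM examines the result and finds that the goal has been achieved, at which point the whole process ends. (See the appendix at the end for a real reasoning trace.)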
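The five steps above can be sketched as a small loop. This is only an illustration of the pattern, not droidrun's code: `fake_llm`, the `TOOLS` table, the `tap(index)` tool, and the `"complete"` action that ends the loop are all hypothetical names made up for this example, and the model is replaced by a scripted stand-in so the sketch runs without a phone or an API key.

```python
from typing import Callable, Dict, List

# Hypothetical tools; real tools would operate the phone instead of returning canned text.
def get_clickables() -> str:
    return "[{'text': 'Calculator', 'index': 1}]"

def tap(index: int) -> str:
    return f"tapped element {index}"

TOOLS: Dict[str, Callable[..., str]] = {"get_clickables": get_clickables, "tap": tap}

def fake_llm(goal: str, history: List[dict], tools: List[str]) -> dict:
    """Scripted stand-in for a model call: picks the next step from the history length."""
    script = [
        {"thought": "Inspect the screen first.", "action": "get_clickables", "parameters": {}},
        {"thought": "Calculator is visible; tap it.", "action": "tap", "parameters": {"index": 1}},
        {"thought": "The goal is reached.", "action": "complete", "parameters": {}},
    ]
    return script[len(history)]

def run_react(goal: str, llm=fake_llm, max_steps: int = 10) -> List[dict]:
    history: List[dict] = []
    for _ in range(max_steps):
        step = llm(goal, history, list(TOOLS))                 # Steps 2-3: reasoning
        if step["action"] == "complete":                       # the model reports success
            break
        result = TOOLS[step["action"]](**step["parameters"])   # Step 4: acting
        history.append({**step, "result": result})             # keep thought and outcome
    return history

if __name__ == "__main__":
    for entry in run_react("Open the calculator app"):
        print(entry["action"], "->", entry["result"])
```

The `max_steps` cap is a design choice of this sketch: without it, a model that never declares the goal reached would loop forever.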
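droidrun operates Android phones through adb commands
Android is an open operating system that exposes adb, and adb provides a set of commands that operate the phone by screen coordinates. (Apple provides no equivalent, so to keep the environment lightweight and the architecture simple, many automation tools support Android first.) For example, the adb command to tap a point on the screen is: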
adb shell input tap x y
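The advantage of adb is that it is full-featured and very lightweight (it ships with the Android SDK) and needs no extra dependencies. The downside is that taps and swipes can only be addressed by screen coordinates, which are hard to obtain and fragile (on a different phone the coordinates will most likely change).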
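From a Python script, the tap command above could be wrapped roughly as follows. This is a minimal sketch, not droidrun's implementation: the function names `tap_command` and `tap` are made up here, and it assumes `adb` is on the PATH with a device connected.

```python
import subprocess
from typing import List, Optional

def tap_command(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Build the argv for `adb shell input tap x y`, optionally for one device."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # -s selects a device by its serial number
    return cmd + ["shell", "input", "tap", str(x), str(y)]

def tap(x: int, y: int, serial: Optional[str] = None) -> None:
    """Run the tap. Raises CalledProcessError if adb exits non-zero,
    or FileNotFoundError if adb is not installed."""
    subprocess.run(tap_command(x, y, serial), check=True)

if __name__ == "__main__":
    # Printing instead of executing keeps the example safe to run without a device.
    print(" ".join(tap_command(119, 401)))
```

Passing the command as a list rather than a single shell string avoids quoting problems and keeps the coordinates from being interpreted by the local shell.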
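How droidrun gets the screen coordinates of UI elements
As noted above, adb can only operate the phone by screen coordinates. So when the user types "tap the login button", how does droidrun find the coordinates of that button?
It developed a helper app (com.droidrun.portal) that is installed on the phone, and it communicates with this portal app by sending broadcasts through adb. For example, adb shell am broadcast -a com.droidrun.portal.GET_ELEMENTS retrieves the clickable elements on the current page along with their attributes, which include each element's coordinates. The actual result looks like this: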
[{'text': '登录', 'className': 'TextView', 'index': 1, 'bounds': '39,302,199,501', 'resourceId': '', 'type': 'clickable', 'isParent': True}]
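To turn such an element into a tap, one plausible approach is to tap the center of its bounds. The sketch below assumes the four numbers in `bounds` are ordered left, top, right, bottom; the article does not state the order, so treat that as an assumption. The helper name `center_of` is also made up for this example.

```python
import ast
from typing import Tuple

def center_of(bounds: str) -> Tuple[int, int]:
    """Return the midpoint of a 'left,top,right,bottom' rectangle (assumed order)."""
    left, top, right, bottom = (int(v) for v in bounds.split(","))
    return (left + right) // 2, (top + bottom) // 2

# The sample result shown above. It is a Python-style literal (note True), so
# ast.literal_eval can read it; a JSON payload would need json.loads instead.
raw = ("[{'text': '登录', 'className': 'TextView', 'index': 1, "
       "'bounds': '39,302,199,501', 'resourceId': '', 'type': 'clickable', "
       "'isParent': True}]")
elements = ast.literal_eval(raw)

login = next(e for e in elements if e["text"] == "登录")  # "登录" means "log in"
x, y = center_of(login["bounds"])
print(f"adb shell input tap {x} {y}")
```

For the sample element this prints `adb shell input tap 119 401`.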
That covers the big picture. Now let's fill in some technical details.
ReActAgent to LLMReasoner
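- The figure below shows the reasoning flow (note the input parameters and return data of LLMReasoner)
- The LLM Provider API is the API of a specific model, for example the deepseek API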
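As a rough illustration of what goes into and comes out of the reasoner, the sketch below builds a prompt from the goal, history, and tools, and parses a reply shaped like the ones in the appendix (a JSON object with `thought`, `action`, and `parameters`, wrapped in a json fence). The function names `build_prompt` and `parse_step` and the prompt layout are assumptions for this example, not droidrun's actual LLMReasoner interface.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # built this way to avoid writing a literal fence inside this example

def build_prompt(goal: str, history: List[dict], tools: List[str]) -> str:
    """Combine the three inputs the agent hands to the model (hypothetical layout)."""
    lines = [f"Goal: {goal}", "Available tools: " + ", ".join(tools), "History:"]
    lines += [f"- {h['action']} -> {h['result']}" for h in history] or ["- (empty)"]
    return "\n".join(lines)

def parse_step(reply: str) -> dict:
    """Extract the JSON object from a model reply, with or without a json fence."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, reply, re.DOTALL)
    payload = match.group(1) if match else reply
    step = json.loads(payload)
    missing = {"thought", "action", "parameters"} - set(step)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return step

if __name__ == "__main__":
    reply = (FENCE + 'json\n{"thought": "Inspect the screen.", '
             '"action": "get_clickables", "parameters": {}}\n' + FENCE)
    print(build_prompt("Open the calculator app", [], ["get_clickables", "tap"]))
    print(parse_step(reply)["action"])
```

Validating the three keys up front means a malformed reply fails at the parsing stage rather than later, when the agent tries to run a tool.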
How ReActAgent calls tools and stores the results
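One way to picture this step is a dispatcher that looks up the tool named by the model, runs it, and appends a record to the history. The sketch below is an assumption about the general shape, not droidrun's code; `execute_step` and the record layout are hypothetical. It also stores failures as results, so the model can see that an action did not work and try something else, as happens with the swipe in the appendix.

```python
from typing import Any, Callable, Dict, List

def execute_step(step: Dict[str, Any],
                 tools: Dict[str, Callable[..., Any]],
                 history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run one tool call and append a record of it to history."""
    record = {
        "thought": step.get("thought", ""),
        "action": step["action"],
        "parameters": step.get("parameters", {}),
    }
    tool = tools.get(step["action"])
    if tool is None:
        record["result"] = f"error: unknown tool {step['action']!r}"
    else:
        try:
            record["result"] = tool(**record["parameters"])
        except Exception as exc:  # report the failure to the model instead of crashing
            record["result"] = f"error: {exc}"
    history.append(record)
    return record

if __name__ == "__main__":
    history: List[Dict[str, Any]] = []
    tools = {"echo": lambda msg: msg}  # stand-in tool for the demo
    execute_step({"thought": "say hi", "action": "echo", "parameters": {"msg": "hi"}}, tools, history)
    execute_step({"thought": "oops", "action": "missing_tool", "parameters": {}}, tools, history)
    for record in history:
        print(record["action"], "->", record["result"])
```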
How the page layout is obtained through the portal app
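The request side of this exchange is the broadcast quoted earlier, and it could be issued from Python as sketched below. The function names are made up for this example. How the portal app delivers the element data back is not described in this article, so the sketch only returns adb's raw output and makes no claim about where the element list ends up.

```python
import subprocess
from typing import List

GET_ELEMENTS = "com.droidrun.portal.GET_ELEMENTS"  # action name quoted in the article

def broadcast_command(action: str = GET_ELEMENTS) -> List[str]:
    """Build the argv for `adb shell am broadcast -a <action>`."""
    return ["adb", "shell", "am", "broadcast", "-a", action]

def request_elements() -> str:
    """Send the broadcast and return adb's raw stdout.
    Requires adb on the PATH, a connected device, and the portal app installed."""
    done = subprocess.run(broadcast_command(), capture_output=True, text=True, check=True)
    return done.stdout

if __name__ == "__main__":
    # Printing instead of executing keeps the example safe to run without a device.
    print(" ".join(broadcast_command()))
```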
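Finally, what other ways are there to operate a phone besides adb commands?
This article covers how to operate Android and Apple phones with native and derived tools: "N Ways to Operate a Phone Without Your Fingers" (《不用手指操作手机的N种方法》)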
Appendix
When you ask Droidrun to open the calculator app on the phone using deepseek r1, the real reasoning trace is as follows.
PC:~/python_venv/droidrun/bin$ droidrun "Open the calculator app" --provider deepseek
INFO:droidrun:=================================
===== Debug Response =====
```json
{
  "thought": "I need to determine the current screen state to locate the Calculator app. First, I'll use get_clickables to retrieve all interactive UI elements. If there's an element labeled 'Calculator' or with a package name like com.android.calculator2, I can tap it directly. If not, I may need to open the app drawer or use start_app with the correct package name.",
  "action": "get_clickables",
  "parameters": {}
}
```
=================================
INFO:droidrun:=================================
===== Debug Response =====
```json
{
  "thought": "The current screen shows the home screen with several apps like 'SoloPi', 'Appium Settings', etc., but the Calculator app is not visible. To open the Calculator, I need to access the app drawer. Since there is no direct 'Apps' or 'Menu' button visible, a common way to open the app drawer is by swiping up from the bottom middle of the screen. This should reveal all installed apps, including the Calculator.",
  "action": "swipe",
  "parameters": {
    "start_x": 360,
    "start_y": 1500,
    "end_x": 360,
    "end_y": 800,
    "duration_ms": 300
  }
}
```
=================================
INFO:droidrun:=================================
===== Debug Response =====
```json
{
  "thought": "After swiping up to open the app drawer, I need to check if the Calculator app is now visible. I'll use get_clickables again to inspect the current UI elements. If Calculator appears in the list, I'll tap it. If not, I may need to scroll further or check different app categories.",
  "action": "get_clickables",
  "parameters": {}
}
```
=================================
INFO:droidrun:=================================
===== Debug Response =====
```json
{
  "thought": "After swiping up, the app drawer does not seem to have opened as the same UI elements are visible. The Calculator app is still not present. Since swiping up did not reveal additional apps, an alternative approach is needed. I'll check installed packages via list_packages to find the Calculator's package name, then launch it directly via start_app.",
  "action": "list_packages",
  "parameters": {
    "include_system_apps": true
  }
}
```
=================================
INFO:droidrun:=================================
===== Debug Response =====
```json
{
  "thought": "The app drawer swipe attempt didn't reveal Calculator, but list_packages shows 'com.android.bbkcalculator' which appears to be the calculator app. Since direct navigation failed, I'll launch it directly using its package name via start_app.",
  "action": "start_app",
  "parameters": {
    "package": "com.android.bbkcalculator"
  }
}
```
=================================