AI开源项目-CSDN博客

文章目录

Heygem
- 项目介绍
- 本地部署
OpenManus
- 项目介绍
- 本地部署
OpenDevin/OpenHands
- 项目介绍
- 本地部署

Heygem

项目介绍

好消息！Heygem 在 GitHub 上开源了！不过，只是前端界面开源，感觉更像是来 GitHub 刷一波知名度。不过这依然是个值得关注的工具。让我们先来看看它的官方介绍：

Heygem 是一款专为 Windows 系统打造的完全离线视频合成工具。它能精确克隆你的外貌和声音，将你的形象数字化。通过文字或语音驱动虚拟化身，你可以轻松制作视频。无需网络连接，在保护隐私的同时，享受高效便捷的数字体验。

官网：https://heygem.ai/

项目地址：https://github.com/GuijiAI/HeyGem.ai

核心功能：

精准外貌与语音克隆
借助先进的 AI 算法，Heygem 高精度捕捉五官、轮廓等特征，构建逼真的虚拟模型。同时，它还能克隆语音，捕捉人声的细微特征，支持多种语音参数设置，打造高度相似的音色效果。
文字及语音驱动的虚拟化身
通过自然语言处理技术，Heygem 能将文本转化为流畅自然的语音，驱动虚拟化身开口“说话”。你也可以直接输入语音，虚拟化身会根据语音的节奏和语调，同步做出相应的动作和表情，表现更加生动。
高效视频合成
数字人视频画面与声音高度同步，口型匹配自然流畅，智能优化音视频效果，带来极佳的视听体验。
多语言支持
支持八种语言脚本：英语、日语、韩语、中文、法语、德语、阿拉伯语和西班牙语，满足全球化需求。

主要优势：

完全离线操作：无需联网，保护用户隐私，避免数据泄露风险。
用户友好：界面简洁直观，即使是技术小白也能快速上手。
多模型支持：支持导入多种模型，并通过一键启动包管理，灵活适配不同创作场景。

技术支撑：

语音克隆技术：基于 AI 生成与样本高度相似的语音，涵盖语调、语速等细节。
自动语音识别：将语音转化为文本，让计算机“听懂”你的指令。
计算机视觉技术：用于面部识别和唇部运动分析，确保唇形与语音完美匹配。

更棒的是，Heygem 还支持通过 Docker 进行本地部署，部署后甚至可以批量生成视频！

本地部署

部署文档：https://github.com/GuijiAI/HeyGem.ai/blob/main/README_zh.md

效果：

视频名称2025323203821

问题记录：

问题1：docker-compose up -d 时报错：Error response from daemon: unknown or invalid runtime name: nvidia

解决方案：配置 Docker 使用 runtime: nvidia

docker-compose.yml：

  heygem-f2f:
    image: guiji2025/heygem.ai
    container_name: heygem-f2f
    restart: always
    runtime: nvidia
    privileged: true
    volumes:
      - d:/heygem_data/face2face:/code/data
    environment:
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    shm_size: '8g'
    ports:
      - '8383:8383'
    command: python /code/app_local.py
    networks:
      - ai_network

打开 Docker Desktop 的设置。
在左侧菜单中选择 Docker Engine。
在配置文件中添加或修改以下内容：

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

保存并重启 Docker。

问题2：heygem-f2f 启动失败，Containers一直显示restating

查看容器日志：

2025-03-23 10:23:53   File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
2025-03-23 10:23:53     torch._C._cuda_init()
2025-03-23 10:23:53 RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found

容器无法正确访问 GPU 或 CUDA 库。定位：

# 宿主机检查
python -c "import torch; print(torch.version.cuda);print(torch.__version__);print(torch.cuda.is_available());"
11.8
2.6.0+cu118
True

# 容器检查：使用docker run启动一个临时容器进行配置检查
docker run --rm guiji2025/heygem.ai python -c "import torch; print(torch.version.cuda);print(torch.__version__);print(torch.cuda.is_available());"
/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
11.8
2.2.2+cu118
False

# 进入容器检查
docker run --rm --gpus all -it guiji2025/heygem.ai bash
>>> import torch
>>> torch.cuda.device_count()
1

一般考虑的是cuda和驱动、pytorch不匹配，但是我这里是官方的容器，所以不太需要考虑这种情况。而且实际容器里是访问的到宿主机的GPU的。

最终结论是docker Desktop的问题。显卡的驱动太新了，解决的方案就是下载新的desktop版本：https://docs.docker.com/desktop/release-notes/#4310
参考：https://blog.csdn.net/qq_43367614/article/details/141300909

OpenManus

项目介绍

OpenManus，由 MetaGPT 团队精心打造的开源项目，于2025年3月发布。它致力于模仿并改进 Manus 这一封闭式商业 AI Agent 的核心功能，为用户提供无需邀请码、可本地化部署的智能体解决方案。换句话说，OpenManus 就像一位全能的数字助手，能够在你的本地设备上运行，随时听候差遣，完成各种复杂任务。

它的出现，打破了技术领域的高墙，让每一位开发者都有机会站在同一起跑线上，快速实现诸如代码生成、数据分析、网络信息检索等复杂任务的自动化处理。无论你是独立开发者，还是大型团队的一员，OpenManus 都能为你提供强大的支持，让你专注于创造性的核心工作，而非将时间浪费在重复性任务上。

项目地址：https://github.com/mannaandpoem/OpenManus

OpenManus 项目目录结构

OpenManus/
├── app/                    # 核心应用目录
│   ├── agent/             # 智能体实现模块
│   │   ├── base.py        # 基础智能体接口定义
│   │   ├── manus.py       # 主要智能体实现
│   │   ├── planning.py    # 任务规划模块
│   │   ├── react.py       # 反应式决策模块
│   │   ├── swe.py         # 软件工程相关能力
│   │   └── toolcall.py    # 工具调用处理模块
│   ├── flow/              # 流程控制模块
│   │   ├── base.py        # 流程管理基础类
│   │   ├── flow_factory.py # 流程工厂
│   │   └── planning.py     # 任务规划流程
│   ├── tool/              # 工具集合模块
│   │   ├── base.py        # 工具基础接口
│   │   ├── browser_use_tool.py  # 浏览器操作工具
│   │   ├── file_saver.py  # 文件操作工具
│   │   ├── google_search.py # 搜索工具
│   │   └── python_execute.py # Python代码执行工具
│   ├── prompt/            # 系统提示词模块
│   │   ├── manus.py       # Manus智能体提示词
│   │   ├── planning.py    # 规划相关提示词
│   │   └── toolcall.py    # 工具调用提示词
│   └── config.py          # 配置管理
├── config/                # 配置文件目录
│   ├── config.example.toml # 配置文件示例
│   └── config.toml        # 实际配置文件
├── main.py               # 主程序入口
├── run_flow.py           # 流程运行脚本
├── setup.py              # 项目安装配置
└── requirements.txt      # 项目依赖列表

核心调用流程：

启动主程序
运行main.py后，系统初始化智能体实例（如Manus类）并加载所有工具（浏览器操作、Python执行器等）
用户指令输入
用户通过命令行界面输入自然语言指令（如“生成Tesla股票分析报告”），系统将指令传递至智能体实例，并存储到内部记忆系统（Memory）
任务规划阶段
智能体拆分任务：规划智能体（PlanningAgent）调用LLM将复杂任务拆解为逻辑连贯的子任务序列。例如，股票分析任务可能被拆分为“数据收集→财务分析→报告生成”。
生成初始计划：通过agent/planning.py中的create_initial_plan方法生成任务ID与执行步骤
ReAct循环执行：

任务处理流程指令：

分步逻辑：通过 ReAct 框架（思考→行动→观察）定义任务执行步骤：
分析需求：解析用户意图，拆解为子任务（如“生成报告”拆分为数据收集→分析→排版。
工具调用：根据子任务选择工具（如 BrowserUseTool 抓取数据，PythonExecute 处理数据。
结果整合：合并工具执行结果并生成最终输出（如 Markdown 报告或 HTML 页面）。

循环执行“思考→行动→观察”直到任务完成或达到最大步数（默认20步）：

思考（Think）
智能体（如ToolCallAgent）分析当前状态与历史记录，调用LLM选择最合适的工具（如浏览器搜索、Python代码执行）。
```
async def think(self):
    response = await self.llm.ask_tool(messages=self.messages, tools=self.available_tools)
```

行动（Act）
执行选定工具（如BrowserUseTool打开浏览器搜索数据），收集执行结果并更新记忆。

async def act(self):
    for command in self.tool_calls:
        result = await self.execute_tool(command)

观察（Observe）
将工具执行结果反馈至LLM，用于下一轮决策。例如，浏览器搜索结果作为后续分析的输入数据。

结果输出
当满足终止条件（任务完成或步数限制），智能体将最终结果（如Markdown格式报告）保存至workspace目录，或通过命令行返回用户

关键模块与工具调用：

工具层（Tool Layer）
内置多种工具实现功能扩展：
- BrowserUseTool：自动化浏览器操作（如数据抓取）。
- PythonExecute：动态执行Python代码。
- FileSaver：保存生成文件。
多智能体协作
flow/模块管理多智能体协作流程，例如规划智能体与执行智能体的任务分配与结果整合

关键模块提示词实现：

1、 Manus 主智能体

promot/manus.py:

SYSTEM_PROMPT = "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all."

NEXT_STEP_PROMPT = """You can interact with the computer using PythonExecute, save important content and information files through FileSaver, open browsers with BrowserUseTool, and retrieve information using GoogleSearch.

PythonExecute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.

FileSaver: Save files locally, such as txt, py, html, etc.

BrowserUseTool: Open, browse, and use web browsers. If you open a local HTML file, you must provide the absolute path to the file.

Terminate: End the current interaction when the task is complete or when you need additional information from the user. Use this tool to signal that you've finished addressing the user's request or need clarification before proceeding further.

Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.

Always maintain a helpful, informative tone throughout the interaction. If you encounter any limitations or need more details, clearly communicate this to the user before terminating.
"""

2、 PlanningAgent（规划智能体）

promot\planning.py:

PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
3. Analyze requests to understand the task scope
4. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
5. Execute steps using available tools as needed
6. Track progress and adapt plans when necessary
7. Use `finish` to conclude immediately when the task is complete


Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete
Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
Think about dependencies and verification methods.
Know when to conclude - don't continue thinking once objectives are met.
"""

NEXT_STEP_PROMPT = """
Based on the current state, what's your next action?
Choose the most efficient path forward:
1. Is the plan sufficient, or does it need refinement?
2. Can you execute the next step immediately?
3. Is the task complete? If so, use `finish` right away.

Be concise in your reasoning, then select the appropriate tool or action.
"""

3、ToolCallAgent（工具调用智能体,）

promot\toolcall.py:

SYSTEM_PROMPT = "You are an agent that can execute tool calls"

NEXT_STEP_PROMPT = (
    "If you want to stop interaction, use `terminate` tool/function call."
)

软件工程相关能力：

promot/swe.py:

SYSTEM_PROMPT = """SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface.

The special interface consists of a file editor that shows you {{WINDOW}} lines of a file at a time.
In addition to typical bash commands, you can also use specific commands to help you navigate and edit files.
To call a command, you need to invoke it with a function call/tool call.

Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION.
If you'd like to add the line '        print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.

RESPONSE FORMAT:
Your shell prompt is formatted as follows:
(Open file: <path>)
(Current directory: <cwd>)
bash-$

First, you should _always_ include a general thought about what you're going to do next.
Then, for every response, you must include exactly _ONE_ tool call/function call.

Remember, you should always include a _SINGLE_ tool call/function call and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference.
If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first tool call, and then after receiving a response you'll be able to issue the second tool call.
Note that the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them.
"""

NEXT_STEP_TEMPLATE = """{{observation}}
(Open file: {{open_file}})
(Current directory: {{working_dir}})
bash-$
"""

本地部署

部署文档：https://gitcode.com/mannaandpoem/OpenManus/blob/main/README_zh.md

配置pip镜像源：

pip config set global.index-url  https://mirrors.aliyun.com/pypi/simple/
pip config list

说明：
1，conda安装：https://www.anaconda.com/download
2，编辑 config/config.toml 添加 API 密钥和自定义设置：

DeepSeek不支持函数调用，国产的只有Qwen和GLM-4是支持函数调用的。这里用Qwen的硅基流动的API：https://cloud.siliconflow.cn/models

# Global LLM configuration
[llm]
model = "Qwen/QwQ-32B"        # The LLM model to use
base_url = "https://api.siliconflow.cn/v1"  # API endpoint URL
# base_url = "https:localhost:11434/v1" // # 调用本地ollama模型
api_key = "YOUR_API_KEY"                    # Your API key
max_tokens = 8192                           # Maximum number of tokens in the response
temperature = 0.5                           # Controls randomness

# Optional configuration for specific LLM models
[llm.vision]
model = "Qwen/Qwen2.5-72B-Instruct-128K"        # The vision model to use
base_url = "https://api.siliconflow.cn/v1"  # API endpoint URL for vision model
api_key = "YOUR_API_KEY"                    # Your API key for vision model
max_tokens = 8192                           # Maximum number of tokens in the response
temperature = 0.5                           # Controls randomness for vision model

测试场景1：制作一个html页面
最终在目录下生成了index.html

PS E:\conda\condabin\OpenManus> python main.py
Warning: Unsupported Python version 3.13.2.final.0, please use 3.11-3.13
INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
Enter your prompt: 制作一个html页面，显示welcome
2025-03-30 17:00:05.676 | WARNING  | __main__:main:15 - Processing your request...
2025-03-30 17:00:05.677 | INFO     | app.agent.base:run:140 - Executing step 1/20
2025-03-30 17:00:21.161 | INFO     | app.llm:update_token_count:250 - Token usage: Input=1818, Completion=391, Cumulative Input=1818, Cumulative Completion=391, Total=2209, Cumulative Total=2209
2025-03-30 17:00:21.161 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts:
2025-03-30 17:00:21.161 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 2 tools to use
2025-03-30 17:00:21.162 | INFO     | app.agent.toolcall:think:85 - 🧰 Tools being prepared: ['str_replace_editor', 'terminate']
2025-03-30 17:00:21.162 | INFO     | app.agent.toolcall:think:88 - 🔧 Tool arguments: {"command": "create", "path": "E:\\conda\\condabin\\OpenManus\\workspace\\index.html", "file_text": "<!DOCTYPE html>\n<html>\n<head>\n    <title>Welcome Page</title>\n</head>\n<body>\n    <h1>Welcome</h1>\n</body>\n</html>"}
2025-03-30 17:00:21.163 | INFO     | app.agent.toolcall:execute_tool:179 - 🔧 Activating tool: 'str_replace_editor'...
2025-03-30 17:00:21.165 | INFO     | app.agent.toolcall:act:149 - 🎯 Tool 'str_replace_editor' completed its mission! Result: Observed output of cmd `str_replace_editor` executed:
File created successfully at: E:\conda\condabin\OpenManus\workspace\index.html
2025-03-30 17:00:21.165 | INFO     | app.agent.toolcall:execute_tool:179 - 🔧 Activating tool: 'terminate'...
2025-03-30 17:00:21.166 | INFO     | app.agent.toolcall:_handle_special_tool:224 - 🏁 Special tool 'terminate' has completed the task!
2025-03-30 17:00:21.166 | INFO     | app.agent.toolcall:act:149 - 🎯 Tool 'terminate' completed its mission! Result: Observed output of cmd `terminate` executed:
The interaction has been completed with status: success
2025-03-30 17:00:21.166 | INFO     | __main__:main:17 - Request processing completed.

测试场景2：查询天气
browser_use工具调用失败后，尝试python_execute，获取成功

PS E:\conda\condabin\OpenManus> python main.py
Warning: Unsupported Python version 3.13.2.final.0, please use 3.11-3.13
INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
Enter your prompt: 查询上海的天气
2025-03-30 17:25:54.165 | WARNING  | __main__:main:15 - Processing your request...
2025-03-30 17:25:54.165 | INFO     | app.agent.base:run:140 - Executing step 1/20
2025-03-30 17:26:09.636 | INFO     | app.llm:update_token_count:250 - Token usage: Input=1815, Completion=859, Cumulative Input=1815, Cumulative Completion=859, Total=2674, Cumulative Total=2674
2025-03-30 17:26:09.636 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts:
2025-03-30 17:26:09.637 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 0 tools to use
2025-03-30 17:26:09.638 | INFO     | app.agent.base:run:140 - Executing step 2/20
2025-03-30 17:26:26.068 | INFO     | app.llm:update_token_count:250 - Token usage: Input=1879, Completion=1094, Cumulative Input=3694, Cumulative Completion=1953, Total=2973, Cumulative Total=5647
2025-03-30 17:26:26.069 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts:
2025-03-30 17:26:26.069 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 0 tools to use
2025-03-30 17:26:26.069 | INFO     | app.agent.base:run:140 - Executing step 3/20
2025-03-30 17:26:33.361 | INFO     | app.llm:update_token_count:250 - Token usage: Input=1943, Completion=447, Cumulative Input=5637, Cumulative Completion=2400, Total=2390, Cumulative Total=8037
2025-03-30 17:26:33.362 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts:
2025-03-30 17:26:33.362 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 1 tools to use
2025-03-30 17:26:33.362 | INFO     | app.agent.toolcall:think:85 - 🧰 Tools being prepared: ['browser_use']
2025-03-30 17:26:33.363 | INFO     | app.agent.toolcall:think:88 - 🔧 Tool arguments: {"action": "go_to_url", "url": "https://www.baidu.com"}
2025-03-30 17:26:33.363 | INFO     | app.agent.toolcall:execute_tool:179 - 🔧 Activating tool: 'browser_use'...
2025-03-30 17:26:56
ERROR    [browser] Failed to initialize Playwright browser: BrowserType.launch: Executable doesn't exist at C:\Users\Administrator\AppData\Local\ms-playwright\chromium-1161\chrome-win\chrome.exe
......
.778 | INFO     | app.llm:update_token_count:250 - Token usage: Input=2735, Completion=512, Cumulative Input=10741, Cumulative Completion=3494, Total=3247, Cumulative Total=14235
2025-03-30 17:26:56.779 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts:
2025-03-30 17:26:56.781 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 1 tools to use
2025-03-30 17:26:56.782 | INFO     | app.agent.toolcall:think:85 - 🧰 Tools being prepared: ['python_execute']
2025-03-30 17:26:56.783 | INFO     | app.agent.toolcall:think:88 - 🔧 Tool arguments: {"code": "import requests\nresponse = requests.get('https://wttr.in/Shanghai?format=3')\nprint(response.text)"}
2025-03-30 17:26:56.784 | INFO     | app.agent.toolcall:execute_tool:179 - 🔧 Activating tool: 'python_execute'...
Warning: Unsupported Python version 3.13.2.final.0, please use 3.11-3.13
INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
Warning: Unsupported Python version 3.13.2.final.0, please use 3.11-3.13
INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
2025-03-30 17:27:02.603 | INFO     | app.agent.toolcall:act:149 - 🎯 Tool 'python_execute' completed its mission! Result: Observed output of cmd `python_execute` executed:
{'observation': 'Shanghai: ☀️   +10°C\n\n', 'success': True}
2025-03-30 17:27:02.603 | INFO     | app.agent.base:run:140 - Executing step 6/20
......
2025-03-30 17:27:29.487 | INFO     | app.llm:update_token_count:250 - Token usage: Input=3803, Completion=310, Cumulative Input=24411, Cumulative Completion=4528, Total=4113, Cumulative Total=28939
2025-03-30 17:27:29.487 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts: The weather information for Shanghai has already been successfully retrieved and provided. Since the task is fully completed, the interaction should now be terminated.

<final_answer>
Shanghai's current weather: ☀️ 10°C
</final_answer>

<tool>
{"name": "terminate", "arguments": {"status": "success"}}
</tool>

测试场景3：写一个html页面，让用户通过选择目录来显示指定目录下的图片，要求实现瀑布流的效果

PS E:\conda\condabin\OpenManus> python main.py
Warning: Unsupported Python version 3.13.2.final.0, please use 3.11-3.13
INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
Enter your prompt: 写一个html页面，让用户通过选择目录来显示指定目录下的图片，要求实现瀑布流的效果
2025-03-30 18:55:35.113 | WARNING  | __main__:main:15 - Processing your request...
2025-03-30 18:55:35.114 | INFO     | app.agent.base:run:140 - Executing step 1/20
......
2025-03-30 18:57:52.985 | INFO     | app.agent.base:run:140 - Executing step 7/20
2025-03-30 18:58:06.290 | INFO     | app.llm:update_token_count:250 - Token usage: Input=3684, Completion=907, Cumulative Input=19161, Cumulative Completion=8595, Total=4591, Cumulative Total=27756
2025-03-30 18:58:06.290 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts: 好的，现在让我们验证并完善代码：

<tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\conda\\condabin\\OpenManus\\workspace\\image_gallery.html",
    "old_str": "#gallery {\n    column-count: 3;\n    column-gap: 15px;\n    padding: 15px;\n}",
    "new_str": "#gallery {\n    column-count: auto;\n    column-gap: 1.5rem;\n    padding: 1.5rem;\n    column-fill: balance;\n}\n\n.image-item {\n    break-inside: avoid;\n    margin-bottom: 1rem;\n}"
}}
</tools>

<tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\conda\\condabin\\OpenManus\\workspace\\image_gallery.html",
    "old_str": "img.src = URL.createObjectURL(file);",
    "new_str": "img.className = 'responsive-image';\nimg.src = URL.createObjectURL(file);"
}}
</tools>

执行后，您可以通过以下步骤测试：
1. 双击打开生成的HTML文件
2. 点击"Choose Folder"按钮选择图片目录
3. 点击"Load Images"按钮加载图片
4. 浏览器将自动排列图片为响应式瀑布流布局

该实现包含以下改进：
- 使用CSS columns实现真正的瀑布流效果
- 自动适配不同屏幕尺寸
- 图片容器垂直间隙
- 防止图片在列中间断裂
- 现代化间距单位（rem）
- 支持动态调整窗口大小

如需进一步优化，可以添加：
1. 图片加载动画
2. 图片懒加载
3. 图片过滤功能
4. 响应式列数（@media查询）

您希望我添加任何额外功能吗？
2025-03-30 18:58:06.291 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 0 tools to use
2025-03-30 18:58:06.292 | INFO     | app.agent.base:run:140 - Executing step 8/20
2025-03-30 18:58:29.599 | INFO     | app.llm:update_token_count:250 - Token usage: Input=4178, Completion=1297, Cumulative Input=23339, Cumulative Completion=9892, Total=5475, Cumulative Total=33231
2025-03-30 18:58:29.600 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts: <tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\\\conda\\\\condabin\\\\OpenManus\\\\workspace\\\\image_gallery.html",
    "old_str": "#gallery {\n    column-count: auto;\n    column-gap: 1.5rem;\n    padding: 1.5rem;\n    column-fill: balance;\n}",
    "new_str": "#gallery {\n    column-count: auto;\n    column-gap: 1.5rem;\n    padding: 1.5rem;\n    column-fill: balance;\n    transition: column-count 0.3s ease;\n}\n\n@media (max-width: 768px) {\n    #gallery {\n        column-count: 2;\n    }\n}\n\n@media (max-width: 480px) {\n    #gallery {\n        column-count: 1;\n    }\n}"
}}
</tools>

<tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\\\conda\\\\condabin\\\\OpenManus\\\\workspace\\\\image_gallery.html",
    "old_str": "img.className = 'responsive-image';\nimg.src = URL.createObjectURL(file);",
    "new_str": "img.className = 'responsive-image';\nimg.src = URL.createObjectURL(file);\nimg.onload = function() {\n    this.style.opacity = 1;\n};"
}}
</tools>

<tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\\\conda\\\\condabin\\\\OpenManus\\\\workspace\\\\image_gallery.html",
    "old_str": "<style>",
    "new_str": "<style>\n    /* 响应式图片加载动画 */\n    .responsive-image {\n        opacity: 0;\n        transition: opacity 0.5s ease;\n    }\n\n    .responsive-image加载完成 {\n        opacity: 1;\n    }\n\n    /* 列布局优化 */"
}}
</tools>

<tools>
{"name": "str_replace_editor", "arguments": {
    "command": "str_replace",
    "path": "E:\\\\conda\\\\condabin\\\\OpenManus\\\\workspace\\\\image_gallery.html",
    "old_str": "</style>",
    "new_str": "    /* 现代化过渡效果 */\n    body {\n        transition: all 0.3s ease;\n    }\n\n</style>"
}}
</tools>

执行这些修改后，您将获得以下增强功能：
1. 响应式列数适配不同屏幕尺寸（桌面3列/平板2列/手机1列）
2. 图片渐现动画提升用户体验
3. 平滑的布局过渡效果
4. 更现代化的样式设计

最终文件已保存至：
`E:\\conda\\condabin\\OpenManus\\workspace\\image_gallery.html`

您可以通过以下步骤验证：
1. 双击打开HTML文件
2. 点击"Choose Folder"选择图片目录
3. 点击"Load Images"按钮
4. 调整浏览器窗口大小观察响应式效果
5. 查看图片加载时的渐现动画

如需进一步优化，可以添加：
- 图片过滤功能（通过添加分类按钮）
- 图片缩放预览（通过lightbox效果）
- 上传进度条（通过FileReader进度事件）

是否需要我实现这些高级功能？
......
2025-03-30 19:02:52.644 | INFO     | app.llm:update_token_count:250 - Token usage: Input=14665, Completion=617, Cumulative Input=150884, Cumulative Completion=23643, Total=15282, Cumulative Total=174527
2025-03-30 19:02:52.644 | INFO     | app.agent.toolcall:think:80 - ✨ Manus's thoughts: 所有功能已全部实现并验证完成！最终的 `image_gallery.html` 文件包含以下完整特性：

---

### **最终成果**
| 功能类别 | 实现内容 |
|---------|---------|
| **布局** | 响应式瀑布流（自动适配3/2/1列） |
| **核心交互** | 文件夹选择加载、全屏预览（含导航箭头）、自动轮播（可调速度） |
| **元数据** | 显示文件名/大小/最后修改时间 |
| **过滤与控制** | 格式过滤（JPG/PNG/GIF）、轮播速度选择器 |
| **增强体验** | 加载进度条、平滑过渡效果 |

---

### **使用路径**
文件已保存至：
`E:\conda\condabin\OpenManus\workspace\image_gallery.html`
**直接双击运行即可使用！**

---

### **部署建议**
1️⃣ **本地使用**
双击文件直接运行，无需服务器环境。

2️⃣ **部署到网页**
上传到静态服务器后可共享访问，建议使用 [GitHub Pages](https://pages.github.com/) 或 [Vercel](https://vercel.com/)。

3️⃣ **优化建议**
```diff
# 推荐分离代码结构：
- 将 CSS 移至 `styles.css`
- 将 JavaScript 移至 `script.js`
- 保持 HTML 文件简洁
### **结束任务**
所有功能已完整实现！如需以下优化或新功能，可随时告知：
✨ **可能的增强方向**
- 拖拽排序图片
- 图片搜索栏（关键词过滤）
- 夜间模式切换
- 图片下载功能

现在结束本次交互，祝您使用愉快！
（如需进一步帮助，请随时返回）
2025-03-30 19:02:52.646 | INFO     | app.agent.toolcall:think:81 - 🛠️ Manus selected 0 tools to use
2025-03-30 19:02:52.647 | INFO     | __main__:main:17 - Request processing completed.

可以看到反思的过程，实现一个基本的效果：

在这里插入图片描述

OpenDevin/OpenHands

项目介绍

OpenDevin（现已更名为 OpenHands）是一个开源 AI 软件工程师平台，旨在通过 AI 代理自动化软件开发任务（如代码生成、调试、环境配置等），目标是复制并增强 Cognition Labs 的 Devin（首个 AI 软件工程师）的能力

OpenDevin 采用事件流架构，支持多 Agent 协作，并提供一个沙盒环境（Docker/Kubernetes）用于安全执行任务，支持多种大语言模型（如 GPT-4、Claude、Llama）

与OpenManus关系：

OpenDevin 主要用于 AI 辅助编程，如自动化代码生成、Bug 修复、CI/CD 集成等。
OpenManus 更偏向通用 AI Agent 任务，如 SEO 审计、报告生成、网页自动化等

项目地址：https://github.com/All-Hands-AI/OpenHands

架构图：
在这里插入图片描述

OpenDevin 系统分为前端和后端两个主要部分。前端负责处理用户交互和显示结果，而后端负责处理业务逻辑和执行 Agent。

OpenHands 的技术架构基于事件流架构，包含三个主要组件：

Agent 抽象：社区可以向 AgentHub 提交不同实现的 Agent 实现。
事件流：跟踪动作和观察的历史记录。
运行时：执行所有动作并将其转换为观察结果。

AgentHub

OpenDevin 中包含了各种不同的 Agent 实现，例如，monologue_agent，codeact_agent，planner_agent，SWE_agent，delegator_agent，dummy_agent 等 Agent，用户可自由选择其中的某一个 Agent。

每个 Agent 被设计成一个循环，在每次迭代中，通过调用 agent.step() 方法，以状态（State）作为输入，输出动作（Actions）来执行操作或命令，在执行动作的后可能接收到的观察（Observations）结果。

在实现的过程中，每个 Agent 类都必须实现 step 和 search_memory 方法，以便执行指令和从记忆中查询信息。该抽象类还提供了一些辅助方法，如 reset、register、get_cls、list_agents，帮助管理 Agent 的状态及其注册信息。

在这里插入图片描述

状态（State）

状态对象是 Agent 执行任务时所依赖的关键信息的集合。它包括以下内容：

Agent 采取的动作的历史记录，以及这些动作产生的观察结果（例如文件内容、命令输出）。
自最近一步以来发生的一系列动作和观察的轨迹。
一个 plan 对象，包含主要目标。Agent 可以通过 AddTaskAction 和 ModifyTaskAction 来添加和修改子任务。

动作（Actions）

Agent 可以执行的动作列表如下：

CmdRunAction：在沙盒化的终端中运行命令。
CmdKillAction：杀死后台命令。
FileReadAction：读取文件内容。
FileWriteAction：向文件写入新内容。
BrowseURLAction：获取 URL 的内容。
AgentRecallAction：搜索记忆（例如向量数据库）。
AddTaskAction：向计划中添加子任务。
ModifyTaskAction：更改子任务的状态。
AgentThinkAction：允许 Agent 添加纯文本到历史记录中的无操作。
AgentFinishAction：停止控制循环，允许用户输入新任务。

观察（Observations）

Agent 在执行动作后可能接收到的观察结果列表如下：

CmdOutputObservation：命令执行的输出。
BrowserOutputObservation：浏览 URL 后的输出。
FileReadObservation：文件读取操作的输出。
AgentRecallObservation：Agent 回忆操作的输出。
AgentErrorObservation：Agent 执行操作时发生错误的输出。
CodeAct Agent
CodeAct Agent 是一个极简主义的智能体，以 ReAct 的模式，根据已有的若干 Action-Observation 对的轨迹决定下一步需要采取什么 Action。

本地部署

开发者文档：https://github.com/OpenDevin/OpenDevin/blob/main/Development.md

中文文档：https://docs.all-hands.dev/zh-Hans/modules/usage/installation

参考：
https://www.bilibili.com/video/BV1LpmmYnEUe/?spm_id_from=333.337.search-card.all.click&vd_source=8066b0fe558a3d040eb762ed70ba335a
https://www.bilibili.com/video/BV1PE9QYcEfs/?spm_id_from=333.337.search-card.all.click&vd_source=8066b0fe558a3d040eb762ed70ba335a