[Agent Frameworks] 12 Factors for Building Production-Grade Agents

Latest recommended article published on 2025-08-28 15:36:39

License: CC 4.0 BY-SA

Tags:

#python #artificial-intelligence #agent

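Preface
0. Introduction
1. Natural Language to Tool Calls
2. Own Your Prompts
3. Own Your Context Window
4. Tools Are Just Structured Outputs
5. Unify Execution State and Business State
6. Pause/Launch/Resume with APIs
7. Contact Humans with Tool Calls
8. Own Your Control Flow
9. Errors in the Context Window
10. Small, Focused Agents
11. Trigger from Anywhere, Meet Users Where They Are
12. Make Your Agent Stateless
Summary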

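Preface

"12-Factor Apps" addressed the core challenges of scalability, maintainability, and deployability in traditional software through twelve principles. Today, AI applications face a similarly wide gap between prototype and production, and we urgently need a comparable methodology to guide the engineering of LLM applications.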

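0. Introduction

What is an agent? An agent is "goal + tools + LLM decisions": the LLM decides which tool to call next, step by step, until the goal is reached.
Software history: The earliest programs were described with flowcharts. Later came DAG orchestrators such as Airflow. Now agents promise that we can "throw away the DAG" and let the AI decide in real time. It sounds appealing, but in practice the results fall short. Why? The AI's decisions are flexible, but they lack the predictability and controllability that production environments require.
The official overview diagram comes from the GitHub repository 12-factor-agents.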

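1. Natural Language to Tool Calls

Concept: One of the most common patterns in agent building is converting natural language into structured tool calls.
Natural language: can you create a payment link for $750 to Jeff for sponsoring the february AI tinkerers meetup?

The resulting structured API call: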

{
  "function": {
    "name": "create_payment_link",
    "parameters": {
      "amount": 750,
      "customer": "cust_128934ddasf9",
      "product": "prod_8675309",
      "price": "prc_09874329fds",
      "quantity": 1,
      "memo": "Hey Jeff - see below for the payment link for the february ai tinkerers meetup"
    }
  }
}

Deterministic code then handles the structured output:

# The LLM takes natural language and returns a structured object.
# `llm` and `stripe_client` are placeholders for your own clients; the
# `paymentlinks.create` call is illustrative, not the real Stripe SDK signature.
async def handle_request(llm, stripe_client):
    next_step = await llm.determine_next_step(
        """
        create a payment link for $750 to Jeff
        for sponsoring the february AI tinkerers meetup
        """
    )

    # Handle the structured output based on the function it names
    function = next_step.function
    if function.name == 'create_payment_link':
        stripe_client.paymentlinks.create(function.parameters)
        return  # or whatever you want, see below
    elif function.name == 'something_else':
        # ... more cases
        pass
    else:  # the model didn't call a tool we know about
        # do something else
        pass

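2. Own Your Prompts

Note: Don't outsource your prompts to a framework, and don't rely on the framework's default prompt templates.

The framework's default black-box approach: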

agent = Agent(
  role="...",
  goal="...",
  personality="...",
  tools=[tool1, tool2, tool3]
)

task = Task(
  instructions="...",
  expected_output=OutputModel
)

result = agent.run(task)

Write your own prompt templates and manage them as core code. (The template below uses BAML to generate the prompt.)

function DetermineNextStep(thread: string) -> DoneForNow | ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation {
  prompt #"
    {{ _.role("system") }}
    You are a helpful assistant that manages deployments for frontend and backend systems.
    You work diligently to ensure safe and successful deployments by following best practices
    and proper deployment procedures.
    
    Before deploying any system, you should check:
    - The deployment environment (staging vs production)
    - The correct tag/version to deploy
    - The current system status
    
    You can use tools like deploy_backend, deploy_frontend, and check_deployment_status
    to manage deployments. For sensitive deployments, use request_approval to get
    human verification.
    
    Always think about what to do first, like:
    - Check current deployment status
    - Verify the deployment tag exists
    - Request approval if needed
    - Deploy to staging before production
    - Monitor deployment progress
    
    {{ _.role("user") }}

    {{ thread }}
    
    What should the next step be?
  "#
}

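Benefits of owning your prompts, whether you write them by hand or generate them with a tool:

Full control: write exactly the instructions your agent needs, with no black-box abstractions
Testing and evals: build tests and evals for your prompts just as you would for any other code
Fast iteration: modify prompts quickly based on real-world performance
Transparency: know exactly what instructions your agent is working with
Role hacking: take advantage of APIs that support non-standard use of user/assistant roles

Prompts should be treated as your most important code asset and given the full software engineering process: version control, branching, code review, unit tests, and performance monitoring. Remember: your prompts are the primary interface between your application logic and the LLM, and full control over them gives you the flexibility and precision needed to build production-grade agents.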
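
One way to treat a prompt like any other code is to keep it as a plain template in the repository and cover it with ordinary unit tests. The following is a minimal sketch under that assumption; `NEXT_STEP_PROMPT` and `render_next_step_prompt` are hypothetical names, not part of BAML or any framework.

```python
from string import Template

# Hypothetical prompt template, kept under version control next to the code.
NEXT_STEP_PROMPT = Template(
    "You are a helpful assistant that manages deployments.\n"
    "Always deploy to staging before production.\n\n"
    "$thread\n\n"
    "What should the next step be?"
)


def render_next_step_prompt(thread: str) -> str:
    """Fill the template; reject an empty thread so bad inputs fail fast."""
    if not thread.strip():
        raise ValueError("thread must not be empty")
    return NEXT_STEP_PROMPT.substitute(thread=thread)


prompt = render_next_step_prompt(
    "<slack_message>Can you deploy the backend?</slack_message>"
)
```

Because the rendered prompt is just a string, a test can assert that key instructions survive every edit to the template.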

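3. Own Your Context Window

Building a good context window means supplying:

The prompt and instructions you give the LLM
Any documents or external data you retrieve
Any past state, tool calls, results, and other history
Any past messages or events (memory)
Instructions about what structured output to produce

[Figure: a good context window]
The standard message-based context format looks like this: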

[
  {
    "role": "system",
    "content": "You are a helpful assistant..."
  },
  {
    "role": "user",
    "content": "Can you deploy the backend?"
  },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "1",
        "name": "list_git_tags",
        "arguments": "{}"
      }
    ]
  },
  {
    "role": "tool",
    "name": "list_git_tags",
    "content": "{\"tags\": [{\"name\": \"v1.2.3\", \"commit\": \"abc123\", \"date\": \"2024-03-15T10:00:00Z\"}, {\"name\": \"v1.2.2\", \"commit\": \"def456\", \"date\": \"2024-03-14T15:30:00Z\"}, {\"name\": \"v1.2.1\", \"commit\": \"abe033d\", \"date\": \"2024-03-13T09:15:00Z\"}]}",
    "tool_call_id": "1"
  }
]

An example of packing the entire context window into a single user message:

[
  {
    "role": "system",
    "content": "You are a helpful assistant..."
  },
  {
    "role": "user",
    "content": |
        Here's everything that happened so far:
        
        <slack_message>
            From: @alex
            Channel: #deployments
            Text: Can you deploy the backend?
        </slack_message>
        
        <list_git_tags>
            intent: "list_git_tags"
        </list_git_tags>
        
        <list_git_tags_result>
            tags:
              - name: "v1.2.3"
                commit: "abc123"
                date: "2024-03-15T10:00:00Z"
              - name: "v1.2.2"
                commit: "def456"
                date: "2024-03-14T15:30:00Z"
              - name: "v1.2.1"
                commit: "ghi789"
                date: "2024-03-13T09:15:00Z"
        </list_git_tags_result>
        
        what's the next step?
  }
]

How to build it:

from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, List, Literal

import yaml  # PyYAML, used here to render event data

@dataclass
class Event:
  # could just use string, or could be explicit - up to you
  type: Literal["list_git_tags", "deploy_backend", "deploy_frontend", "request_more_information", "done_for_now", "list_git_tags_result", "deploy_backend_result", "deploy_frontend_result", "request_more_information_result", "done_for_now_result", "error"]
  # a plain string, or one of ListGitTags | DeployBackend | DeployFrontend |
  # RequestMoreInformation and their *Result counterparts
  data: Any

@dataclass
class Thread:
  events: List[Event]

def stringify_to_yaml(data: Any) -> str:
  # dataclass payloads are converted to dicts before dumping
  payload = asdict(data) if is_dataclass(data) else data
  return yaml.safe_dump(payload, sort_keys=False).rstrip()

def event_to_prompt(event: Event) -> str:
    data = event.data if isinstance(event.data, str) \
           else stringify_to_yaml(event.data)

    return f"<{event.type}>\n{data}\n</{event.type}>"


def thread_to_prompt(thread: Thread) -> str:
  return '\n\n'.join(event_to_prompt(event) for event in thread.events)

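Key benefits:

Information density: structure information in the way that maximizes the LLM's understanding
Error handling: include error information in a format that helps the LLM recover
Safety: control what is passed to the LLM and filter out sensitive data
Flexibility: adapt the format as you learn what works best
Token efficiency: optimize the context format for token efficiency and LLM understanding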
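
The safety point can be made concrete with a small filter that runs before event data is rendered into the prompt. This is only a sketch: the `SENSITIVE_KEYS` set and the `redact` helper are assumptions for illustration, and a real system would need its own list of fields to mask.

```python
from typing import Any

# Hypothetical set of keys that should never reach the model.
SENSITIVE_KEYS = {"api_key", "password", "token"}


def redact(data: Any) -> Any:
    """Return a copy of `data` with the values of sensitive keys masked."""
    if isinstance(data, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [redact(item) for item in data]
    return data


event_data = {
    "repo": "backend",
    "auth": {"token": "abc123"},
    "tags": [{"name": "v1.2.3"}],
}
safe = redact(event_data)
```

Since `redact` builds new containers instead of mutating its input, the stored thread keeps the full data while the prompt only sees the masked copy.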

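4. Tools Are Just Structured Outputs

The essence of a tool: it is just structured output from the LLM that triggers deterministic code.

Data structures for the tool calls: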

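from dataclasses import dataclass
from typing import Literal

@dataclass
class Issue:
  title: str
  description: str
  team_id: str
  assignee_id: str

@dataclass
class CreateIssue:
  issue: Issue
  # the intent field is a fixed tag that identifies the tool
  intent: Literal["create_issue"] = "create_issue"

@dataclass
class SearchIssues:
  query: str
  what_youre_looking_for: str
  intent: Literal["search_issues"] = "search_issues"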

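The pattern is simple:

The LLM outputs structured JSON
Deterministic code executes the appropriate action (such as calling an external API)
The result is captured and fed back into the context

Example handling logic: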

def handle(next_step):
    if next_step.intent == 'create_payment_link':
        # illustrative call, not the real Stripe SDK signature
        stripe.paymentlinks.create(next_step.parameters)
        return  # or whatever you want, see below
    elif next_step.intent == 'wait_for_a_while':
        # do something monadic idk
        pass
    else:  # ... the model didn't call a tool we know about
        # do something else
        pass
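
The first step of the pattern, turning the model's JSON text into a typed object, might look like the sketch below. It assumes the model returns a flat JSON object with an `intent` field; the simplified `CreateIssue` and `SearchIssues` fields and the `parse_next_step` helper are illustrative, not a fixed schema.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class CreateIssue:
    title: str
    description: str


@dataclass
class SearchIssues:
    query: str


def parse_next_step(raw: str) -> Union[CreateIssue, SearchIssues]:
    """Parse the model's JSON text; raise on malformed or unknown output."""
    payload = json.loads(raw)
    intent = payload.get("intent")
    if intent == "create_issue":
        return CreateIssue(
            title=payload["title"], description=payload["description"]
        )
    if intent == "search_issues":
        return SearchIssues(query=payload["query"])
    raise ValueError(f"unknown intent: {intent!r}")


step = parse_next_step('{"intent": "search_issues", "query": "login bug"}')
```

Once the output is a typed object, the rest of the pipeline is ordinary deterministic code that can branch on its type.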

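5. Unify Execution State and Business State

Traditional systems often separate execution state (current step, waiting status, retry counts, and so on) from business state (user data, processing history, and so on). This separation adds complexity and creates consistency problems. The remedy is to keep a single thread of events and derive execution state from it wherever possible.

Execution state: current step, next step, waiting status, retry counts, and so on.
Business state: what has happened in the agent workflow so far (for example, the list of OpenAI messages, or the list of tool calls and results).

Benefits of unified state management:

Simplicity: one source of truth for all state
Serialization: the thread is easy to serialize and deserialize
Debugging: the entire history is visible in one place
Flexibility: add new state simply by adding new event types
Recovery: resume from any point just by loading the thread
Forking: fork the thread at any point by copying a subset of it into a new context/state ID
Human interfaces and observability: converting a thread to human-readable markdown or a rich web UI is straightforward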
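
As a rough illustration of deriving execution state from the thread instead of storing it separately, consider the sketch below. The event shapes and the helper names `consecutive_errors` and `is_waiting_for_human` are assumptions made for this example.

```python
import json
from typing import Any, Dict, List

Thread = List[Dict[str, Any]]  # a thread is just an ordered list of events


def consecutive_errors(thread: Thread) -> int:
    """Retry count, derived from the run of error events at the end."""
    count = 0
    for event in reversed(thread):
        if event["type"] != "error":
            break
        count += 1
    return count


def is_waiting_for_human(thread: Thread) -> bool:
    """Waiting status, derived from the type of the last event."""
    return bool(thread) and thread[-1]["type"] == "request_more_information"


thread: Thread = [
    {"type": "slack_message", "data": "Can you deploy the backend?"},
    {"type": "error", "data": "timeout"},
    {"type": "error", "data": "timeout"},
]
# The whole state round-trips through plain JSON.
restored = json.loads(json.dumps(thread))
```

Nothing besides the event list needs to be persisted: loading `restored` gives back both the history and, through the helper functions, the execution state.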

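6. Pause/Launch/Resume with APIs

Agents are just programs, and we have certain expectations about how to launch, query, resume, and stop them.

Launch: an agent is easy to launch through a simple API
Pause: an agent can pause during long-running operations
External resume: an external trigger can resume the agent from where it left off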
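
A minimal sketch of such an API surface follows. It assumes an in-memory dictionary standing in for a database, and the function names `launch`, `pause`, and `resume` are hypothetical; a production version would sit behind HTTP endpoints and persist threads durably.

```python
import uuid
from typing import Any, Dict, List

# In-memory stand-in for a database table of threads (assumption).
THREADS: Dict[str, List[Dict[str, Any]]] = {}


def launch(message: str) -> str:
    """Start a new thread and return its id so callers can refer to it."""
    thread_id = str(uuid.uuid4())
    THREADS[thread_id] = [{"type": "user_message", "data": message}]
    return thread_id


def pause(thread_id: str, reason: str) -> None:
    """Record that the agent stopped and is waiting on something external."""
    THREADS[thread_id].append({"type": "paused", "data": reason})


def resume(thread_id: str, payload: str) -> List[Dict[str, Any]]:
    """Called by an external trigger (e.g. a webhook) to continue."""
    thread = THREADS[thread_id]
    thread.append({"type": "resumed", "data": payload})
    return thread


tid = launch("Can you deploy the backend?")
pause(tid, "waiting for human approval")
events = resume(tid, "approved")
```

The thread id is the only thing the external caller has to remember between the pause and the resume.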

7. Contact Humans with Tool Calls

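
The idea in this section's title is that asking a person for input is expressed as a structured tool call like any other. A minimal sketch, assuming a hypothetical `RequestHumanInput` output type and a `send` callback that delivers text over whatever channel is in use:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RequestHumanInput:
    """Hypothetical structured output the model emits to ask a person."""
    question: str
    urgency: str = "low"  # e.g. "low", "medium", "high"
    options: List[str] = field(default_factory=list)
    intent: str = "request_human_input"


def contact_human(step: RequestHumanInput, send: Callable[[str], None]) -> None:
    """Deterministic code that delivers the request over some channel."""
    text = f"[{step.urgency}] {step.question}"
    if step.options:
        text += " Options: " + ", ".join(step.options)
    send(text)


outbox: List[str] = []
contact_human(
    RequestHumanInput(
        "Deploy v1.2.3 to production?", urgency="high", options=["yes", "no"]
    ),
    send=outbox.append,
)
```

Here `outbox.append` stands in for a real delivery function such as a chat or email client.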

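8. Own Your Control Flow

Owning your control flow lets you do a lot of interesting things:

Summarizing or caching tool call results
LLM-as-judge evaluation of structured output
Context window compaction or other memory management
Logging, tracing, and metrics
Client-side rate limiting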
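
As one example from this list, context window compaction can be as simple as collapsing older events into a single summary event. The sketch below is one naive strategy under that assumption; the `compact` helper and the `summary` event type are hypothetical, and a real system might summarize with an LLM instead of listing event types.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def compact(events: List[Event], keep_last: int = 3) -> List[Event]:
    """Replace all but the most recent events with one summary event."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(events) <= keep_last:
        return list(events)
    older, recent = events[:-keep_last], events[-keep_last:]
    summary = {
        "type": "summary",
        "data": f"{len(older)} earlier events: "
        + ", ".join(event["type"] for event in older),
    }
    return [summary] + recent


history = [{"type": f"step_{i}", "data": i} for i in range(5)]
compacted = compact(history, keep_last=3)
```

Because the loop is your own code, a step like this can run before each model call without any framework support.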

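Three control flow patterns:

request_clarification: the model asked for more information; break the loop and wait for a human response
fetch_git_tags: the model asked for a list of git tags; fetch them, append them to the context window, and pass control straight back to the model
deploy_backend: the model asked to deploy the backend, which is high-stakes; break the loop and wait for human approval

The code that follows uses analogous intents: fetch_open_issues for the synchronous fetch step and create_issue for the step that waits for approval.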

async def handle_next_step(thread: Thread):

  while True:
    next_step = await determine_next_step(thread_to_prompt(thread))

    # inlined for clarity - in reality you could put
    # this in a method, use exceptions for control flow, or whatever you want
    if next_step.intent == 'request_clarification':
      thread.events.append({
        'type': 'request_clarification',
        'data': next_step,
      })

      await send_message_to_human(next_step)
      await db.save_thread(thread)
      # async step - break the loop, we'll get a webhook later
      break
    elif next_step.intent == 'fetch_open_issues':
      thread.events.append({
        'type': 'fetch_open_issues',
        'data': next_step,
      })

      issues = await linear_client.issues()

      thread.events.append({
        'type': 'fetch_open_issues_result',
        'data': issues,
      })
      # sync step - pass the new context to the LLM to determine the NEXT next step
      continue
    elif next_step.intent == 'create_issue':
      thread.events.append({
        'type': 'create_issue',
        'data': next_step,
      })

      await request_human_approval(next_step)
      await db.save_thread(thread)
      # async step - break the loop, we'll get a webhook later
      break
    else:
      # unknown intent - stop instead of looping forever
      break

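Without this level of resumability and granularity, there is no way to review or approve a tool call before it runs, which forces you to either:

Pause the task in memory while waiting for the long-running operation to complete (think while...sleep), and restart it from the beginning if the process is interrupted
Restrict the agent to low-stakes, low-risk calls such as research and summarization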

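9. Errors in the Context Window

One of the biggest strengths of agents is "self-healing": when error information is passed back to the agent effectively, the agent can recognize the error and recover from it.

Challenges:

Error compaction: compress verbose error details into a concise but useful format.
Error classification: distinguish recoverable errors from fatal ones.
Error counting: keep the agent from getting stuck in an error loop.
Error cleanup: remove errors from the context once they are resolved.

If an error is caught, add it to the context and try again: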

async def run_agent(initial_message):
  thread = {"events": [initial_message]}

  while True:
    next_step = await determine_next_step(thread_to_prompt(thread))
    thread["events"].append({
      "type": next_step.intent,
      "data": next_step,
    })
    try:
      result = await handle_next_step(thread, next_step) # our switch statement
    except Exception as e:
      # if we get an error, we can add it to the context window and try again
      thread["events"].append({
        "type": 'error',
        "data": format_error(e),
      })
      # loop, or do whatever else here to try to recover
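
The loop above calls `format_error` without defining it. One possible shape for it, covering the compaction and classification challenges, is sketched below; the character budget and the choice of which exception types count as recoverable are assumptions for illustration.

```python
MAX_ERROR_CHARS = 300  # assumed budget; tune for your model and context size


def format_error(error: Exception) -> str:
    """Compress an exception into a short, model-readable string."""
    text = f"{type(error).__name__}: {error}"
    if len(text) <= MAX_ERROR_CHARS:
        return text
    return text[: MAX_ERROR_CHARS - 3] + "..."


def is_recoverable(error: Exception) -> bool:
    """Hypothetical classification: retry only on transient failures."""
    return isinstance(error, (TimeoutError, ConnectionError))


try:
    raise TimeoutError("deploy API did not respond within 30s")
except Exception as exc:
    message = format_error(exc)
    retry = is_recoverable(exc)
```

Dropping the stack trace and capping the length keeps a noisy failure from crowding out the rest of the context window.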

You can also implement an error counter for a specific tool call:

# fragment: runs inside the same async function as the loop above
consecutive_errors = 0

while True:

  # ... existing code ...

  try:
    result = await handle_next_step(thread, next_step)
    thread["events"].append({
      "type": next_step.intent + '_result',
      "data": result,
    })
    # success! reset the error counter
    consecutive_errors = 0
  except Exception as e:
    consecutive_errors += 1
    if consecutive_errors < 3:
      # do the loop and try again
      thread["events"].append({
        "type": 'error',
        "data": format_error(e),
      })
    else:
      # break the loop, reset parts of the context window, escalate to a human, or whatever else you want to do
      break

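10. Small, Focused Agents

An inherent limitation of LLMs: the bigger and more complex the task, the more steps it takes, which means a longer context window. As the context grows, the LLM becomes more likely to lose focus.

Benefits:

Manageable context: smaller context windows mean better LLM performance.
Clear responsibilities: each agent has a well-defined scope and purpose.
Better reliability: less chance of losing focus in complex workflows.
Easier debugging: smaller agents are simpler to test.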

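11. Trigger from Anywhere, Meet Users Where They Are

Users work across many different channels, and an agent should adapt to that diversity. Make it possible to trigger the agent from any channel: API calls, scheduled jobs, webhooks, DingTalk messages, Feishu, SMS, and so on.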
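
One way to support many channels is to normalize each channel's payload into a single event shape before it enters the thread. The sketch below assumes made-up payload fields (`text`, `user`, `body`, `from`, `task`); they are not the real Slack, email, or scheduler schemas, and `to_event` is a hypothetical helper.

```python
from typing import Any, Dict


def to_event(channel: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a channel-specific payload into one thread event shape."""
    if channel == "slack":
        text, sender = payload["text"], payload["user"]
    elif channel == "email":
        text, sender = payload["body"], payload["from"]
    elif channel == "cron":
        text, sender = payload["task"], "scheduler"
    else:
        raise ValueError(f"unsupported channel: {channel}")
    return {"type": f"{channel}_message", "data": {"from": sender, "text": text}}


event = to_event(
    "email",
    {"from": "alex@example.com", "body": "Can you deploy the backend?"},
)
```

With this adapter in front, the agent loop itself never needs to know which channel a request came from.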

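12. Make Your Agent Stateless

Stateless design:

Design the agent as a stateless functional processor: it takes input (the current state), produces output (the next action), and keeps no internal state.
All state should live in an external persistence layer; the agent itself is just a pure-function transformer.
This design brings several advantages: easier testing (pure functions are easy to verify), easier scaling (stateless components can be parallelized freely), easier deployment (no state migration to worry about), and easier recovery (restart from any saved state).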
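
The stateless idea can be sketched as a single pure transition function. In the example below, `step` and `fake_decide` are hypothetical names, and `fake_decide` stands in for the LLM call so the sketch runs without a model.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def step(events: List[Event], decide: Callable[[List[Event]], Event]) -> List[Event]:
    """Pure transition: old thread in, new thread out, nothing kept between calls."""
    return events + [decide(events)]


def fake_decide(events: List[Event]) -> Event:
    """Stand-in for the LLM call (assumption for this sketch)."""
    return {"type": "done_for_now", "data": f"saw {len(events)} events"}


before = [{"type": "user_message", "data": "Can you deploy the backend?"}]
after = step(before, fake_decide)
```

Because `step` does not mutate its input and holds no state of its own, the same saved thread can be replayed on any worker and, given the same decision function, produces the same result.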

References:

12-factor-agents
12 Factors for Building Production-Grade Agents: 5.4k+ Stars and Trending on GitHub