Tracking and optimizing model performance is a critical part of any machine learning project. The Comet platform provides powerful tooling for integrating and managing models across the entire lifecycle, from training and evaluation through to production monitoring. This article shows how to use Comet to track LangChain experiments, evaluation metrics, and LLM sessions.
Technical Background
Comet is a machine learning platform designed to help users manage, visualize, and optimize their machine learning models. LangChain is a toolkit built for language models that supports a wide range of LLM application scenarios. Combining the two makes it easier to analyze and improve model performance.
Core Principles
Comet records run state, evaluation metrics, and generated results through LangChain's callback mechanism. This data helps developers make timely adjustments during training and inference and keep the system running at its best.
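To make the callback mechanism concrete, here is a minimal sketch (illustrative only, not part of the Comet integration) of a LangChain handler that prints the same lifecycle events a tracking handler such as CometCallbackHandler can subscribe to; the class name PrintCallbackHandler and the print messages are made up for this example.
from langchain_core.callbacks import BaseCallbackHandler

class PrintCallbackHandler(BaseCallbackHandler):
    """Illustrative handler: reacts to the same lifecycle events a tracker would log."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Fired before each LLM request is sent.
        print(f"LLM start: {len(prompts)} prompt(s)")

    def on_llm_end(self, response, **kwargs):
        # Fired when the LLM returns; `response` holds the generated outputs.
        print(f"LLM end: {len(response.generations)} generation group(s)")
Passing such a handler through the callbacks argument, exactly as the scenarios below do with CometCallbackHandler, is all that is needed for it to receive these events.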
Code Walkthrough
Installing Comet and Its Dependencies
# Install the required Python libraries
%pip install --upgrade --quiet comet_ml langchain langchain-openai google-search-results spacy textstat pandas
# Download the spaCy English model
import sys
!{sys.executable} -m spacy download en_core_web_sm
Initializing Comet and Setting Credentials
First, initialize Comet and configure your API keys.
import comet_ml
# Initialize a Comet project for experiment tracking
comet_ml.init(project_name="comet-example-langchain")
# Set API credentials
import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
os.environ["SERPAPI_API_KEY"] = "your-serpapi-api-key"
Scenario 1: Using an LLM
Use the CometCallbackHandler to track a simple LLM call. Here complexity_metrics=True logs text-complexity statistics (which rely on the textstat package installed above), and visualizations=["dep"] renders dependency-parse visualizations with the spaCy model.
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import OpenAI
# Configure the callback to track the model run
comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=True,
    stream_logs=True,
    tags=["llm"],
    visualizations=["dep"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks, verbose=True)
# Generate results and print them
llm_result = llm.generate(["Tell me a joke", "Tell me a poem", "Tell me a fact"] * 3)
print("LLM result", llm_result)
comet_callback.flush_tracker(llm, finish=True)
Scenario 2: Using a Chain
Use the LLM inside a chain, here for a short play-synopsis generation task.
from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
comet_callback = CometCallbackHandler(
    complexity_metrics=True,
    project_name="comet-example-langchain",
    stream_logs=True,
    tags=["synopsis-chain"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)
template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)
test_prompts = [{"title": "Documentary about Bigfoot in Paris"}]
print(synopsis_chain.apply(test_prompts))
comet_callback.flush_tracker(synopsis_chain, finish=True)
Scenario 3: Using an Agent with Tools
Combine tools with an agent to run a more complex task and track the results.
from langchain.agents import initialize_agent, load_tools
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import OpenAI
comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=True,
    stream_logs=True,
    tags=["agent"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)
tools = load_tools(["serpapi", "llm-math"], llm=llm, callbacks=callbacks)
agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",
    callbacks=callbacks,
    verbose=True,
)
agent.run(
    "Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?"
)
comet_callback.flush_tracker(agent, finish=True)
Scenario 4: Using Custom Evaluation Metrics
Define and apply a custom evaluation metric (such as a ROUGE score) to measure the quality of the generated text.
# This scenario also needs the rouge-score package, which provides the rouge_score module
%pip install --upgrade --quiet rouge-score
from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from rouge_score import rouge_scorer
class Rouge:
    def __init__(self, reference):
        self.reference = reference
        self.scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)

    def compute_metric(self, generation, prompt_idx, gen_idx):
        prediction = generation.text
        results = self.scorer.score(target=self.reference, prediction=prediction)
        return {
            "rougeLsum_score": results["rougeLsum"].fmeasure,
            "reference": self.reference,
        }
reference = """
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building.
It was the first structure to reach a height of 300 metres.
It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft)
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France.
"""
rouge_score = Rouge(reference=reference)
template = """Given the following article, it is your job to write a summary.
Article:
{article}
Summary: This is the summary for the above article:"""
prompt_template = PromptTemplate(input_variables=["article"], template=template)
comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=False,
    stream_logs=True,
    tags=["custom_metrics"],
    custom_metrics=rouge_score.compute_metric,
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)
test_prompts = [
    {
        "article": """
The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side.
During its construction, the Eiffel Tower surpassed the
Washington Monument to become the tallest man-made structure in the world,
a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930.
It was the first structure to reach a height of 300 metres.
Due to the addition of a broadcasting aerial at the top of the tower in 1957,
it is now taller than the Chrysler Building by 5.2 metres (17 ft).
Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""
    }
]
print(synopsis_chain.apply(test_prompts, callbacks=callbacks))
comet_callback.flush_tracker(synopsis_chain, finish=True)
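Before relying on the scores logged to Comet, it can help to sanity-check the metric on its own. The snippet below calls the same rouge_score API used by the Rouge class above and reuses the reference string defined earlier; the sample prediction is made up for illustration.
from rouge_score import rouge_scorer

sample_prediction = "The Eiffel Tower is 324 metres tall and was the first structure to reach 300 metres."
sample_result = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True).score(
    target=reference, prediction=sample_prediction
)
print(sample_result["rougeLsum"].fmeasure)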
Use Case Analysis
With Comet, we can record and analyze a model's performance metrics in detail, a capability that matters both while building language model applications and during later evaluation.
Practical Recommendations
- Use Comet's visualizations to spot performance bottlenecks at different stages of the model lifecycle.
- Periodically review and update the callbacks and evaluation metrics used in your experiments so that they reflect the latest data and requirements.
- Use custom evaluation metrics to better match the needs of your specific application and improve real-world results; a minimal example is sketched after this list.
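As a sketch of the last point: any callable with the same (generation, prompt_idx, gen_idx) signature as compute_metric in Scenario 4 should work as a custom metric. The word-count metric below is purely illustrative and assumes CometCallbackHandler accepts it via custom_metrics just like the ROUGE example.
# Hypothetical custom metric: log the word count of each generation.
def word_count_metric(generation, prompt_idx, gen_idx):
    return {"word_count": len(generation.text.split())}

# Wired up the same way as the ROUGE metric in Scenario 4:
# CometCallbackHandler(..., custom_metrics=word_count_metric)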
If you run into problems, feel free to discuss them in the comments.
—END—