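You can integrate Atlas Vector Search with LangChainGo to build large language model (LLM) applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChainGo to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:
Set up the environment.
Store custom data on Atlas.
Create an Atlas Vector Search index on your data.
Run the following vector search queries:
Semantic search.
Semantic search with metadata pre-filtering.
Implement RAG by using Atlas Vector Search to answer questions on your data.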
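Background
LangChainGo is the Go programming language implementation of LangChain. It is a community-driven, third-party port of the LangChain framework.
LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.
By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
LangChainGo facilitates LLM orchestration for AI applications and brings LangChain's functionality into the Go ecosystem. It also allows developers to connect to their preferred vector store-compatible databases, including MongoDB.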
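Steps
This tutorial is presented in two variants that follow the same steps. The first variant uses Voyage AI to generate embeddings and an OpenAI chat model to answer questions. The second variant uses OpenAI for both embeddings and answers.
Prerequisites (Voyage AI embeddings variant)
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
A Voyage AI API key. To create an account and API key, see the Voyage AI website.
A terminal and code editor to run your Go project.
Go installed on your machine.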
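Set up the environment
You must first set up the environment for this tutorial. Complete the following steps to set up your environment.
Install dependencies.
In your langchaingo-mongodb project directory, which must contain a Go module (for example, one created with go mod init), run the following commands: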
go get github.com/joho/godotenv
go get github.com/tmc/langchaingo/chains
go get github.com/tmc/langchaingo/llms
go get github.com/tmc/langchaingo/prompts
go get github.com/tmc/langchaingo/vectorstores/mongovector
go get github.com/tmc/langchaingo/embeddings/voyageai
go get go.mongodb.org/mongo-driver/v2/mongo
go mod tidy
Initialize your environment variables.
In the langchaingo-mongodb project directory, create a .env file and add the following lines:
OPENAI_API_KEY="<openai-api-key>"
VOYAGEAI_API_KEY="<voyage-api-key>"
ATLAS_CONNECTION_STRING="<connection-string>"
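Replace the placeholder values with your OpenAI API key, your Voyage AI API key, and the SRV connection string for your Atlas cluster. Your connection string should use the following format: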
mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/<dbname>
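If you want to catch a missing variable or a malformed connection string before the tutorial code tries to connect, you can run a quick check first. The following is a minimal sketch that uses only the Go standard library. `missingVars` and `checkAtlasURI` are hypothetical helpers and are not part of godotenv or the MongoDB Go driver. The URI check is shallow: it verifies the `mongodb+srv` scheme, a host, and a username, but it does not confirm that the cluster is reachable.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// missingVars returns the names of required variables that are unset or empty.
// lookup is injected (for example, os.LookupEnv) so the helper is easy to test.
func missingVars(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// checkAtlasURI performs a shallow syntax check of an Atlas SRV connection string.
func checkAtlasURI(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("cannot parse connection string: %w", err)
	}
	if u.Scheme != "mongodb+srv" {
		return fmt.Errorf("expected scheme mongodb+srv, got %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("connection string has no cluster host")
	}
	if u.User == nil || u.User.Username() == "" {
		return fmt.Errorf("connection string has no username")
	}
	return nil
}

func main() {
	required := []string{"OPENAI_API_KEY", "VOYAGEAI_API_KEY", "ATLAS_CONNECTION_STRING"}
	if missing := missingVars(os.LookupEnv, required...); len(missing) > 0 {
		fmt.Println("missing variables:", missing)
		return
	}
	if err := checkAtlasURI(os.Getenv("ATLAS_CONNECTION_STRING")); err != nil {
		fmt.Println("invalid connection string:", err)
		return
	}
	fmt.Println("environment looks complete")
}
```

This sketch reads the real process environment. It therefore reports variables from .env only after something loads that file into the environment, for example the `godotenv.Load()` call in the tutorial code.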
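Use Atlas as a vector store
In this section, you write code that loads custom data into Atlas and instantiates Atlas as a vector database, also called a vector store.
Import the following dependencies.
Add the following imports to the top of your main.go file.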
package main

import (
	"context"
	"log"
	"os"

	"github.com/joho/godotenv"
	"github.com/tmc/langchaingo/embeddings/voyageai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/mongovector"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
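Define the vector store details.
The following code performs these actions:
Configures Atlas as the vector store by specifying the following:
The langchaingo_db.test collection, which stores the documents.
The Voyage AI voyage-3-large model as the embedding model.
vector_index as the name of the vector search index.
embeddings as the field that stores the embeddings.
Prepares your custom data by doing the following:
Defines the text for each document.
Uses LangChainGo's mongovector package to generate embeddings for the text. This package stores document embeddings in MongoDB and enables searches on the stored embeddings.
Constructs documents that include the text, embeddings, and metadata.
Ingests the constructed documents into Atlas and instantiates the vector store.
Paste the following code into your main.go file: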
// Defines the document structure
type Document struct {
	PageContent string            `bson:"text"`
	Embedding   []float32         `bson:"embedding"`
	Metadata    map[string]string `bson:"metadata"`
}

func main() {
	const (
		voyageAIEmbeddingDim = 1024
		similarityAlgorithm  = "dotProduct"
		indexName            = "vector_index"
		databaseName         = "langchaingo_db"
		collectionName       = "test"
	)

	if err := godotenv.Load(); err != nil {
		log.Fatal("No .env file found")
	}

	// Loads the MongoDB URI from environment
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("Set your 'ATLAS_CONNECTION_STRING' environment variable in the .env file")
	}

	// Loads the API key from environment
	voyageApiKey := os.Getenv("VOYAGEAI_API_KEY")
	if voyageApiKey == "" {
		log.Fatal("Set your VOYAGEAI_API_KEY environment variable in the .env file")
	}

	// Connects to MongoDB Atlas
	client, err := mongo.Connect(options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatalf("Failed to connect to server: %v", err)
	}
	defer func() {
		if err := client.Disconnect(context.Background()); err != nil {
			log.Fatalf("Error disconnecting the client: %v", err)
		}
	}()

	log.Println("Connected to MongoDB Atlas.")

	// Selects the database and collection
	coll := client.Database(databaseName).Collection(collectionName)

	// Creates an embedder client
	embedder, err := voyageai.NewVoyageAI(
		voyageai.WithModel("voyage-3-large"),
	)
	if err != nil {
		log.Fatalf("Failed to create an embedder: %v", err)
	}

	// Creates a new MongoDB Atlas vector store
	store := mongovector.New(coll, embedder, mongovector.WithIndex(indexName), mongovector.WithPath("embeddings"))

	// Checks if the collection is empty, and if empty, adds documents to the MongoDB Atlas database vector store
	if isCollectionEmpty(coll) {
		documents := []schema.Document{
			{
				PageContent: "Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth.",
				Metadata: map[string]any{
					"author": "A",
					"type":   "post",
				},
			},
			{
				PageContent: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.",
				Metadata: map[string]any{
					"author": "B",
					"type":   "post",
				},
			},
			{
				PageContent: "For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest.",
				Metadata: map[string]any{
					"author": "C",
					"type":   "post",
				},
			},
		}

		_, err := store.AddDocuments(context.Background(), documents)
		if err != nil {
			log.Fatalf("Error adding documents: %v", err)
		}

		log.Printf("Successfully added %d documents to the collection.\n", len(documents))
	} else {
		log.Println("Documents already exist in the collection, skipping document addition.")
	}
}

func isCollectionEmpty(coll *mongo.Collection) bool {
	count, err := coll.EstimatedDocumentCount(context.Background())
	if err != nil {
		log.Fatalf("Failed to count documents in the collection: %v", err)
	}
	return count == 0
}
Run your Go project.
Save the file, then run the following command to load your data into Atlas.
go run main.go
Connected to MongoDB Atlas.
Successfully added 3 documents to the collection.
Tip
After you run main.go, you can view your vector embeddings in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.
Create the Atlas Vector Search index
Note
To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.
To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchaingo_db.test collection.
Add the following imports to the top of your main.go file:
import (
	// Other imports...
	"errors"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
)
Define the following functions outside of the main() function in your main.go file. These functions create and manage a vector search index for your MongoDB collection:
The SearchIndexExists function checks whether a search index with the specified name exists and is queryable.
The CreateVectorSearchIndex function creates a vector search index on the specified collection. This function blocks until the index is created and queryable.
// Checks if the search index exists
func SearchIndexExists(ctx context.Context, coll *mongo.Collection, idx string) (bool, error) {
	log.Println("Checking if search index exists.")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idx).SetType("vectorSearch")
	cursor, err := view.List(ctx, siOpts)
	if err != nil {
		return false, fmt.Errorf("failed to list search indexes: %w", err)
	}
	defer cursor.Close(ctx)

	for cursor.Next(ctx) {
		index := struct {
			Name      string `bson:"name"`
			Queryable bool   `bson:"queryable"`
		}{}

		if err := cursor.Decode(&index); err != nil {
			return false, fmt.Errorf("failed to decode search index: %w", err)
		}

		if index.Name == idx && index.Queryable {
			return true, nil
		}
	}

	if err := cursor.Err(); err != nil {
		return false, fmt.Errorf("cursor error: %w", err)
	}

	return false, nil
}

// Creates a vector search index. This function blocks until the index has been
// created.
func CreateVectorSearchIndex(
	ctx context.Context,
	coll *mongo.Collection,
	idxName string,
	voyageAIEmbeddingDim int,
	similarityAlgorithm string,
) (string, error) {
	type vectorField struct {
		Type          string `bson:"type,omitempty"`
		Path          string `bson:"path,omitempty"`
		NumDimensions int    `bson:"numDimensions,omitempty"`
		Similarity    string `bson:"similarity,omitempty"`
	}

	fields := []vectorField{
		{
			Type:          "vector",
			Path:          "embeddings",
			NumDimensions: voyageAIEmbeddingDim,
			Similarity:    similarityAlgorithm,
		},
		{
			Type: "filter",
			Path: "metadata.author",
		},
		{
			Type: "filter",
			Path: "metadata.type",
		},
	}

	def := struct {
		Fields []vectorField `bson:"fields"`
	}{
		Fields: fields,
	}

	log.Println("Creating vector search index...")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idxName).SetType("vectorSearch")
	searchName, err := view.CreateOne(ctx, mongo.SearchIndexModel{Definition: def, Options: siOpts})
	if err != nil {
		return "", fmt.Errorf("failed to create the search index: %w", err)
	}

	// Awaits the creation of the index
	var doc bson.Raw
	for doc == nil {
		cursor, err := view.List(ctx, options.SearchIndexes().SetName(searchName))
		if err != nil {
			return "", fmt.Errorf("failed to list search indexes: %w", err)
		}

		if !cursor.Next(ctx) {
			cursor.Close(ctx)
			break
		}

		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchName && queryable {
			doc = cursor.Current
		} else {
			time.Sleep(5 * time.Second)
		}
		cursor.Close(ctx)
	}

	return searchName, nil
}
Create the vector store collection and index by calling the preceding functions in the main() function. Add the following code to the end of your main() function:
// SearchIndexExists will return true if the provided index is defined for the
// collection. This operation blocks until the search completes.
if ok, _ := SearchIndexExists(context.Background(), coll, indexName); !ok {
	// Creates the vector store collection. If documents were added above, the
	// collection already exists, so a NamespaceExists error is ignored.
	err = client.Database(databaseName).CreateCollection(context.Background(), collectionName)
	var cmdErr mongo.CommandError
	if err != nil && !(errors.As(err, &cmdErr) && cmdErr.Name == "NamespaceExists") {
		log.Fatalf("failed to create vector store collection: %v", err)
	}

	_, err = CreateVectorSearchIndex(context.Background(), coll, indexName, voyageAIEmbeddingDim, similarityAlgorithm)
	if err != nil {
		log.Fatalf("failed to create index: %v", err)
	}

	log.Println("Successfully created vector search index.")
} else {
	log.Println("Vector search index already exists.")
}
Save the file, then run the following command to create your Atlas Vector Search index.
go run main.go
Checking if search index exists.
Creating vector search index...
Successfully created vector search index.
Tip
After you run main.go, you can view your vector search index in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.
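Run vector search queries
This section demonstrates various queries that you can run on your vectorized data. Now that you've created the index, you can run vector search queries.
The following examples show a basic semantic search and a semantic search with metadata pre-filtering.
Basic semantic search
Add the following code to your main function and save the file.
Semantic search retrieves information that is meaningfully related to your query. The following code uses the SimilaritySearch() method to perform a semantic search for the string "Prevent weeds" and limits the results to the first document.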
// Performs basic semantic search
docs, err := store.SimilaritySearch(context.Background(), "Prevent weeds", 1)
if err != nil {
	fmt.Println("Error performing search:", err)
}
fmt.Println("Semantic Search Results:", docs)
Run the following command to execute the query.
go run main.go
Semantic Search Results: [{For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest. map[author:C type:post] 0.69752026}]
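Conceptually, `SimilaritySearch(ctx, "Prevent weeds", 1)` embeds the query, compares it with the stored embeddings, and returns the highest-scoring document. Atlas itself performs approximate nearest-neighbor search. Its documentation describes normalizing `dotProduct` and `cosine` scores as `score = (1 + similarity) / 2`, which keeps scores between 0 and 1 like the one printed above. The following brute-force sketch reproduces that ranking in memory with hand-written 2-dimensional unit vectors. It is an illustration only, not how Atlas or mongovector is implemented. The `doc` and `result` types and the `topK` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type doc struct {
	Text   string
	Vector []float64 // assumed to be unit length
}

type result struct {
	Text  string
	Score float64
}

func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topK scores every document against the query and returns the k best.
// Scores use the (1 + dot) / 2 normalization described for dotProduct.
func topK(query []float64, docs []doc, k int) []result {
	results := make([]result, 0, len(docs))
	for _, d := range docs {
		results = append(results, result{Text: d.Text, Score: (1 + dot(query, d.Vector)) / 2})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	if k < len(results) {
		results = results[:k]
	}
	return results
}

func main() {
	docs := []doc{
		{"tuber planting", []float64{0.6, 0.8}},
		{"oil painting", []float64{-0.8, 0.6}},
		{"natural lawn", []float64{0.8, 0.6}},
	}
	query := []float64{1, 0} // toy embedding standing in for "Prevent weeds"
	for _, r := range topK(query, docs, 1) {
		fmt.Printf("%s %.2f\n", r.Text, r.Score)
	}
}
```

With these toy vectors, the program prints `natural lawn 0.90`.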
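Semantic search with filtering
You can pre-filter your data by using an MQL match expression that compares the indexed field with another value in your collection. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.
This example uses the vectorstores package. If you haven't already, add "github.com/tmc/langchaingo/vectorstores" to the imports in your main.go file.
Add the following code to your main function and save the file.
The following code uses the SimilaritySearch() method to perform a semantic search for the string "Tulip care". It specifies the following parameters:
The number of documents to return, 1.
A score threshold of 0.60.
It returns the document that matches the filter metadata.type: post and meets the score threshold.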
// Performs semantic search with metadata filter
filter := map[string]interface{}{
	"metadata.type": "post",
}

filteredDocs, err := store.SimilaritySearch(context.Background(), "Tulip care", 1,
	vectorstores.WithScoreThreshold(0.60),
	vectorstores.WithFilters(filter))
if err != nil {
	fmt.Println("Error performing search:", err)
}
fmt.Println("Filter Search Results:", filteredDocs)
Run the following command to execute the query.
go run main.go
Filter Search Results: [{Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth. map[author:A type:post] 0.64432365}]
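The filtered query combines two constraints. The `metadata.type` filter restricts which documents are compared with the query, and `WithScoreThreshold(0.60)` excludes matches whose score falls below 0.60. The following in-memory sketch illustrates both constraints on already-scored toy results; the scores and the third document are invented for illustration. The `match` type and `filterAndThreshold` function are hypothetical and handle only exact-equality filters. Whether the real threshold comparison is inclusive is an assumption of this sketch.

```go
package main

import "fmt"

type match struct {
	Text     string
	Metadata map[string]string
	Score    float64
}

// filterAndThreshold keeps matches whose metadata equals every key/value in
// filter and whose score is at least threshold, preserving input order.
func filterAndThreshold(matches []match, filter map[string]string, threshold float64) []match {
	var kept []match
	for _, m := range matches {
		ok := true
		for k, v := range filter {
			if m.Metadata[k] != v {
				ok = false
				break
			}
		}
		if ok && m.Score >= threshold {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	matches := []match{
		{"tuber planting", map[string]string{"type": "post", "author": "A"}, 0.64},
		{"lawn care", map[string]string{"type": "post", "author": "C"}, 0.58},
		{"garden FAQ", map[string]string{"type": "faq", "author": "D"}, 0.70},
	}
	for _, m := range filterAndThreshold(matches, map[string]string{"type": "post"}, 0.60) {
		fmt.Printf("%s %.2f\n", m.Text, m.Score)
	}
}
```

Only `tuber planting 0.64` survives. The FAQ entry fails the filter despite its higher score, and the lawn entry passes the filter but falls below the threshold.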
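Answer questions about your data
This section demonstrates a RAG implementation that uses Atlas Vector Search and LangChainGo. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code example to prompt the LLM to answer questions based on the documents that Atlas Vector Search returns.
Add the following code to the end of your main function and save the file. This code also requires the "strings", "github.com/tmc/langchaingo/chains", "github.com/tmc/langchaingo/llms/openai", "github.com/tmc/langchaingo/prompts", and "github.com/tmc/langchaingo/vectorstores" imports. Add any that are missing to the top of your main.go file.
This code does the following:
Instantiates Atlas Vector Search as a retriever to query for semantically similar documents.
Defines a LangChainGo prompt template that instructs the LLM to use the retrieved documents as context for your query. The code formats the retrieved documents into the {{.context}} input variable and your query into the {{.question}} variable.
Constructs a chain that uses an OpenAI chat model to generate context-aware responses based on the provided prompt template.
Sends a sample query about painting for beginners to the chain, using the prompt and the retriever to gather relevant context.
Returns and prints the LLM's response and the documents used as context.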
// Implements RAG to answer questions on your data
optionsVector := []vectorstores.Option{
	vectorstores.WithScoreThreshold(0.60),
}
retriever := vectorstores.ToRetriever(&store, 1, optionsVector...)

// Loads OpenAI API key from environment
openaiApiKey := os.Getenv("OPENAI_API_KEY")
if openaiApiKey == "" {
	log.Fatal("Set your OPENAI_API_KEY environment variable in the .env file")
}

// Creates an OpenAI LLM client for the chat model. Embeddings come from the
// Voyage AI embedder, so no embedding model is set on this client.
llm, err := openai.New(openai.WithToken(openaiApiKey), openai.WithModel("gpt-4o"))
if err != nil {
	log.Fatalf("Failed to create an LLM client: %v", err)
}

prompt := prompts.NewPromptTemplate(
	`Answer the question based on the following context:
{{.context}}
Question: {{.question}}`,
	[]string{"context", "question"},
)
llmChain := chains.NewLLMChain(llm, prompt)

ctx := context.Background()
const question = "How do I get started painting?"
documents, err := retriever.GetRelevantDocuments(ctx, question)
if err != nil {
	log.Fatalf("Failed to retrieve documents: %v", err)
}

// Formats the retrieved documents into a single context string
var contextBuilder strings.Builder
for i, document := range documents {
	fmt.Fprintf(&contextBuilder, "Document %d: %s\n", i+1, document.PageContent)
}
contextStr := contextBuilder.String()

inputs := map[string]interface{}{
	"context":  contextStr,
	"question": question,
}
out, err := chains.Call(ctx, llmChain, inputs)
if err != nil {
	log.Fatalf("Failed to run LLM chain: %v", err)
}

log.Println("Source documents:")
for i, doc := range documents {
	log.Printf("Document %d: %s\n", i+1, doc.PageContent)
}

responseText, ok := out["text"].(string)
if !ok {
	log.Println("Unexpected response type")
	return
}
log.Println("Question:", question)
log.Println("Generated Answer:", responseText)
Save the file, then run the following command to execute it. The generated response might vary.
go run main.go
Source documents:
Document 1: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space."
Question: How do I get started painting?
Generated Answer: To get started painting, you should begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.
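LangChainGo's `prompts.NewPromptTemplate` uses Go template syntax by default, which is why the placeholders are written `{{.context}}` and `{{.question}}`. The following standard-library sketch shows roughly what the rendered prompt looks like after the retrieved documents are joined into the context string. It uses `text/template` directly rather than LangChainGo. `buildContext` and `renderPrompt` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

const promptText = `Answer the question based on the following context:
{{.context}}
Question: {{.question}}`

// buildContext numbers each retrieved document, one per line,
// using the same "Document %d: %s" format as the tutorial code.
func buildContext(docs []string) string {
	var b strings.Builder
	for i, d := range docs {
		fmt.Fprintf(&b, "Document %d: %s\n", i+1, d)
	}
	return b.String()
}

// renderPrompt fills the template from a map, the same shape as the chain inputs.
func renderPrompt(ctxText, question string) (string, error) {
	tmpl, err := template.New("prompt").Parse(promptText)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	err = tmpl.Execute(&b, map[string]any{"context": ctxText, "question": question})
	return b.String(), err
}

func main() {
	ctxText := buildContext([]string{"Begin with a carefully prepared, primed canvas."})
	prompt, err := renderPrompt(ctxText, "How do I get started painting?")
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
}
```

Inspecting the rendered prompt like this can help when you tune the template or the number of retrieved documents.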
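Prerequisites (OpenAI embeddings variant)
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
A terminal and code editor to run your Go project.
Go installed on your machine.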
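Set up the environment
You must first set up the environment for this tutorial. Complete the following steps to set up your environment.
Install dependencies.
In your langchaingo-mongodb project directory, which must contain a Go module (for example, one created with go mod init), run the following commands:
go get github.com/joho/godotenv
go get github.com/tmc/langchaingo/chains
go get github.com/tmc/langchaingo/embeddings
go get github.com/tmc/langchaingo/llms/openai
go get github.com/tmc/langchaingo/prompts
go get github.com/tmc/langchaingo/vectorstores/mongovector
go get go.mongodb.org/mongo-driver/v2/mongo
go mod tidy
Initialize your environment variables.
In the langchaingo-mongodb project directory, create a .env file and add the following lines: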
OPENAI_API_KEY="<api-key>"
ATLAS_CONNECTION_STRING="<connection-string>"
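Replace the placeholder values with your OpenAI API key and the SRV connection string for your Atlas cluster. Your connection string should use the following format: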
mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/<dbname>
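Use Atlas as a vector store
In this section, you write code that loads custom data into Atlas and instantiates Atlas as a vector database, also called a vector store.
Import the following dependencies.
Add the following imports to the top of your main.go file.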
package main

import (
	"context"
	"log"
	"os"

	"github.com/joho/godotenv"
	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/mongovector"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
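Define the vector store details.
The following code performs these actions:
Configures Atlas as the vector store by specifying the following:
The langchaingo_db.test collection, which stores the documents.
The OpenAI text-embedding-3-small model as the embedding model.
vector_index as the name of the vector search index.
embeddings as the field that stores the embeddings.
Prepares your custom data by doing the following:
Defines the text for each document.
Uses LangChainGo's mongovector package to generate embeddings for the text. This package stores document embeddings in MongoDB and enables searches on the stored embeddings.
Constructs documents that include the text, embeddings, and metadata.
Ingests the constructed documents into Atlas and instantiates the vector store.
Paste the following code into your main.go file: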
// Defines the document structure
type Document struct {
	PageContent string            `bson:"text"`
	Embedding   []float32         `bson:"embedding"`
	Metadata    map[string]string `bson:"metadata"`
}

func main() {
	const (
		openAIEmbeddingModel = "text-embedding-3-small"
		openAIEmbeddingDim   = 1536
		similarityAlgorithm  = "dotProduct"
		indexName            = "vector_index"
		databaseName         = "langchaingo_db"
		collectionName       = "test"
	)

	if err := godotenv.Load(); err != nil {
		log.Fatal("No .env file found")
	}

	// Loads the MongoDB URI from environment
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("Set your 'ATLAS_CONNECTION_STRING' environment variable in the .env file")
	}

	// Loads the API key from environment
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		log.Fatal("Set your OPENAI_API_KEY environment variable in the .env file")
	}

	// Connects to MongoDB Atlas
	client, err := mongo.Connect(options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatalf("Failed to connect to server: %v", err)
	}
	defer func() {
		if err := client.Disconnect(context.Background()); err != nil {
			log.Fatalf("Error disconnecting the client: %v", err)
		}
	}()

	log.Println("Connected to MongoDB Atlas.")

	// Selects the database and collection
	coll := client.Database(databaseName).Collection(collectionName)

	// Creates an OpenAI LLM embedder client
	llm, err := openai.New(openai.WithEmbeddingModel(openAIEmbeddingModel))
	if err != nil {
		log.Fatalf("Failed to create an embedder client: %v", err)
	}

	// Creates an embedder from the embedder client
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatalf("Failed to create an embedder: %v", err)
	}

	// Creates a new MongoDB Atlas vector store
	store := mongovector.New(coll, embedder, mongovector.WithIndex(indexName), mongovector.WithPath("embeddings"))

	// Checks if the collection is empty, and if empty, adds documents to the MongoDB Atlas database vector store
	if isCollectionEmpty(coll) {
		documents := []schema.Document{
			{
				PageContent: "Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth.",
				Metadata: map[string]any{
					"author": "A",
					"type":   "post",
				},
			},
			{
				PageContent: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.",
				Metadata: map[string]any{
					"author": "B",
					"type":   "post",
				},
			},
			{
				PageContent: "For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest.",
				Metadata: map[string]any{
					"author": "C",
					"type":   "post",
				},
			},
		}

		_, err := store.AddDocuments(context.Background(), documents)
		if err != nil {
			log.Fatalf("Error adding documents: %v", err)
		}

		log.Printf("Successfully added %d documents to the collection.\n", len(documents))
	} else {
		log.Println("Documents already exist in the collection, skipping document addition.")
	}
}

func isCollectionEmpty(coll *mongo.Collection) bool {
	count, err := coll.EstimatedDocumentCount(context.Background())
	if err != nil {
		log.Fatalf("Failed to count documents in the collection: %v", err)
	}
	return count == 0
}
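In this variant, the index's `numDimensions` must match the output length of `text-embedding-3-small` (1536), just as the Voyage AI variant uses 1024 for `voyage-3-large`. If you switch embedding models without updating the index, the mismatch typically means documents aren't indexed or queries fail. The following hypothetical helper is a minimal sketch that checks embedding lengths before you store them. `checkDims` is not part of LangChainGo. Calling it on the output of the embedder's `EmbedDocuments` method is one assumed place to use it.

```go
package main

import "fmt"

// checkDims returns an error naming the first embedding whose length differs
// from the numDimensions value configured in the vector search index.
func checkDims(embeddings [][]float32, want int) error {
	for i, e := range embeddings {
		if len(e) != want {
			return fmt.Errorf("embedding %d has %d dimensions, index expects %d", i, len(e), want)
		}
	}
	return nil
}

func main() {
	const openAIEmbeddingDim = 1536 // text-embedding-3-small

	ok := [][]float32{make([]float32, 1536), make([]float32, 1536)}
	bad := [][]float32{make([]float32, 1536), make([]float32, 1024)}

	fmt.Println(checkDims(ok, openAIEmbeddingDim))
	fmt.Println(checkDims(bad, openAIEmbeddingDim))
}
```

The first call prints `<nil>`. The second prints `embedding 1 has 1024 dimensions, index expects 1536`.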
Run your Go project.
Save the file, then run the following command to load your data into Atlas.
go run main.go
Connected to MongoDB Atlas.
Successfully added 3 documents to the collection.
Tip
After you run main.go, you can view your vector embeddings in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.
Create the Atlas Vector Search index
Note
To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.
To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchaingo_db.test collection.
Add the following imports to the top of your main.go file:
import (
	// Other imports...
	"errors"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
)
Define the following functions outside of the main() function in your main.go file. These functions create and manage a vector search index for your MongoDB collection:
The SearchIndexExists function checks whether a search index with the specified name exists and is queryable.
The CreateVectorSearchIndex function creates a vector search index on the specified collection. This function blocks until the index is created and queryable.
// Checks if the search index exists
func SearchIndexExists(ctx context.Context, coll *mongo.Collection, idx string) (bool, error) {
	log.Println("Checking if search index exists.")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idx).SetType("vectorSearch")
	cursor, err := view.List(ctx, siOpts)
	if err != nil {
		return false, fmt.Errorf("failed to list search indexes: %w", err)
	}
	defer cursor.Close(ctx)

	for cursor.Next(ctx) {
		index := struct {
			Name      string `bson:"name"`
			Queryable bool   `bson:"queryable"`
		}{}

		if err := cursor.Decode(&index); err != nil {
			return false, fmt.Errorf("failed to decode search index: %w", err)
		}

		if index.Name == idx && index.Queryable {
			return true, nil
		}
	}

	if err := cursor.Err(); err != nil {
		return false, fmt.Errorf("cursor error: %w", err)
	}

	return false, nil
}

// Creates a vector search index. This function blocks until the index has been
// created.
func CreateVectorSearchIndex(
	ctx context.Context,
	coll *mongo.Collection,
	idxName string,
	openAIEmbeddingDim int,
	similarityAlgorithm string,
) (string, error) {
	type vectorField struct {
		Type          string `bson:"type,omitempty"`
		Path          string `bson:"path,omitempty"`
		NumDimensions int    `bson:"numDimensions,omitempty"`
		Similarity    string `bson:"similarity,omitempty"`
	}

	fields := []vectorField{
		{
			Type:          "vector",
			Path:          "embeddings",
			NumDimensions: openAIEmbeddingDim,
			Similarity:    similarityAlgorithm,
		},
		{
			Type: "filter",
			Path: "metadata.author",
		},
		{
			Type: "filter",
			Path: "metadata.type",
		},
	}

	def := struct {
		Fields []vectorField `bson:"fields"`
	}{
		Fields: fields,
	}

	log.Println("Creating vector search index...")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idxName).SetType("vectorSearch")
	searchName, err := view.CreateOne(ctx, mongo.SearchIndexModel{Definition: def, Options: siOpts})
	if err != nil {
		return "", fmt.Errorf("failed to create the search index: %w", err)
	}

	// Awaits the creation of the index
	var doc bson.Raw
	for doc == nil {
		cursor, err := view.List(ctx, options.SearchIndexes().SetName(searchName))
		if err != nil {
			return "", fmt.Errorf("failed to list search indexes: %w", err)
		}

		if !cursor.Next(ctx) {
			cursor.Close(ctx)
			break
		}

		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchName && queryable {
			doc = cursor.Current
		} else {
			time.Sleep(5 * time.Second)
		}
		cursor.Close(ctx)
	}

	return searchName, nil
}
Create the vector store collection and index by calling the preceding functions in the main() function. Add the following code to the end of your main() function:
// SearchIndexExists will return true if the provided index is defined for the
// collection. This operation blocks until the search completes.
if ok, _ := SearchIndexExists(context.Background(), coll, indexName); !ok {
	// Creates the vector store collection. If documents were added above, the
	// collection already exists, so a NamespaceExists error is ignored.
	err = client.Database(databaseName).CreateCollection(context.Background(), collectionName)
	var cmdErr mongo.CommandError
	if err != nil && !(errors.As(err, &cmdErr) && cmdErr.Name == "NamespaceExists") {
		log.Fatalf("failed to create vector store collection: %v", err)
	}

	_, err = CreateVectorSearchIndex(context.Background(), coll, indexName, openAIEmbeddingDim, similarityAlgorithm)
	if err != nil {
		log.Fatalf("failed to create index: %v", err)
	}

	log.Println("Successfully created vector search index.")
} else {
	log.Println("Vector search index already exists.")
}
Save the file, then run the following command to create your Atlas Vector Search index.
go run main.go
Checking if search index exists.
Creating vector search index...
Successfully created vector search index.
Tip
After you run main.go, you can view your vector search index in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.
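Run vector search queries
This section demonstrates various queries that you can run on your vectorized data. Now that you've created the index, you can run vector search queries.
The following examples show a basic semantic search and a semantic search with metadata pre-filtering.
Basic semantic search
Add the following code to your main function and save the file.
Semantic search retrieves information that is meaningfully related to your query. The following code uses the SimilaritySearch() method to perform a semantic search for the string "Prevent weeds" and limits the results to the first document.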
// Performs basic semantic search
docs, err := store.SimilaritySearch(context.Background(), "Prevent weeds", 1)
if err != nil {
	fmt.Println("Error performing search:", err)
}
fmt.Println("Semantic Search Results:", docs)
Run the following command to execute the query.
go run main.go
Semantic Search Results: [{For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest. map[author:C type:post] 0.69752026}]
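Semantic search with filtering
You can pre-filter your data by using an MQL match expression that compares the indexed field with another value in your collection. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.
This example uses the vectorstores package. If you haven't already, add "github.com/tmc/langchaingo/vectorstores" to the imports in your main.go file.
Add the following code to your main function and save the file.
The following code uses the SimilaritySearch() method to perform a semantic search for the string "Tulip care". It specifies the following parameters:
The number of documents to return, 1.
A score threshold of 0.60.
It returns the document that matches the filter metadata.type: post and meets the score threshold.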
// Performs semantic search with metadata filter
filter := map[string]interface{}{
	"metadata.type": "post",
}

filteredDocs, err := store.SimilaritySearch(context.Background(), "Tulip care", 1,
	vectorstores.WithScoreThreshold(0.60),
	vectorstores.WithFilters(filter))
if err != nil {
	fmt.Println("Error performing search:", err)
}
fmt.Println("Filter Search Results:", filteredDocs)
Run the following command to execute the query.
go run main.go
Filter Search Results: [{Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth. map[author:A type:post] 0.64432365}]
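Atlas can only pre-filter on fields that the index declares with type `filter`. A filter key that is missing from the index, such as a hypothetical `metadata.category`, typically causes the query to fail rather than quietly match nothing. The following hypothetical `unindexedFilterKeys` helper is a minimal client-side sketch that checks filter keys against the filter paths defined in `CreateVectorSearchIndex` (`metadata.author` and `metadata.type`). It inspects only top-level keys and does not understand operators such as `$and`.

```go
package main

import (
	"fmt"
	"sort"
)

// unindexedFilterKeys returns the top-level filter keys that are not among the
// index's filter paths, sorted for stable output.
func unindexedFilterKeys(filter map[string]any, indexed []string) []string {
	allowed := make(map[string]bool, len(indexed))
	for _, p := range indexed {
		allowed[p] = true
	}
	var missing []string
	for k := range filter {
		if !allowed[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	indexed := []string{"metadata.author", "metadata.type"}

	ok := map[string]any{"metadata.type": "post"}
	bad := map[string]any{"metadata.type": "post", "metadata.category": "garden"}

	fmt.Println(unindexedFilterKeys(ok, indexed))
	fmt.Println(unindexedFilterKeys(bad, indexed))
}
```

The first call prints `[]`. The second prints `[metadata.category]`, which signals that the index definition needs another `filter` field before that query can run.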
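Answer questions about your data
This section demonstrates a RAG implementation that uses Atlas Vector Search and LangChainGo. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code example to prompt the LLM to answer questions based on the documents that Atlas Vector Search returns.
Add the following code to the end of your main function and save the file. This code also requires the "strings", "github.com/tmc/langchaingo/chains", "github.com/tmc/langchaingo/prompts", and "github.com/tmc/langchaingo/vectorstores" imports. Add any that are missing to the top of your main.go file.
This code does the following:
Instantiates Atlas Vector Search as a retriever to query for semantically similar documents.
Defines a LangChainGo prompt template that instructs the LLM to use the retrieved documents as context for your query. The code formats the retrieved documents into the {{.context}} input variable and your query into the {{.question}} variable.
Constructs a chain that uses an OpenAI chat model to generate context-aware responses based on the provided prompt template.
Sends a sample query about painting for beginners to the chain, using the prompt and the retriever to gather relevant context.
Returns and prints the LLM's response and the documents used as context.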
// Implements RAG to answer questions on your data
optionsVector := []vectorstores.Option{
	vectorstores.WithScoreThreshold(0.60),
}
retriever := vectorstores.ToRetriever(&store, 1, optionsVector...)

prompt := prompts.NewPromptTemplate(
	`Answer the question based on the following context:
{{.context}}
Question: {{.question}}`,
	[]string{"context", "question"},
)
llmChain := chains.NewLLMChain(llm, prompt)

ctx := context.Background()
const question = "How do I get started painting?"
documents, err := retriever.GetRelevantDocuments(ctx, question)
if err != nil {
	log.Fatalf("Failed to retrieve documents: %v", err)
}

// Formats the retrieved documents into a single context string
var contextBuilder strings.Builder
for i, document := range documents {
	fmt.Fprintf(&contextBuilder, "Document %d: %s\n", i+1, document.PageContent)
}
contextStr := contextBuilder.String()

inputs := map[string]interface{}{
	"context":  contextStr,
	"question": question,
}
out, err := chains.Call(ctx, llmChain, inputs)
if err != nil {
	log.Fatalf("Failed to run LLM chain: %v", err)
}

log.Println("Source documents:")
for i, doc := range documents {
	log.Printf("Document %d: %s\n", i+1, doc.PageContent)
}

responseText, ok := out["text"].(string)
if !ok {
	log.Println("Unexpected response type")
	return
}
log.Println("Question:", question)
log.Println("Generated Answer:", responseText)
Save the file, then run the following command to execute it. The generated response might vary.
go run main.go
Source documents:
Document 1: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space."
Question: How do I get started painting?
Generated Answer: To get started painting, you should begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.
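By completing this tutorial, you have successfully integrated Atlas Vector Search with LangChainGo to build a RAG application. You accomplished the following:
Initialized and configured the environment needed to support your application
Stored custom data in Atlas and instantiated Atlas as a vector store
Built an Atlas Vector Search index on your data to enable semantic search capabilities
Used vector embeddings to retrieve semantically relevant data
Enhanced search results by incorporating metadata filters
Implemented a RAG workflow with Atlas Vector Search to provide meaningful answers to questions about your data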
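Next steps
To learn more about getting started with Atlas Vector Search, see the Atlas Vector Search Quick Start and select Go from the dropdown menu.
To learn more about vector embeddings, see How to Create Vector Embeddings and select Go from the dropdown menu.
To learn how to integrate LangChainGo and Hugging Face, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
To learn how to implement RAG without the need for API keys or credits, see Build a Local RAG Implementation with Atlas Vector Search.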
Tip
To learn more about integrating LangChainGo, OpenAI, and MongoDB, see Using MongoDB Atlas as a Vector Store with OpenAI Embeddings.