ray.serve.llm.LLMServer.embeddings#
- async LLMServer.embeddings(request: EmbeddingRequest) → AsyncGenerator[List[ErrorResponse] | EmbeddingResponse, None] [source]#
Runs an embeddings request on the engine and returns the response.
The result is returned as an AsyncGenerator over the EmbeddingResponse object so that the caller has a consistent interface across the chat, completions, and embeddings methods.
- Parameters:
request – An EmbeddingRequest object.
- Returns:
An AsyncGenerator yielding an EmbeddingResponse object, or a list of ErrorResponse objects if the request fails.
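A minimal sketch of how a caller might consume this generator is shown below. The import path for the request and response models, the EmbeddingRequest field names, and the model name are assumptions for illustration and may differ across Ray versions; check your installed ray.serve.llm package for the exact API.

```python
# Minimal sketch of consuming LLMServer.embeddings(); the import path and
# the EmbeddingRequest fields below are assumptions, not the verified API.
from ray.serve.llm.openai_api_models import (  # assumed module path
    EmbeddingRequest,
    EmbeddingResponse,
)


async def embed_text(server, text: str) -> EmbeddingResponse:
    # Model name and input format are placeholders for illustration.
    request = EmbeddingRequest(model="my-embedding-model", input=text)

    # The generator yields an EmbeddingResponse on success, or a list of
    # ErrorResponse objects if the request fails.
    async for result in server.embeddings(request):
        if isinstance(result, EmbeddingResponse):
            return result
        raise RuntimeError(f"Embedding request failed: {result}")

    raise RuntimeError("embeddings() produced no response")
```

The generator interface means the same `async for` consumption pattern used for streaming chat and completions responses also works here, even though an embeddings call typically yields a single item.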