Loop
Loop is an AI assistant in Braintrust playgrounds. It helps you generate and optimize prompts, datasets, and evals.
Loop is in public beta and is off by default. To turn it on, flip the feature flag in your settings. If you are on a hybrid deployment, Loop is available starting with v0.0.74.
Selecting a model
Loop uses the AI models available in your Braintrust account via the proxy. It defaults to Claude 4 Sonnet, but supports any model you have configured in your AI providers, including custom models.
To choose a model, select the gear icon in the Loop chat window and pick from the list of available models.
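The same proxy that Loop uses is also reachable from code, which can be a handy way to sanity-check which models your account can access. A minimal sketch using the OpenAI Python client pointed at the Braintrust proxy (the model id is illustrative; substitute any model configured in your AI providers):

```python
import os

from openai import OpenAI

# The Braintrust AI proxy is OpenAI-compatible; point any OpenAI client at it
# and authenticate with your Braintrust API key.
client = OpenAI(
    base_url="https://api.braintrust.dev/v1/proxy",
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",  # illustrative; any configured model works
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```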
Available tools
Loop currently offers the following tools:
- Summarize playground: generate a summary of current playground contents
- Get eval results: retrieve evaluation results directly within Loop
- Edit prompt: generate and modify prompts
- Run eval: execute evaluations directly within Loop
- Edit data: generate and modify datasets
- Continue execution: resume interrupted or paused tasks
- Edit scorers: select existing scorers or write new ones
Before suggesting any optimizations, the agent runs and/or summarizes your playground to understand its current state. You can remove any of these tools from your Loop workflow by selecting the gear icon and deselecting the tool from the available list.
Coming soon
- Fetch logs: access and review logs directly within Loop
- Create prompt: create a new prompt
- More UI integration: access Loop outside of playgrounds
Generating and optimizing prompts
Loop can help you generate a prompt from scratch. Make sure you have an empty task open, then ask Loop to generate a prompt.
If you have existing prompts, you can optimize them using Loop.
To optimize a prompt, ask Loop in the chat window, or select the Loop icon in the top bar of any existing task. From there, you can add the prompt to your chat, or use quick optimize.
After Loop provides a suggested optimization, you can review and accept the suggestion or keep iterating.
Generating and optimizing datasets
If no dataset exists, Loop can create one automatically. You must have a task set up so that Loop can generate a dataset tailored to your evaluation.
You can review the dataset and further refine it as needed.
After you run your playground, you can also ask Loop to optimize your dataset. Loop will analyze your current dataset and suggest specific areas for improvement.
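Loop edits datasets directly in the playground, but it helps to know the shape of the rows it produces: each record pairs an input with an expected output, plus optional metadata. A minimal sketch using the Braintrust Python SDK (the project name and values are illustrative, and a `BRAINTRUST_API_KEY` is assumed to be set):

```python
import braintrust

# Open (or create) a dataset in a project. Names here are illustrative.
dataset = braintrust.init_dataset(project="My project", name="Support questions")

# Each row pairs an input with an expected output, plus optional metadata.
dataset.insert(
    input="How do I reset my password?",
    expected="Go to Settings > Security and select Reset password.",
    metadata={"category": "account"},
)
```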
Generating and editing scorers
If no scorers exist, Loop can create one for you. You must have a dataset and a task in order for Loop to generate a scorer specific to your use case. The agent begins by checking what data you have and which scorers already exist, then fetches sample results to understand the data structure.
If you select Accept, the new scorer will be added to the playground.
Loop can also help you improve and edit existing scorers.
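Conceptually, a code scorer is just a function that compares the model's output to the expected value and returns a score between 0 and 1. A minimal hand-written example of the kind of scorer Loop might generate (the function name and exact-match logic are illustrative):

```python
def exact_match(input, output, expected):
    # Braintrust scores range from 0 to 1; return 1 on an exact match.
    # Whitespace is stripped so trivial formatting differences don't fail the check.
    return 1.0 if output.strip() == expected.strip() else 0.0
```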
Run and assess evals
After your tasks, dataset, and scorers are set up, Loop can run an evaluation for you, analyze it, and suggest further improvements.
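The playground's pieces map directly onto the SDK's Eval entry point, so a setup that Loop assembles can also be reproduced in code later. A minimal sketch (the project name, data, and task are illustrative placeholders):

```python
from braintrust import Eval
from autoevals import Levenshtein

Eval(
    "My project",  # illustrative project name
    data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],  # your dataset
    task=lambda input: "Hi " + input,  # your task function or prompt
    scores=[Levenshtein],  # your scorers
)
```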
Mode
By default, Loop will ask you for confirmation before executing certain tool calls, like running an evaluation. If you'd like Loop to run evaluations without confirmation, you can turn off this setting in the agent mode menu.
Continuous agent
In continuous agent mode, Loop will execute tools and make edit suggestions one after the other.