
feat(loops): add truss loops runs metrics command #2495

Open
rcano-baseten wants to merge 2 commits into main from rcano/loops-runs-metrics


@rcano-baseten

Contributor

Summary

  • Adds truss loops runs metrics --run-id <id> showing request volume + concurrent requests for the run's trainer deployment and its paired sampler, over a configurable window
  • Default behavior: one snapshot for run.created_at → now. --tail keeps refreshing until interrupted (live cli-table or NDJSON in json mode)
  • Window flags: --since 30m|2h|1d or --start/--end (ISO-8601, mutually exclusive with --since)
  • Output flags: -o cli-table|json (matches existing loops view convention) and --output-file PATH for json
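The window rules in the summary (default `run.created_at → now`, `--since` as a lookback from now, `--start`/`--end` as explicit bounds, and the mutex between the two styles) can be sketched roughly as follows. `resolve_window` and its parameters are hypothetical names used for illustration only; the PR's actual helper and error handling may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_window(
    run_created_at: datetime,
    since: Optional[timedelta] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: choose the (start, end) window for a metrics query."""
    # `now` is injectable so the function can be tested deterministically.
    now = now or datetime.now(timezone.utc)

    # --since is mutually exclusive with --start/--end.
    if since is not None and (start is not None or end is not None):
        raise ValueError("--since cannot be combined with --start/--end")

    # --since means "the last <duration> up to now".
    if since is not None:
        return now - since, now

    # Otherwise fall back to run.created_at and now for any missing bound.
    window_start = start if start is not None else run_created_at
    window_end = end if end is not None else now
    if window_start >= window_end:
        raise ValueError("window start must be before window end")
    return window_start, window_end
```

Under these assumptions, calling `resolve_window(run.created_at)` with no flags yields the default snapshot window, and passing `since=timedelta(minutes=30)` replaces it with the last 30 minutes.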

Test plan

  • Unit tests cover snapshot table, snapshot json, json + --output-file, --since/--start mutex, --output-file requires json, invalid duration, no matching active deployment, inactive deployments ignored during resolution, run without sampler, and --since overrides the default window
  • uv run pytest truss/tests/cli/test_loops_cli.py → 56 passed
  • uv run ruff check + uv run ruff format clean
  • Smoke test against baseten-local once a Loops run is up: not yet run

🤖 Generated with Claude Code

Adds a metrics command scoped to a Loops run that reports request volume and
concurrent requests for both the trainer deployment and its paired sampler
within a configurable time window.

- Defaults to a single snapshot for the window `run.created_at → now`.
- `--tail` enables live updates (cli-table refreshes in place; json mode emits
  NDJSON, one document per refresh tick).
- `--since` (duration like '30m'/'2h'/'1d') or `--start`/`--end` (ISO-8601)
  override the default window.
- `-o cli-table|json` matches the existing `loops view` convention;
  `--output-file PATH` redirects JSON to a file.
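As an illustration of the "one document per refresh tick" behavior in json mode, here is a minimal NDJSON writer. `write_ndjson_ticks`, the `fetch` callable, and the injectable `sleep` are assumptions made for this sketch; it is not the implementation in this PR, and a real tail loop would run until interrupted rather than for a fixed number of ticks.

```python
import io
import json
from typing import Any, Callable, Dict, TextIO


def write_ndjson_ticks(
    fetch: Callable[[], Dict[str, Any]],
    out: TextIO,
    ticks: int,
    sleep: Callable[[float], None],
    interval_seconds: float = 30.0,
) -> None:
    """Hypothetical helper: write one JSON document per line, one line per tick."""
    for i in range(ticks):
        snapshot = fetch()
        # json.dumps escapes newlines inside strings, so each document
        # stays on a single line, which is what NDJSON requires.
        out.write(json.dumps(snapshot, sort_keys=True) + "\n")
        out.flush()  # let downstream consumers see each tick promptly
        if i < ticks - 1:
            sleep(interval_seconds)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_ndjson_ticks(lambda: {"requests": 1}, buffer, ticks=2, sleep=lambda _: None)
    print(buffer.getvalue(), end="")
```

Because every line is a complete JSON document, a consumer can process the stream line by line without waiting for the command to exit.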

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread truss/cli/loops_commands.py Outdated
Comment on lines +224 to +228
_DEFAULT_METRICS_REFRESH_SECONDS = 30
_SPARKLINE_BLOCKS = "▁▂▃▄▅▆▇█"
_SPARKLINE_WIDTH = 24
_DURATION_RE = re.compile(r"^\s*(\d+)\s*([smhd])\s*$", re.IGNORECASE)
_DURATION_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
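For context, the two duration constants above could back a `--since` parser along these lines. `parse_duration` is a hypothetical name and the error message is invented for this sketch; only the regex and the unit table are taken from the diff.

```python
import re
from datetime import timedelta

# Copied from the diff under review.
_DURATION_RE = re.compile(r"^\s*(\d+)\s*([smhd])\s*$", re.IGNORECASE)
_DURATION_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> timedelta:
    """Hypothetical helper: turn '30m', '2h', or '1d' into a timedelta."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"invalid duration {text!r}; expected e.g. 30m, 2h, 1d")
    amount, unit = match.groups()
    # The regex is case-insensitive, so normalize the unit before lookup.
    return timedelta(seconds=int(amount) * _DURATION_UNIT_SECONDS[unit.lower()])
```

Note that the pattern accepts only a single integer followed by one unit, so compound values such as `1h30m` and fractional values such as `1.5h` would be rejected by this sketch.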

Contributor Author


can we create another file that's focused on showing the metrics display?

Contributor Author


Done in 6336694 — moved the cli-table layout, NDJSON emission, and tail loop into truss/cli/loops_run_metrics_viewer.py (mirroring loops_checkpoint_viewer.py). The command body in loops_commands.py is now wiring + dispatch.

Extracts the cli-table layout, NDJSON emission, and live-tail loop into
loops_run_metrics_viewer.py, mirroring the existing loops_checkpoint_viewer
pattern. The command body in loops_commands.py now reads as wiring +
dispatch (parse args → resolve trainer deployment → build fetch closure →
delegate to render_metrics_snapshot / tail_metrics_table / emit_json_snapshots).
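The "wiring + dispatch" shape described above might look roughly like the sketch below. Only the three viewer names come from the commit message; `build_fetch`, `dispatch`, their signatures, and the rule that json output always goes to `emit_json_snapshots` are assumptions made for illustration, not the code in this PR.

```python
from typing import Any, Callable, Dict, Tuple

Snapshot = Dict[str, Any]
Fetch = Callable[[], Snapshot]


def build_fetch(
    get_metrics: Callable[[str, str, str], Snapshot],
    deployment_id: str,
    window: Tuple[str, str],
) -> Fetch:
    """Hypothetical: close over the deployment and window so viewers just call fetch()."""
    start, end = window

    def fetch() -> Snapshot:
        return get_metrics(deployment_id, start, end)

    return fetch


def dispatch(
    output: str,
    tail: bool,
    fetch: Fetch,
    viewers: Dict[str, Callable[[Fetch], Any]],
) -> Any:
    """Hypothetical: pick one viewer entry point from the output mode and --tail."""
    if output == "json":
        name = "emit_json_snapshots"  # assumed to cover both snapshot and tail
    elif tail:
        name = "tail_metrics_table"
    else:
        name = "render_metrics_snapshot"
    return viewers[name](fetch)
```

Passing the viewers in as a mapping keeps this sketch runnable without the real module; in the PR the command presumably imports them from `loops_run_metrics_viewer.py` directly.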

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>