Home

David Paluy edited this page May 4, 2026 · 5 revisions

RubricLLM Wiki

RubricLLM is a Ruby port of DeepEval.

Understanding A/B Comparison — why paired t-tests with p-values are the right tool for comparing two models, and how to read the significance markers in RubricLLM.compare output.
Why Retrieval Metrics Are Pure Math — what precision_at_k, recall_at_k, mrr, ndcg, and hit_rate actually compute, and why they're free of judge bias.
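
As a rough illustration of why these retrieval metrics need no judge model, here is a minimal standalone sketch of the five computations using binary relevance. The method names mirror the metric names listed above, but the signatures (a ranked `retrieved` array, a `relevant` array, and a cutoff `k`) are assumptions made for this example and may not match RubricLLM's actual API.

```ruby
require "set"

# Hypothetical signatures for illustration only, not RubricLLM's API.
# retrieved: ranked array of document ids (best first)
# relevant:  array of ids known to be relevant (binary relevance)

# Fraction of the top-k results that are relevant.
def precision_at_k(retrieved, relevant, k)
  return 0.0 if k <= 0
  rel = relevant.to_set
  retrieved.first(k).count { |id| rel.include?(id) } / k.to_f
end

# Fraction of all relevant documents that appear in the top k.
def recall_at_k(retrieved, relevant, k)
  rel = relevant.to_set
  return 0.0 if rel.empty?
  retrieved.first(k).uniq.count { |id| rel.include?(id) } / rel.size.to_f
end

# 1.0 if at least one relevant document appears in the top k, else 0.0.
def hit_rate(retrieved, relevant, k)
  rel = relevant.to_set
  retrieved.first(k).any? { |id| rel.include?(id) } ? 1.0 : 0.0
end

# Reciprocal rank of the first relevant result for one query.
# Averaging this value over many queries gives the mean reciprocal rank.
def mrr(retrieved, relevant)
  rel = relevant.to_set
  index = retrieved.index { |id| rel.include?(id) }
  index ? 1.0 / (index + 1) : 0.0
end

# Discounted cumulative gain of the top k, normalized by the best
# achievable DCG (all relevant documents ranked first).
def ndcg(retrieved, relevant, k)
  rel = relevant.to_set
  dcg = retrieved.first(k).each_with_index.sum do |id, i|
    rel.include?(id) ? 1.0 / Math.log2(i + 2) : 0.0
  end
  ideal_hits = [rel.size, k].min
  idcg = (0...ideal_hits).sum { |i| 1.0 / Math.log2(i + 2) }
  idcg.zero? ? 0.0 : dcg / idcg
end

retrieved = %w[d3 d1 d7 d2]
relevant  = %w[d1 d2]

puts recall_at_k(retrieved, relevant, 3) # 0.5
puts hit_rate(retrieved, relevant, 3)    # 1.0
puts mrr(retrieved, relevant)            # 0.5
```

Each function is a deterministic calculation over ids and ranks, so the same inputs always produce the same score and no LLM call is involved.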