Evals

Compare models and prompt variations side by side. Run evaluations across datasets and analyze the results using aggregate metrics, keyword analysis, and record-level drill-downs to find the best configuration for your use case.