EvalRunner
EvalRunner(config=None)
.add_provider(provider_name, model, *, label=None, api_key=None, base_url=None)
Add a model to evaluate. Returns self for chaining.
.add_provider_instance(label, provider)
Add a pre-configured Provider instance.
.add_metric(metric)
Add a scoring metric. Returns self for chaining.
.run(dataset, show_progress=True)
Run the eval. Returns an EvalResults object.
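Since `add_provider` and `add_metric` return `self`, a whole evaluation can be configured in one fluent chain. A minimal sketch of that builder pattern is below; the internal attribute names (`providers`, `metrics`) and the example provider/metric values are assumptions for illustration, not part of the documented API.

```python
class EvalRunner:
    """Minimal sketch of the chaining behavior described above."""

    def __init__(self, config=None):
        self.config = config
        self.providers = {}  # label -> (provider_name, model); assumed layout
        self.metrics = []

    def add_provider(self, provider_name, model, *, label=None,
                     api_key=None, base_url=None):
        # Fall back to "provider/model" when no explicit label is given.
        self.providers[label or f"{provider_name}/{model}"] = (provider_name, model)
        return self  # returning self is what enables method chaining

    def add_metric(self, metric):
        self.metrics.append(metric)
        return self


# Hypothetical usage: configure two models and one metric in a single chain.
runner = (
    EvalRunner()
    .add_provider("openai", "gpt-4o", label="gpt4o")
    .add_provider("anthropic", "claude-sonnet")
    .add_metric("exact_match")
)
```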
RunConfig
| Parameter | Default | Description |
|---|---|---|
| temperature | 0.0 | Sampling temperature for completions. |
| max_tokens | 1024 | Max tokens per completion. |
| max_workers | 10 | Concurrent requests per model. |
| max_model_workers | 4 | Models evaluated concurrently. |
| num_retries | 4 | Retry attempts after the first failure. |
| retry_base_delay | 1.0 | Initial backoff delay in seconds. |
| retry_max_delay | 60.0 | Maximum backoff delay in seconds. |
| retry_jitter | 0.25 | ±fraction of random jitter added to each delay. |
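The three retry parameters combine into a capped exponential backoff with jitter. A sketch of how such a schedule is typically computed, assuming doubling per attempt (the exact growth curve is not stated in the table):

```python
import random


def backoff_delay(attempt, base=1.0, max_delay=60.0, jitter=0.25):
    """Delay before retry number `attempt` (0-indexed).

    base * 2**attempt, capped at max_delay, then scaled by a
    random factor in [1 - jitter, 1 + jitter].
    """
    delay = min(max_delay, base * (2 ** attempt))
    return delay * (1 + random.uniform(-jitter, jitter))
```

With the defaults above, the un-jittered delays run 1, 2, 4, 8 seconds across the four retries, with each value perturbed by up to ±25%.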