RunConfig

All runner behaviour is controlled through RunConfig:
from aevyra_verdict.runner import EvalRunner, RunConfig

config = RunConfig(
    temperature=0.0,
    max_tokens=1024,
    max_workers=10,
    max_model_workers=4,
    num_retries=4,
    retry_base_delay=1.0,
    retry_max_delay=60.0,
    retry_jitter=0.25,
)
runner = EvalRunner(config=config)

Concurrency

aevyra-verdict runs requests concurrently at two levels:
  • max_workers — concurrent requests per model (default: 10)
  • max_model_workers — models evaluated at the same time (default: 4)
So with 3 models and max_workers=10, up to 30 API calls can be in-flight simultaneously. If you’re hitting rate limits, lower max_workers first:
config = RunConfig(max_workers=3)
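
The two-level scheme can be sketched with nested thread pools. This is a minimal illustration, not aevyra-verdict's internals; the model names, sample list, and call_api stand-in are all hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical models and samples for illustration
models = ["model-a", "model-b", "model-c"]
samples = list(range(8))

def call_api(model, sample):
    # Placeholder for a real API request
    return f"{model}:{sample}"

def eval_model(model, max_workers=10):
    # Inner pool: up to max_workers concurrent requests per model
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: call_api(model, s), samples))

# Outer pool: up to max_model_workers models evaluated at once
with ThreadPoolExecutor(max_workers=4) as model_pool:
    results = list(model_pool.map(eval_model, models))
```

With 3 models and an inner pool of 10, the peak is 3 × 10 = 30 in-flight calls, which is why lowering the inner max_workers is the first lever for rate limits.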

Rate limit handling

Rate-limit errors (HTTP 429) are retried automatically with exponential backoff and jitter. The delay before retry n (counting from 0) is:
delay = min(base_delay × 2ⁿ, max_delay) ± jitter
With the defaults (base_delay=1.0, max_delay=60.0, jitter=0.25), the delay sequence runs roughly 1s → 2s → 4s → 8s → 16s → 32s → 60s (capped), of which only the first num_retries delays are ever used. Jitter adds ±25% randomness so that concurrent workers don't all retry in lockstep.
Auth errors (401, 403) and bad requests (400) are surfaced immediately without retrying — there’s no point burning retry budget on errors that won’t resolve themselves.
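
The delay formula above can be written out directly. This is a sketch of the computation as documented, not the library's own implementation:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  cap: float = 60.0, jitter: float = 0.25) -> float:
    """delay = min(base * 2**attempt, cap), then +/- jitter as a fraction."""
    delay = min(base * 2 ** attempt, cap)
    # random.uniform(-jitter, jitter) spreads retries out across workers
    return delay * (1 + random.uniform(-jitter, jitter))

# With the defaults, attempts 0..6 give roughly 1, 2, 4, 8, 16, 32, 60 seconds
```

Setting jitter=0.0 recovers the pure exponential schedule, which is useful when comparing logs against the expected delays.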

Tuning for large datasets

For large datasets (1000+ samples), these settings tend to work well:
config = RunConfig(
    max_workers=20,          # increase if your API quota allows
    max_model_workers=2,     # fewer models in parallel reduces peak load
    num_retries=6,           # more retries for long runs
    retry_base_delay=2.0,    # start backing off more aggressively
    retry_max_delay=120.0,
)

CLI tuning

# Reduce concurrency
aevyra-verdict run data.jsonl --config models.yaml --max-workers 3

# Deterministic outputs
aevyra-verdict run data.jsonl -m openai/gpt-5.4-nano --temperature 0.0