Installation

The CLI is included with the package:
pip install aevyra-verdict
aevyra-verdict --help

run

Run evals on a dataset and print a comparison table.
aevyra-verdict run <dataset> [options]

Models

  Flag       Short   Description
  --model    -m      Model in provider/model format. Repeat for multiple.
  --config   -c      Path to a models config file (.yaml, .json, .toml).
--model and --config are mutually exclusive.
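
A config file lets you declare models once instead of repeating --model. A minimal sketch, assuming the file simply lists model specs in provider/model format (the exact schema accepted by --config is not documented in this section):

```yaml
# models.yaml — illustrative only; the schema is an assumption,
# not the documented format.
models:
  - openai/gpt-5.4-nano
  - qwen/qwen3.5-9b
```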

Metrics

  Flag              Description
  --metric          Built-in metric: rouge, bleu, or exact. Repeat for multiple. Default: rouge.
  --judge           Add an LLM-as-judge using this model spec.
  --judge-prompt    Path to a custom judge prompt template (.md or .txt).
  --custom-metric   Custom scoring function in file.py:function_name format. Repeat for multiple.
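
A custom metric is a plain Python function referenced by file and name. The signature the CLI expects is not documented in this section; the sketch below assumes it receives the model output and the reference answer as strings and returns a float score:

```python
# my_metrics.py — hypothetical custom metric for --custom-metric.
# Assumed signature: (prediction, reference) -> float in [0, 1].

def brevity_score(prediction: str, reference: str) -> float:
    """Score 1.0 when the prediction is no longer than the reference,
    decaying toward 0.0 as the prediction grows longer."""
    pred_len = max(len(prediction.split()), 1)
    ref_len = max(len(reference.split()), 1)
    return min(1.0, ref_len / pred_len)
```

If that assumption holds, it would be invoked as --custom-metric my_metrics.py:brevity_score.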

Dataset field mapping

  Flag             Description
  --input-field    Field name to use as the user message (for JSONL that doesn’t follow a standard schema). Example: --input-field question
  --output-field   Field name to use as the reference answer. Omit for label-free datasets. Example: --output-field answer
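
For example, a JSONL dataset whose records look like this (hypothetical field names) would be mapped with --input-field question --output-field answer:

```json
{"question": "What is the capital of France?", "answer": "Paris"}
```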

Output

  Flag       Short   Description
  --output   -o      Save results as JSON to this path.

Tuning

  Flag            Default   Description
  --max-workers   10        Concurrent requests per model. Lower this if you hit rate limits.
  --temperature   0.0       Sampling temperature.
  --max-tokens    1024      Maximum tokens per completion.

Examples

# Compare two models
aevyra-verdict run data.jsonl -m openai/gpt-5.4-nano -m qwen/qwen3.5-9b

# Use a config file
aevyra-verdict run data.jsonl --config models.yaml

# ROUGE + LLM judge, save results
aevyra-verdict run data.jsonl -m openai/gpt-5.4-nano \
  --metric rouge \
  --judge openai/gpt-5.4 \
  -o results.json

# Custom judge prompt and scoring function
aevyra-verdict run data.jsonl -m openai/gpt-5.4-nano \
  --judge openai/gpt-5.4 \
  --judge-prompt prompt.md \
  --custom-metric my_metrics.py:brevity_score

# Reduce concurrency if hitting rate limits
aevyra-verdict run data.jsonl --config models.yaml --max-workers 3

inspect

Preview a dataset without running any models.
aevyra-verdict inspect <dataset>
Shows sample count, whether reference answers are present, metadata keys, and the first sample.

providers

List all available providers and whether their API keys are configured.
aevyra-verdict providers
Available providers:

  openai       OPENAI_API_KEY        ✓ set
  anthropic    ANTHROPIC_API_KEY     ✗ not set
  google       GOOGLE_API_KEY        ✗ not set
  mistral      MISTRAL_API_KEY       ✗ not set
  cohere       COHERE_API_KEY        ✗ not set
  openrouter   OPENROUTER_API_KEY    ✗ not set