Install
Set your API keys
aevyra-verdict providers to see which keys are configured.
Prepare a dataset
Create a JSONL file where each line is a conversation in OpenAI message format. Theideal field is the reference answer used by scoring metrics.
Run your first eval
Run against a local model
If you have Ollama running locally, you can benchmark against it without any API keys:Save results
Next steps
Compare more models
Use a config file to manage multiple models including local vLLM instances
Add an LLM judge
Score responses with an LLM judge instead of reference-based metrics