OptimizerConfig
Configuration dataclass for the optimizer.

Properties
| Property | Type | Default | Description |
|---|---|---|---|
| strategy | str | "auto" | Optimization strategy name (or any custom registered name) |
| max_iterations | int | 10 | Maximum optimization iterations |
| score_threshold | float | 0.85 | Target score for convergence |
| train_ratio | float | 0.8 | Fraction of examples used for optimization; the rest are held out for baseline and final eval. Set to 1.0 to disable splitting |
| val_ratio | float | 0.1 | Fraction of total examples reserved as a validation set, carved from the training portion. Val scores are tracked per iteration to detect overfitting. Set to 0.0 to disable |
| early_stopping_patience | int | 3 | Stop optimization early if the val score has not improved for this many consecutive iterations. Only active when val_ratio > 0. Set to 0 to disable |
| batch_size | int | 0 | Mini-batch size per iteration. 0 = full training set. When > 0, each iteration samples this many examples at random from the training data. Baseline and final evals are unaffected |
| batch_seed | int | 42 | Base seed for mini-batch sampling. Iteration i uses batch_seed + i, so every batch is distinct but the run is reproducible |
| full_eval_steps | int | 0 | In mini-batch mode, run a full training-set eval once every this many iterations. 0 = never. Has no effect when batch_size = 0 |
| max_workers | int | 4 | Thread pool size for parallel evaluation |
| eval_runs | int | 1 | Eval passes to average for baseline and final verification. Reports mean ± std and tests significance |
| reasoning_model | str | "claude-sonnet-4-20250514" | LLM used for reasoning (failure analysis, prompt rewriting) |
| reasoning_provider | str \| None | None | Provider: "anthropic", "openai", "ollama", or an alias. Auto-detected from the model name if None |
| reasoning_api_key | str \| None | None | API key for the reasoning model |
| reasoning_base_url | str \| None | None | Base URL for self-hosted reasoning model endpoints |
| eval_temperature | float | 0.0 | Temperature for the target model |
| target_model | str \| None | None | Label of the model whose score is the target (set by the verdict integration) |
| target_source | str \| None | None | How the target was set: "verdict_json", "verdict_run", or "manual" |
| source_model | str \| None | None | The model family the prompt was originally written for (e.g. "claude-sonnet", "gpt-4o"). Enables migration mode, where the reasoning model adapts idioms for the target model |
| extra_kwargs | dict | {} | Strategy-specific parameters |
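The interplay of train_ratio, val_ratio, batch_size, and batch_seed can be sketched in a few lines. This is an illustrative, self-contained reading of the semantics described above, not the library's actual implementation; the function names and exact split arithmetic are assumptions.

```python
import random

def split_dataset(examples, train_ratio=0.8, val_ratio=0.1):
    """Hold out 1 - train_ratio of the examples for baseline/final eval;
    carve val_ratio of the *total* out of the training portion."""
    n = len(examples)
    n_train_total = int(n * train_ratio)
    n_val = int(n * val_ratio)                 # taken from the training share
    train = examples[:n_train_total - n_val]
    val = examples[n_train_total - n_val:n_train_total]
    holdout = examples[n_train_total:]         # baseline and final eval
    return train, val, holdout

def mini_batch(train, batch_size, batch_seed, iteration):
    """Iteration i seeds its sampler with batch_seed + i: every batch
    differs across iterations, yet reruns reproduce the same batches."""
    if batch_size == 0 or batch_size >= len(train):
        return list(train)
    rng = random.Random(batch_seed + iteration)
    return rng.sample(train, batch_size)

examples = list(range(100))
train, val, holdout = split_dataset(examples)  # 70 / 10 / 20 examples
b0 = mini_batch(train, 8, 42, 0)
b0_again = mini_batch(train, 8, 42, 0)
assert b0 == b0_again                          # same seed -> same batch
```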
PromptOptimizer
The main optimizer class. Uses a builder pattern for configuration.

Methods
set_dataset(dataset)
Set the evaluation dataset.
| Parameter | Type | Description |
|---|---|---|
| dataset | aevyra_verdict.Dataset | The dataset to evaluate against |
Returns self for chaining.
add_provider(provider, model, **kwargs)
Add a model provider. Supports provider aliases (openrouter, together, groq,
etc.) which resolve automatically.
| Parameter | Type | Description |
|---|---|---|
| provider | str | Provider name or alias |
| model | str | Model identifier |
| label | str | Optional display label |
| api_key | str | Optional API key override |
| base_url | str | Optional base URL override |
Returns self for chaining.
add_metric(metric)
Add a scoring metric.
| Parameter | Type | Description |
|---|---|---|
| metric | aevyra_verdict.Metric | A verdict metric instance |
Returns self for chaining.
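Because every setter returns self, configuration composes as one fluent chain. The stand-in class below is a minimal, hypothetical sketch of that pattern only; it is not the library's PromptOptimizer.

```python
class MiniOptimizer:
    """Toy builder illustrating the chaining convention: each setter
    mutates state and returns self."""

    def __init__(self):
        self.dataset = None
        self.providers = []
        self.metrics = []

    def set_dataset(self, dataset):
        self.dataset = dataset
        return self              # returning self enables chaining

    def add_provider(self, provider, model, **kwargs):
        self.providers.append({"provider": provider, "model": model, **kwargs})
        return self

    def add_metric(self, metric):
        self.metrics.append(metric)
        return self

# One expression configures everything.
opt = (
    MiniOptimizer()
    .set_dataset(["example 1", "example 2"])
    .add_provider("anthropic", "claude-sonnet-4-20250514", label="target")
    .add_metric("accuracy")
)
assert opt.providers[0]["label"] == "target"
```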
set_target_from_verdict(path, metric=None)
Set the score threshold from a verdict results JSON file. Parses the file,
finds the best model’s score, and uses it as the optimization target.
| Parameter | Type | Description |
|---|---|---|
| path | str \| Path | Path to verdict’s results JSON |
| metric | str \| None | Which metric to rank by. Defaults to the first metric in the file |
Returns self for chaining. Sets config.score_threshold, config.target_model,
and config.target_source.
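The core of this step, stripped down, is "rank models by one metric and take the winner's score as the target". The sketch below illustrates that logic; the JSON schema ({model: {metric: score}}) and function name are assumptions for illustration, not verdict's actual file format.

```python
import json

def best_model_for_metric(results, metric=None):
    """Pick the highest-scoring model for one metric.
    If metric is None, fall back to the first metric present."""
    scores = {}
    for model, per_metric in results.items():
        if metric is None:
            metric = next(iter(per_metric))   # first metric in the file
        scores[model] = per_metric[metric]
    best = max(scores, key=scores.get)
    return best, scores[best]

raw = json.loads(
    '{"claude-sonnet": {"accuracy": 0.91}, "gpt-4o": {"accuracy": 0.88}}'
)
best, score = best_model_for_metric(raw, "accuracy")
assert (best, score) == ("claude-sonnet", 0.91)
# score would then become the optimization target (score_threshold).
```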
benchmark_and_set_target(prompt, providers, metric=None)
Run verdict with multiple models, then set the target from the best. This is
the “benchmark first, then optimize” flow.
| Parameter | Type | Description |
|---|---|---|
| prompt | str | System prompt to benchmark |
| providers | list[dict] | All providers to benchmark (including the target models) |
| metric | str \| None | Which metric to rank by. Defaults to the first metric |
Returns model_scores, best_model, best_score, and results.
run(system_prompt)
Run the optimization. Returns an OptimizationResult.
| Parameter | Type | Description |
|---|---|---|
| system_prompt | str | The starting system prompt to optimize |
parse_verdict_results
Standalone function to parse a verdict results JSON file.

| Parameter | Type | Description |
|---|---|---|
| path | str \| Path | Path to verdict results JSON |
| metric | str \| None | Which metric to rank by |
Returns models, metrics, best_model, best_score,
target_model, and target_score.
Strategy registration
Register custom strategies so they can be used by name in OptimizerConfig
and the CLI -s flag.
register_strategy(name, cls)
| Parameter | Type | Description |
|---|---|---|
| name | str | Short name for the strategy (used in CLI -s flag) |
| cls | type[Strategy] | A class inheriting from Strategy |
Raises TypeError if cls doesn’t inherit from Strategy.