OptimizerConfig

Configuration dataclass for the optimizer.
```python
from aevyra_reflex import OptimizerConfig

config = OptimizerConfig(
    strategy="auto",
    max_iterations=10,
    score_threshold=0.85,
    train_ratio=0.8,               # 70/10/20 train/val/test split (default)
    val_ratio=0.1,                 # fraction for validation (0 = disabled)
    early_stopping_patience=3,     # stop when val stagnates for N iters (0 = disabled)
    batch_size=0,                  # 0 = full training set; >0 = examples per iter
    batch_seed=42,                 # base seed for mini-batch sampling
    full_eval_steps=0,             # full-set checkpoint every N iters (0 = disabled)
    max_workers=4,
    eval_runs=1,                   # eval passes to average (1 = single pass)
    reasoning_model="claude-sonnet-4-20250514",
    reasoning_provider=None,       # "anthropic", "openai", "ollama", or alias
    reasoning_api_key=None,        # defaults to provider's env var
    reasoning_base_url=None,       # for self-hosted endpoints
    eval_temperature=0.0,
    extra_kwargs={},
)
```
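The comments above note that the default `train_ratio=0.8` and `val_ratio=0.1` produce a 70/10/20 split, because the validation set is carved out of the training portion. A quick sketch of that arithmetic (plain Python for illustration, not the library's internal code):

```python
def split_sizes(n_examples, train_ratio=0.8, val_ratio=0.1):
    """Illustrative split arithmetic: val is carved from the training portion."""
    held_out = n_examples - int(n_examples * train_ratio)  # baseline/final eval set
    val = int(n_examples * val_ratio)                      # validation, taken from the train side
    train = n_examples - held_out - val                    # what optimization actually sees
    return train, val, held_out

print(split_sizes(100))  # (70, 10, 20) -- the documented 70/10/20 default
```

Setting `train_ratio=1.0` and `val_ratio=0.0` disables splitting entirely, as the table below describes.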

Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `strategy` | `str` | `"auto"` | Optimization strategy name (or any custom registered name) |
| `max_iterations` | `int` | `10` | Maximum optimization iterations |
| `score_threshold` | `float` | `0.85` | Target score for convergence |
| `train_ratio` | `float` | `0.8` | Fraction of examples used for optimization. The rest are held out for baseline and final eval. Set to `1.0` to disable splitting |
| `val_ratio` | `float` | `0.1` | Fraction of total examples reserved as a validation set, carved from the training portion. Val scores are tracked per iteration to detect overfitting. Set to `0.0` to disable |
| `early_stopping_patience` | `int` | `3` | Stop optimization early if the val score has not improved for this many consecutive iterations. Only active when `val_ratio > 0`. Set to `0` to disable |
| `batch_size` | `int` | `0` | Mini-batch size per iteration. `0` = full training set. When `> 0`, each iteration samples this many examples at random from the training data. Baseline and final evals are unaffected |
| `batch_seed` | `int` | `42` | Base seed for mini-batch sampling. Iteration *i* uses `batch_seed + i`, so every batch is distinct but the run is reproducible |
| `full_eval_steps` | `int` | `0` | In mini-batch mode, run a full training-set eval every this many iterations. `0` = never. Has no effect when `batch_size=0` |
| `max_workers` | `int` | `4` | Thread pool size for parallel evaluation |
| `eval_runs` | `int` | `1` | Eval passes to average for baseline and final verification. Reports mean ± std and tests significance |
| `reasoning_model` | `str` | `"claude-sonnet-4-20250514"` | LLM used for reasoning (failure analysis, prompt rewriting) |
| `reasoning_provider` | `str \| None` | `None` | Provider: `"anthropic"`, `"openai"`, `"ollama"`, or an alias. Auto-detected from the model name if `None` |
| `reasoning_api_key` | `str \| None` | `None` | API key for the reasoning model. Defaults to the provider's env var |
| `reasoning_base_url` | `str \| None` | `None` | Base URL for self-hosted reasoning model endpoints |
| `eval_temperature` | `float` | `0.0` | Temperature for the target model |
| `target_model` | `str \| None` | `None` | Label of the model whose score is the target (set by verdict integration) |
| `target_source` | `str \| None` | `None` | How the target was set: `"verdict_json"`, `"verdict_run"`, or `"manual"` |
| `source_model` | `str \| None` | `None` | The model family the prompt was originally written for (e.g. `"claude-sonnet"`, `"gpt-4o"`). Enables migration mode: the reasoning model adapts idioms for the target model |
| `extra_kwargs` | `dict` | `{}` | Strategy-specific parameters |
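To make the batching and early-stopping semantics in the table concrete, here is a minimal stand-alone sketch. It assumes only what the table documents (iteration *i* samples with `batch_seed + i`; optimization stops after `early_stopping_patience` iterations without val improvement) and is not the library's actual loop:

```python
import random

def run_iterations(train_data, batch_size=3, batch_seed=42,
                   patience=3, val_scores=None):
    """Sketch of the documented loop: seeded mini-batches + patience-based stopping."""
    val_scores = val_scores or []
    best_val, stale = float("-inf"), 0
    batches = []
    for i, val_score in enumerate(val_scores):
        # Iteration i uses batch_seed + i: distinct batches, reproducible runs.
        rng = random.Random(batch_seed + i)
        batches.append(rng.sample(train_data, batch_size))
        if val_score > best_val:
            best_val, stale = val_score, 0
        else:
            stale += 1
        if patience and stale >= patience:
            break  # val score stagnated for `patience` consecutive iterations
    return batches, best_val

batches, best = run_iterations(list(range(10)),
                               val_scores=[0.5, 0.6, 0.6, 0.6, 0.6, 0.9])
# stops after the third stale iteration; the 0.9 is never reached
```

Re-running with the same `batch_seed` reproduces the same batches, which is the point of deriving each iteration's seed from a fixed base.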

PromptOptimizer

The main optimizer class. Uses a builder pattern for configuration.
```python
from aevyra_reflex import PromptOptimizer, OptimizerConfig

result = (
    PromptOptimizer(OptimizerConfig(strategy="auto"))
    .set_dataset(dataset)
    .add_provider("local", "llama3.1")
    .add_metric(RougeScore())
    .set_target_from_verdict("results.json")
    .run("You are a helpful assistant.")
)
```

Methods

set_dataset(dataset)

Set the evaluation dataset.
| Parameter | Type | Description |
| --- | --- | --- |
| `dataset` | `aevyra_verdict.Dataset` | The dataset to evaluate against |

Returns `self` for chaining.

add_provider(provider, model, **kwargs)

Add a model provider. Supports provider aliases (openrouter, together, groq, etc.) which resolve automatically.
| Parameter | Type | Description |
| --- | --- | --- |
| `provider` | `str` | Provider name or alias |
| `model` | `str` | Model identifier |
| `label` | `str` | Optional display label |
| `api_key` | `str` | Optional API key override |
| `base_url` | `str` | Optional base URL override |

Returns `self` for chaining.

add_metric(metric)

Add a scoring metric.
| Parameter | Type | Description |
| --- | --- | --- |
| `metric` | `aevyra_verdict.Metric` | A verdict metric instance |

Returns `self` for chaining.

set_target_from_verdict(path, metric=None)

Set the score threshold from a verdict results JSON file. Parses the file, finds the best model’s score, and uses it as the optimization target.
```python
optimizer.set_target_from_verdict("results.json")

# Or rank by a specific metric
optimizer.set_target_from_verdict("results.json", metric="bleu")
```
| Parameter | Type | Description |
| --- | --- | --- |
| `path` | `str \| Path` | Path to verdict's results JSON |
| `metric` | `str \| None` | Which metric to rank by. Defaults to the first metric in the file |

Returns `self` for chaining. Sets `config.score_threshold`, `config.target_model`, and `config.target_source`.

benchmark_and_set_target(prompt, providers, metric=None)

Run verdict with multiple models, then set the target from the best. This is the “benchmark first, then optimize” flow.
```python
from aevyra_reflex.optimizer import _resolve_provider

target_providers = [
    _resolve_provider("openai", "gpt-4o-mini"),
    _resolve_provider("openai", "gpt-4o"),
]

benchmark = optimizer.benchmark_and_set_target(
    "You are a helpful assistant.",
    optimizer._providers + target_providers,
)

print(benchmark["best_model"])   # "openai/gpt-4o"
print(benchmark["best_score"])   # 0.92
print(benchmark["model_scores"]) # {"openai/gpt-4o": 0.92, ...}
```

| Parameter | Type | Description |
| --- | --- | --- |
| `prompt` | `str` | System prompt to benchmark |
| `providers` | `list[dict]` | All providers to benchmark (including the target models) |
| `metric` | `str \| None` | Which metric to rank by. Defaults to the first metric |

Returns a dict with `model_scores`, `best_model`, `best_score`, and `results`.

run(system_prompt)

Run the optimization. Returns an OptimizationResult.
| Parameter | Type | Description |
| --- | --- | --- |
| `system_prompt` | `str` | The starting system prompt to optimize |
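A sketch of typical usage. The attribute names come from the `OptimizationResult` fields shown in the strategy-registration example (`best_prompt`, `best_score`, `iterations`, `converged`):

```python
result = optimizer.run("You are a helpful assistant.")

print(result.best_prompt)  # the optimized system prompt
print(result.best_score)   # score from the final eval
print(result.converged)    # whether score_threshold was reached
```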

parse_verdict_results

Standalone function to parse a verdict results JSON file.
```python
from aevyra_reflex import parse_verdict_results

parsed = parse_verdict_results("results.json")
print(parsed["best_model"])   # "openai/gpt-4o-mini"
print(parsed["best_score"])   # 0.8765
print(parsed["models"])       # dict of all models with their scores
```

| Parameter | Type | Description |
| --- | --- | --- |
| `path` | `str \| Path` | Path to verdict results JSON |
| `metric` | `str \| None` | Which metric to rank by |

Returns a dict with `models`, `metrics`, `best_model`, `best_score`, `target_model`, and `target_score`.

Strategy registration

Register custom strategies so they can be used by name in OptimizerConfig and the CLI -s flag.
```python
from aevyra_reflex import Strategy, register_strategy
from aevyra_reflex.result import OptimizationResult

class MonteCarloStrategy(Strategy):
    def run(self, *, initial_prompt, dataset, providers, metrics,
            agent, config, on_iteration=None):
        # ... your optimization loop ...
        return OptimizationResult(
            best_prompt=best,
            best_score=score,
            iterations=iterations,
            converged=True,
        )

register_strategy("montecarlo", MonteCarloStrategy)
```

register_strategy(name, cls)

| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `str` | Short name for the strategy (used in the CLI `-s` flag) |
| `cls` | `type[Strategy]` | A class inheriting from `Strategy` |

Raises `TypeError` if `cls` doesn't inherit from `Strategy`.
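The registry pattern itself can be sketched in isolation. This is a hypothetical mimic of the documented `register_strategy`/`list_strategies` behavior (a plain name-to-class mapping with a type check), not the library's internals:

```python
class Strategy:  # stand-in for aevyra_reflex.Strategy
    pass

_REGISTRY = {}

def register_strategy(name, cls):
    """Mimic of the documented behavior: reject non-Strategy classes."""
    if not (isinstance(cls, type) and issubclass(cls, Strategy)):
        raise TypeError(f"{cls!r} must inherit from Strategy")
    _REGISTRY[name] = cls

def list_strategies():
    return sorted(_REGISTRY)

class MonteCarloStrategy(Strategy):
    pass

register_strategy("montecarlo", MonteCarloStrategy)
print(list_strategies())  # ['montecarlo']
```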

list_strategies()

Returns a sorted list of all registered strategy names.
```python
from aevyra_reflex.strategies import list_strategies

print(list_strategies())  # ['auto', 'fewshot', 'iterative', 'montecarlo', 'pdo', 'structural']
```