
# OptimizationResult

Returned by `PromptOptimizer.run()`. Contains the optimized prompt, scores, iteration history, and analysis.

## Properties

| Property | Type | Description |
| --- | --- | --- |
| `best_prompt` | `str` | The highest-scoring prompt found |
| `best_score` | `float` | The score of the best prompt |
| `iterations` | `list[IterationRecord]` | All iteration records |
| `converged` | `bool` | Whether the score threshold was reached |
| `baseline` | `EvalSnapshot \| None` | Baseline eval snapshot (on the held-out test set if splitting is enabled) |
| `final` | `EvalSnapshot \| None` | Final verification snapshot (on the held-out test set if splitting is enabled) |
| `train_size` | `int` | Number of training examples used for optimization (0 if no split) |
| `test_size` | `int` | Number of held-out test examples used for the baseline and final evals (0 if no split) |
| `val_size` | `int` | Number of validation examples tracked per iteration (0 if `val_ratio=0`) |
| `val_trajectory` | `list[float]` | Validation-set mean score after each optimization iteration (empty if no validation split) |
| `early_stopped` | `bool` | `True` if optimization was stopped early because the validation score plateaued |
| `batch_size` | `int` | Per-iteration mini-batch size (0 = the full training set was used) |
| `p_value` | `float \| None` | p-value from a paired significance test (Wilcoxon or t-test); `None` if there are fewer than 2 samples or scipy is not installed |
| `is_significant` | `bool \| None` | `True` if `p_value < 0.05`; `None` when `p_value` is unavailable |
| `total_eval_tokens` | `int` | Total tokens used by the eval model across the run |
| `total_reasoning_tokens` | `int` | Total tokens used by the reasoning model across the run |
| `strategy_name` | `str \| None` | The strategy that was used |
| `phase_history` | `list[dict] \| None` | Auto-mode phase breakdown |
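The `early_stopped` flag is driven by `val_trajectory`: optimization halts when the per-iteration validation score stops improving. The optimizer's exact stopping criterion is internal; the sketch below only illustrates the general idea, and `has_plateaued`, `patience`, and `min_delta` are illustrative names, not part of the API.

```python
def has_plateaued(val_trajectory: list[float],
                  patience: int = 3,
                  min_delta: float = 0.0) -> bool:
    """Return True if the last `patience` scores failed to beat the
    best score seen before them by more than `min_delta`."""
    if len(val_trajectory) <= patience:
        return False
    best_before = max(val_trajectory[:-patience])
    recent_best = max(val_trajectory[-patience:])
    return recent_best <= best_before + min_delta

# A trajectory that climbs, then flattens:
has_plateaued([0.55, 0.61, 0.64, 0.64, 0.63, 0.64])  # True
# A trajectory that is still improving:
has_plateaued([0.55, 0.61, 0.64, 0.66, 0.69, 0.72])  # False
```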

## Computed properties

| Property | Type | Description |
| --- | --- | --- |
| `score_trajectory` | `list[float]` | Score at each iteration |
| `improvement` | `float \| None` | Absolute score improvement (final − baseline) |
| `improvement_pct` | `float \| None` | Percentage improvement over the baseline |
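Assuming `improvement` and `improvement_pct` are derived from the baseline and final snapshot means (which would explain why they are `None` when either snapshot is missing), the arithmetic is just:

```python
baseline_mean = 0.62   # e.g. result.baseline.mean_score
final_mean = 0.78      # e.g. result.final.mean_score

improvement = final_mean - baseline_mean              # absolute: 0.16
improvement_pct = improvement / baseline_mean * 100   # relative: ~25.8%
```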

## Methods

### summary()

Returns a formatted string with scores, trajectory, strategy analysis, prompt diff, and a before/after example.

```python
print(result.summary())
```

### to_dict()

Serialize the result to a dictionary.

```python
data = result.to_dict()
```

### to_json(path)

Save the full results to a JSON file.

```python
from pathlib import Path

result.to_json(Path("results.json"))
```

### save_best_prompt(path)

Write the optimized prompt to a text file.

```python
from pathlib import Path

result.save_best_prompt(Path("best_prompt.md"))
```

## IterationRecord

A single optimization iteration.

| Property | Type | Description |
| --- | --- | --- |
| `iteration` | `int` | Iteration number |
| `system_prompt` | `str` | The prompt used in this iteration |
| `score` | `float` | Overall score |
| `scores_by_metric` | `dict[str, float]` | Per-metric scores |
| `reasoning` | `str` | The agent’s reasoning for the change |
| `eval_tokens` | `int` | Tokens used by the eval model this iteration |
| `reasoning_tokens` | `int` | Tokens used by the reasoning model this iteration |
| `change_summary` | `str` | One-line description of what the agent changed (e.g. “Added output format constraints”) |
| `val_score` | `float \| None` | Validation set score for this iteration (`None` when `val_ratio=0`) |

## EvalSnapshot

Scores from a single eval run (baseline or final).

| Property | Type | Description |
| --- | --- | --- |
| `mean_score` | `float` | Mean score across all samples (and across runs when `eval_runs > 1`) |
| `std_score` | `float` | Standard deviation of the per-run mean scores; 0.0 when `eval_runs=1` |
| `n_runs` | `int` | Number of eval passes averaged to produce `mean_score` (1 by default) |
| `scores_by_metric` | `dict[str, float]` | Per-metric mean scores |
| `system_prompt` | `str` | The system prompt used |
| `samples` | `list[SampleSnapshot]` | Per-sample results (scores averaged across runs when `eval_runs > 1`) |
| `total_tokens` | `int` | Total tokens used by the eval model in this snapshot |
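When `eval_runs > 1`, `mean_score` averages the per-run means and `std_score` is their standard deviation. The table does not say whether a sample or population standard deviation is used; the sketch below uses the sample version (`statistics.stdev`) purely for illustration.

```python
import statistics

run_means = [0.74, 0.78, 0.76]   # mean score of each eval pass

mean_score = statistics.mean(run_means)   # 0.76
std_score = statistics.stdev(run_means) if len(run_means) > 1 else 0.0
n_runs = len(run_means)
```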

## SampleSnapshot

A single sample’s input, output, and score.

| Property | Type | Description |
| --- | --- | --- |
| `input` | `str` | The input prompt |
| `response` | `str` | The model’s response |
| `ideal` | `str` | The reference answer |
| `score` | `float` | Score for this sample |
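A common use of `samples` is pulling out the lowest-scoring examples to see where a prompt fails. A sketch with plain dicts as stand-ins for `SampleSnapshot` objects:

```python
samples = [
    {"input": "Q1", "response": "A1", "ideal": "A1", "score": 1.0},
    {"input": "Q2", "response": "A2", "ideal": "B2", "score": 0.2},
    {"input": "Q3", "response": "A3", "ideal": "B3", "score": 0.6},
]

# Two worst samples, lowest score first
worst = sorted(samples, key=lambda s: s["score"])[:2]
[s["input"] for s in worst]  # ["Q2", "Q3"]
```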