Select a strategy with -s <name>.
Auto (default)
The auto strategy runs a multi-phase pipeline:
- Run a baseline eval to measure the starting score
- The reasoning model analyzes the prompt’s weaknesses and recommends an optimization axis
- Apply that axis for a few iterations
- Re-evaluate — if the threshold is met, stop
- Otherwise the reasoning model picks the next axis based on what changed
- Repeat until the global iteration budget runs out
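The control loop above can be sketched in Python. The helper names (`evaluate`, `pick_axis`, `apply_axis`) are hypothetical stand-ins for the optimizer's internals, not the library's actual API:

```python
def auto_optimize(prompt, evaluate, pick_axis, apply_axis,
                  threshold, budget, inner_iters=3):
    """Sketch of the auto strategy's outer loop (illustrative only)."""
    score = evaluate(prompt)                  # baseline eval
    history = [(prompt, score)]
    used = 0
    while used < budget and score < threshold:
        axis = pick_axis(prompt, history)     # reasoning model recommends an axis
        for _ in range(inner_iters):          # apply that axis for a few iterations
            if used >= budget:
                break
            prompt = apply_axis(prompt, axis)
            used += 1
        score = evaluate(prompt)              # re-evaluate; stop if threshold met
        history.append((prompt, score))
    return prompt, score
```

The `history` list is what lets the reasoning model pick the next axis "based on what changed" rather than starting from scratch each phase.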
Iterative
Each iteration:
- Run completions with the current prompt via verdict
- Score all responses with the configured metrics
- Identify the worst-scoring samples
- The reasoning model analyzes the failures and proposes a revised prompt
- If the score meets the threshold, stop; otherwise repeat
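A single iteration of this loop can be sketched as follows; `run_completions`, `score_fn`, and `revise` are hypothetical placeholders for the eval run, the configured metrics, and the reasoning model's rewrite step:

```python
def iterative_step(prompt, run_completions, score_fn, revise, k_worst=3):
    """One iteration: run, score, find failures, propose a revision (sketch)."""
    responses = run_completions(prompt)               # completions for the prompt
    scored = sorted((score_fn(r), r) for r in responses)
    worst = scored[:k_worst]                          # worst-scoring samples
    new_prompt = revise(prompt, worst)                # reasoning model analyzes failures
    mean = sum(s for s, _ in scored) / len(scored)
    return new_prompt, mean                           # caller stops if mean >= threshold
```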
Structural
Optimizes the organization and formatting of the prompt:
- Run eval with the current prompt structure
- Generate variants using different transformations:
- Markdown headers for clear sections
- XML tags for structural clarity
- Minimal flat paragraphs
- Role/task/format split
- Constraint emphasis
- Task decomposition
- Input-anchored layout
- The reasoning model also generates a free-form structural improvement
- Evaluate all variants in parallel; keep the best
- Repeat with the winning structure
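One round of this strategy amounts to "transform, evaluate in parallel, keep the best". A minimal sketch, with an illustrative subset of the transformations and a hypothetical `evaluate` function standing in for the eval run:

```python
from concurrent.futures import ThreadPoolExecutor

TRANSFORMS = {  # illustrative subset of the structural transformations
    "markdown": lambda p: "## Task\n" + p,
    "xml": lambda p: "<task>\n" + p + "\n</task>",
    "flat": lambda p: " ".join(p.split()),
}

def structural_round(prompt, evaluate, max_workers=4):
    """Evaluate all structural variants in parallel; return the winner (sketch)."""
    variants = {name: t(prompt) for name, t in TRANSFORMS.items()}
    variants["original"] = prompt
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = dict(zip(variants, pool.map(evaluate, variants.values())))
    best = max(scores, key=scores.get)
    return variants[best], scores[best]
```

The `max_workers` argument here mirrors what the `--max-workers` CLI flag controls: how many variant evaluations run concurrently.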
Structural evaluates multiple variants per iteration. Use --max-workers to control parallelism. For Ollama, see the parallelism guide.
PDO (Prompt Duel Optimizer)
Tournament-style search over prompt variants using dueling bandits with Thompson sampling:
- Generate an initial pool of diverse prompts
- Each round, Thompson sampling selects two prompts to duel
- Both are evaluated on a sample of the dataset
- An LLM judge picks the winner on each sample; majority wins the duel
- Win matrix is updated; Copeland rankings recalculated
- Periodically, top-ranked prompts are mutated to explore new variants
- Worst performers are pruned to keep the pool manageable
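The two statistical pieces of this loop, Thompson sampling for duel selection and Copeland scoring for rankings, are standard and can be sketched directly. This is an illustrative implementation, not the library's:

```python
import random

def thompson_pick_pair(wins, losses, n):
    """Sample a Beta-distributed win rate per prompt; duel the top two (sketch)."""
    samples = sorted(((random.betavariate(wins[i] + 1, losses[i] + 1), i)
                      for i in range(n)), reverse=True)
    return samples[0][1], samples[1][1]

def copeland_scores(win_matrix):
    """Copeland score: how many opponents each prompt beats head-to-head."""
    n = len(win_matrix)
    return [sum(win_matrix[i][j] > win_matrix[j][i] for j in range(n) if j != i)
            for i in range(n)]
```

Thompson sampling naturally balances exploration and exploitation: a prompt with few duels has a wide Beta posterior, so it still gets sampled high sometimes, while consistent winners dominate over time.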
Few-shot
Optimizes which examples to include in the prompt:
- Bootstrap: run the bare instruction and collect highest-scoring samples as candidate exemplars
- The reasoning model selects a diverse, informative subset
- Build a composite prompt: instruction + curated few-shot examples
- Evaluate, identify remaining failures
- The reasoning model swaps examples to better cover the failure modes
- Periodically re-bootstrap to discover new exemplar candidates
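Assembling the composite prompt from bootstrapped candidates can be sketched like this; `select` stands in for the reasoning model's diversity-aware subset choice, and the exemplar tuple shape is a hypothetical simplification:

```python
def build_fewshot_prompt(instruction, candidates, select, k=3):
    """Instruction + curated few-shot examples (sketch).

    candidates: (input, output, score) tuples collected from the bootstrap run.
    """
    top = sorted(candidates, key=lambda c: c[2], reverse=True)  # best-scoring first
    exemplars = select(top, k)            # reasoning model picks a diverse subset
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o, _ in exemplars)
    return f"{instruction}\n\n{shots}"
```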
Custom strategies
You can implement your own strategy by subclassing Strategy and registering
it. Your strategy then works in both the Python API and the CLI.
The run() method receives the full eval infrastructure — dataset, providers,
metrics, the reasoning model (as agent), and the optimizer config — so your
strategy has everything it needs to evaluate prompts and propose improvements.
Register your strategy before calling
PromptOptimizer.run() or the CLI.
A common pattern is to put the registration in a module that’s imported at
startup.
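A registration sketch, assuming a hypothetical base class and registry; the real Strategy base class, registration hook, and run() signature belong to the library and may differ:

```python
class Strategy:
    """Stand-in for the library's Strategy base class (illustrative)."""
    registry = {}

    @classmethod
    def register(cls, name):
        def wrap(strategy_cls):
            cls.registry[name] = strategy_cls   # looked up by -s <name>
            return strategy_cls
        return wrap

    def run(self, prompt, dataset, evaluate, agent, config):
        raise NotImplementedError

@Strategy.register("greedy")
class GreedyStrategy(Strategy):
    """Toy strategy: keep asking the reasoning model for a better rewrite."""
    def run(self, prompt, dataset, evaluate, agent, config):
        best, best_score = prompt, evaluate(prompt)
        for _ in range(config.get("iterations", 3)):
            candidate = agent(best)             # reasoning model proposes a rewrite
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
        return best, best_score
```

Because registration is a module-level side effect, importing the module at startup (as the docs suggest) is enough to make the strategy available to both the Python API and the CLI.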