Reflex ships with five strategies. The auto strategy (default) chains multiple axes adaptively. Each axis can also be used standalone with -s <name>.

Auto (default)

The auto strategy runs a multi-phase pipeline:
  1. Run a baseline eval to measure the starting score
  2. The reasoning model analyzes the prompt’s weaknesses and recommends an optimization axis
  3. Apply that axis for a few iterations
  4. Re-evaluate — if the threshold is met, stop
  5. Otherwise the reasoning model picks the next axis based on what changed
  6. Repeat until the global iteration budget runs out
A typical run: structural (fix formatting) → iterative (fix wording) → fewshot (add examples) — each phase builds on the previous one’s improvements. Auto requires at least two phases before it considers converging, because LLM judge scores can be noisy and a single high score may not hold on re-evaluation.
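The phase pipeline above can be sketched roughly like this. This is a simplified illustration, not the real implementation: `run_eval`, `pick_axis`, and `apply_axis` are hypothetical stand-ins for Reflex internals.

```python
def auto_optimize(prompt, run_eval, pick_axis, apply_axis,
                  threshold=0.9, budget=20, min_phases=2):
    """Sketch of the auto strategy's multi-phase loop (illustrative only)."""
    score = run_eval(prompt)                  # 1. baseline eval
    phases = used = 0
    while used < budget:                      # 6. global iteration budget
        axis = pick_axis(prompt, score)       # 2./5. reasoning model picks an axis
        for _ in range(3):                    # 3. apply that axis for a few iterations
            if used >= budget:
                break
            prompt = apply_axis(axis, prompt)
            used += 1
        score = run_eval(prompt)              # 4. re-evaluate
        phases += 1
        # Judge scores are noisy, so require at least two phases
        # before trusting a score above the threshold.
        if phases >= min_phases and score >= threshold:
            break
    return prompt, score
```

Note the `min_phases` guard: even a passing baseline-adjacent score is re-confirmed by a second phase before the loop converges.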
aevyra-reflex optimize dataset.jsonl prompt.md -m local/llama3.1

Iterative

Each iteration:
  1. Run completions with the current prompt via verdict
  2. Score all responses with the configured metrics
  3. Identify the worst-scoring samples
  4. The reasoning model analyzes the failures and proposes a revised prompt
  5. If the score meets the threshold, stop; otherwise repeat
The reasoning model maintains a causal rewrite log across iterations — a compact record of what was changed each round and what score delta resulted. From iteration 2 onwards, this history is injected into the prompt so the model knows which approaches helped (✓), had no effect (✗ no effect), or hurt (✗ hurt) — and can avoid repeating dead ends.
Rewrite history:
Iter 1 (score: 0.6234, Δ+0.0871 — ✓ helped): Added numbered reasoning steps
Iter 2 (score: 0.7105, Δ+0.0029 — ✗ no effect): Added "think carefully" instruction
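A minimal way to model such a rewrite log is a small record type per iteration. This is a hypothetical sketch of the idea, not Reflex's actual record format; the `RewriteEntry` name and the 0.01 noise band are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RewriteEntry:
    iteration: int
    score: float
    delta: float
    change: str

    def verdict(self) -> str:
        # Mirror the ✓ / ✗ labels above, treating tiny deltas as noise.
        if self.delta > 0.01:
            return "✓ helped"
        if self.delta < -0.01:
            return "✗ hurt"
        return "✗ no effect"

    def render(self) -> str:
        return (f"Iter {self.iteration} (score: {self.score:.4f}, "
                f"Δ{self.delta:+.4f} — {self.verdict()}): {self.change}")

log = [
    RewriteEntry(1, 0.6234, 0.0871, "Added numbered reasoning steps"),
    RewriteEntry(2, 0.7105, 0.0029, 'Added "think carefully" instruction'),
]
history = "Rewrite history:\n" + "\n".join(e.render() for e in log)
```

Rendering the log this way keeps each round's change and its score delta adjacent, which is exactly what the reasoning model needs to avoid repeating dead ends.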
Best for: prompts that are structurally fine but have wording issues, missing constraints, or ambiguous instructions.
aevyra-reflex optimize dataset.jsonl prompt.md -m local/llama3.1:8b -s iterative

Structural

Optimizes the organization and formatting of the prompt:
  1. Run eval with the current prompt structure
  2. Generate variants using different transformations:
    • Markdown headers for clear sections
    • XML tags for structural clarity
    • Minimal flat paragraphs
    • Role/task/format split
    • Constraint emphasis
    • Task decomposition
    • Input-anchored layout
  3. The reasoning model also generates a free-form structural improvement
  4. Evaluate all variants in parallel; keep the best
  5. Repeat with the winning structure
Best for: prompts that are long, disorganized, or missing clear structure.
aevyra-reflex optimize dataset.jsonl prompt.md -m local/llama3.1:8b -s structural
Structural evaluates multiple variants per iteration. Use --max-workers to control parallelism. For Ollama, see the parallelism guide.
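The parallel evaluation in step 4 can be sketched with a thread pool. Here `eval_prompt` is a hypothetical stand-in for the real scoring call, and `max_workers` plays the role of the CLI flag of the same name.

```python
from concurrent.futures import ThreadPoolExecutor

def best_variant(variants, eval_prompt, max_workers=4):
    """Score all structural variants concurrently and keep the winner."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(eval_prompt, variants))
    best_idx = max(range(len(variants)), key=scores.__getitem__)
    return variants[best_idx], scores[best_idx]
```

Threads suit this workload because each evaluation is dominated by waiting on model I/O rather than local computation.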

PDO (Prompt Duel Optimizer)

Tournament-style search over prompt variants using dueling bandits with Thompson sampling:
  1. Generate an initial pool of diverse prompts
  2. Each round, Thompson sampling selects two prompts to duel
  3. Both are evaluated on a sample of the dataset
  4. An LLM judge picks the winner on each sample; majority wins the duel
  5. Win matrix is updated; Copeland rankings recalculated
  6. Periodically, top-ranked prompts are mutated to explore new variants
  7. Worst performers are pruned to keep the pool manageable
Based on the PDO paper (arXiv:2510.13907). Best for: when you have budget for many evaluations and want broad exploration of the prompt space.
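The duel-selection and ranking steps can be illustrated with Beta posteriors over per-prompt win rates. This is a simplified sketch under the assumptions that each prompt's wins and losses are tracked as flat counts; the actual implementation maintains the full pairwise machinery described above.

```python
import random

def select_duel(wins, losses):
    """Thompson sampling: draw a win-rate sample from each prompt's
    Beta posterior and duel the two prompts with the highest draws."""
    samples = [(random.betavariate(w + 1, l + 1), i)
               for i, (w, l) in enumerate(zip(wins, losses))]
    samples.sort(reverse=True)
    return samples[0][1], samples[1][1]

def record_duel(win_matrix, a, b, a_won):
    """Update the pairwise win matrix after a duel between prompts a and b."""
    if a_won:
        win_matrix[a][b] += 1
    else:
        win_matrix[b][a] += 1

def copeland_scores(win_matrix):
    """Copeland score: how many opponents a prompt beats head-to-head."""
    n = len(win_matrix)
    return [sum(win_matrix[i][j] > win_matrix[j][i]
                for j in range(n) if j != i)
            for i in range(n)]
```

Thompson sampling naturally balances exploration and exploitation: uncertain prompts (few duels) draw from wide posteriors and occasionally win selection, while proven prompts draw consistently high samples.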
aevyra-reflex optimize dataset.jsonl prompt.md \
  -m local/llama3.1:8b \
  -s pdo \
  --max-iterations 50

Few-shot

Optimizes which examples to include in the prompt:
  1. Bootstrap: run the bare instruction and collect highest-scoring samples as candidate exemplars
  2. The reasoning model selects a diverse, informative subset
  3. Build a composite prompt: instruction + curated few-shot examples
  4. Evaluate, identify remaining failures
  5. The reasoning model swaps examples to better cover the failure modes
  6. Periodically re-bootstrap to discover new exemplar candidates
Best for: tasks where showing the model examples helps more than refining instructions (translation, classification, structured extraction).
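The bootstrap and composite-prompt steps (1 and 3) might look like this. The sample dict shape (`input`, `output`, `score` keys) and both function names are hypothetical, chosen only for illustration.

```python
def bootstrap_exemplars(samples, k=8):
    """Keep the top-k highest-scoring samples as candidate exemplars."""
    ranked = sorted(samples, key=lambda s: s["score"], reverse=True)
    return ranked[:k]

def build_prompt(instruction, exemplars):
    """Composite prompt: instruction followed by curated few-shot examples."""
    shots = "\n\n".join(
        f"Input: {e['input']}\nOutput: {e['output']}" for e in exemplars
    )
    return f"{instruction}\n\n{shots}"
```

In the real strategy, step 2's diversity selection sits between these two calls, so the exemplars cover distinct failure modes rather than just the highest scores.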
aevyra-reflex optimize dataset.jsonl prompt.md -m local/llama3.1:8b -s fewshot

Custom strategies

You can implement your own strategy by subclassing Strategy and registering it. Your strategy then works in both the Python API and the CLI.
from aevyra_reflex import Strategy, register_strategy
from aevyra_reflex.result import OptimizationResult, IterationRecord

class MonteCarloStrategy(Strategy):
    def run(self, *, initial_prompt, dataset, providers, metrics,
            agent, config, on_iteration=None):
        best_prompt = initial_prompt
        best_score = 0.0
        iterations = []

        for i in range(config.max_iterations):
            # Generate a candidate and score it (elided), then track the best
            if score > best_score:
                best_score, best_prompt = score, candidate
            record = IterationRecord(i + 1, candidate, score)
            iterations.append(record)
            if on_iteration:
                on_iteration(record)
            if score >= config.score_threshold:
                break

        return OptimizationResult(
            best_prompt=best_prompt,
            best_score=best_score,
            iterations=iterations,
            converged=best_score >= config.score_threshold,
        )

register_strategy("montecarlo", MonteCarloStrategy)
Then use it like any built-in:
aevyra-reflex optimize dataset.jsonl prompt.md -m local/llama3.1:8b -s montecarlo
The run() method receives the full eval infrastructure — dataset, providers, metrics, the reasoning model (as agent), and the optimizer config — so your strategy has everything it needs to evaluate prompts and propose improvements.
Register your strategy before calling PromptOptimizer.run() or the CLI. A common pattern is to put the registration in a module that’s imported at startup.
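As a wiring sketch of that pattern (module and file names here are illustrative, not part of the library):

```python
# my_strategies.py — imported once at startup so the custom strategy
# is registered before the optimizer or CLI looks it up.
from aevyra_reflex import register_strategy
from my_project.montecarlo import MonteCarloStrategy  # hypothetical module

register_strategy("montecarlo", MonteCarloStrategy)
```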