GEPA
GEPA: REFLECTIVE PROMPT EVOLUTION CAN OUTPERFORM REINFORCEMENT LEARNING
Motivation
How can we extract maximal learning signal from every expensive rollout to enable effective adaptation of complex, modular AI systems in low-data or budget-constrained settings?
Overview
Genetic prompt evolution
- GEPA begins by initializing a candidate pool P, where a candidate is a concrete instantiation of the learnable parameters of the compound system, ⟨Π, Θ⟩Φ. Initially, the pool contains only the base system’s parameters as the sole candidate.
- GEPA proposes increasingly effective candidates by modifying existing ones through mutation or crossover, informed by learning signals from newly gathered rollouts, while tracking each new candidate’s ancestry. Each new candidate inherits learning signals from its parents, as well as signals from the current rollouts.
- During each iteration, GEPA identifies a promising candidate from the pool (candidate selection) and proposes a new candidate, either by mutating a module’s prompt based on reflective feedback or by performing crossover between two candidates, then evaluates the new variant on a minibatch of tasks. If the new candidate outperforms its parent(s) on the local minibatch, GEPA adds it to the candidate pool P, updating internal data structures: it records the new candidate’s ancestry along with its full evaluation on Dpareto, a validation set used for candidate selection.
- After the budget is depleted, GEPA returns the candidate with the best aggregate performance on Dpareto.
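The loop above can be sketched in Python. This is a minimal illustration, not the paper's implementation: `evaluate` and `propose` are assumed callables standing in for the rollout scorer and the mutation/crossover step, the minibatch size is arbitrary, and parent selection is simplified to uniform sampling where GEPA uses Pareto-based stochastic selection.

```python
import random

def gepa_optimize(base_params, train_set, d_pareto, budget, evaluate, propose):
    """Illustrative sketch of GEPA's outer loop (helper names are assumptions)."""
    pool = [base_params]                      # candidate pool P
    ancestry = {0: None}                      # parent index per candidate
    # Full evaluation of each accepted candidate on D_pareto
    scores = {0: [evaluate(base_params, t) for t in d_pareto]}

    while budget > 0:
        # GEPA uses Pareto-based stochastic selection here; uniform is a simplification
        parent_idx = random.randrange(len(pool))
        parent = pool[parent_idx]
        child = propose(parent)               # reflective mutation or crossover
        minibatch = random.sample(train_set, k=min(2, len(train_set)))
        budget -= len(minibatch)
        # Keep the child only if it beats its parent on the local minibatch
        if sum(evaluate(child, t) for t in minibatch) > \
           sum(evaluate(parent, t) for t in minibatch):
            pool.append(child)
            ancestry[len(pool) - 1] = parent_idx
            scores[len(pool) - 1] = [evaluate(child, t) for t in d_pareto]
            budget -= len(d_pareto)

    # Return the candidate with the best aggregate score on D_pareto
    best = max(scores, key=lambda i: sum(scores[i]))
    return pool[best]
```

With a toy scalar "system" (a number nudged toward a target), accepted children are strictly better than their parents on the minibatch, so the returned candidate is never worse than the base.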
Reflection using natural language feedback
- Given a selected candidate to mutate in the current iteration of the optimization loop, GEPA updates the system with the candidate parameters, selects a target module within the system to improve (via round robin to ensure all modules receive updates), and generates a few rollouts over a minibatch sampled from the training dataset, recording their outcomes (success/failure).
- By examining the execution traces of the system, GEPA identifies the target module’s inputs, outputs, and reasoning. GEPA then uses an LLM to reflectively examine this information, attribute successes or failures to elements of the module’s prompt (or omissions from it), and propose new instructions for the target module.
- A new candidate is then proposed as a copy of the current candidate, with the target module’s prompt updated to the new proposed prompt.
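The reflective mutation step above can be sketched as follows. All names here are illustrative assumptions: `rollout_fn` stands in for running the system on a minibatch and collecting traces, `reflect_llm` for the LLM call that critiques and rewrites the prompt, and a candidate is modeled as a simple module-name-to-prompt dict.

```python
import copy

def reflective_mutation(candidate, rollout_fn, reflect_llm, module_cycle):
    """Illustrative sketch of one reflective mutation step (names are assumptions)."""
    target = next(module_cycle)               # round-robin target module selection
    traces = rollout_fn(candidate)            # minibatch rollouts with outcomes
    # Surface the target module's inputs/outputs and the rollout outcomes
    feedback = "\n".join(
        f"inputs={t['inputs'][target]} output={t['outputs'][target]} "
        f"success={t['success']}"
        for t in traces
    )
    # Ask the LLM to attribute success/failure and propose a new instruction
    new_instruction = reflect_llm(
        f"Current prompt for '{target}':\n{candidate[target]}\n"
        f"Execution traces:\n{feedback}\n"
        "Propose an improved instruction for this module."
    )
    # New candidate = copy of the parent with only the target prompt replaced
    child = copy.deepcopy(candidate)
    child[target] = new_instruction
    return child
```

Using `itertools.cycle` over the module names gives the round-robin guarantee that every module eventually receives updates.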
System-Aware Merge
Pareto-based candidate selection
- Identifies the highest score achieved for each individual training instance across all candidates in the pool, creating a “Pareto frontier” of scores achieved by the optimization process so far.
- Compiles a list of candidates that achieve the best score on at least one training task. This filters the pool down to candidates that incorporate “winning” strategies, preserving every valuable insight discovered in any reflective mutation.
- Prunes candidates that are strictly dominated: for instance, if Candidate 2 has the best score on Task 1 only, but Candidate 3 achieves that same best score on Task 1 and the best on Task 2, Candidate 2 is removed.
- Stochastically samples a candidate from this pruned list, assigning higher selection probability to candidates that achieved the best score across more training instances.
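The selection procedure above can be sketched directly from the four bullets. This is an illustrative implementation under the assumption that `scores[i][t]` holds candidate `i`'s score on training instance `t`; strict domination is checked by comparing the sets of tasks each candidate "wins".

```python
import random

def pareto_select(scores, rng=random):
    """Illustrative Pareto-based candidate selection.

    scores[i][t] = score of candidate i on training instance t.
    Returns the index of the sampled candidate.
    """
    n_tasks = len(scores[0])
    # Per-task best score across the pool: the "Pareto frontier" of scores
    best = [max(s[t] for s in scores) for t in range(n_tasks)]
    # Tasks on which each candidate matches the frontier ("wins")
    wins = [{t for t in range(n_tasks) if s[t] == best[t]} for s in scores]
    # Keep only candidates that win at least one task
    frontier = [i for i, w in enumerate(wins) if w]
    # Prune strictly dominated candidates: drop i if some j's win-set
    # is a proper superset of i's (j wins everything i wins, and more)
    pruned = [i for i in frontier
              if not any(j != i and wins[i] < wins[j] for j in frontier)]
    # Sample with probability proportional to the number of tasks won
    weights = [len(wins[i]) for i in pruned]
    return rng.choices(pruned, weights=weights, k=1)[0]
```

For example, with three candidates scoring `[1, 0]`, `[1, 1]`, and `[0, 1]` on two tasks, candidates 0 and 2 are strictly dominated by candidate 1 (its win-set `{0, 1}` is a proper superset of theirs), so candidate 1 is the only one left to sample.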