GEPAzilla is the open-source GEPA prompt optimizer named after the friendly GEPAzilla dinosaur who smashes weak prompts and keeps the strongest contenders. The project includes a marketing splash page plus a BYO API-key console so curious builders can jump in quickly. Curate datasets, configure deterministic and LLM-powered scorers, and watch GEPA iterate on your system prompt while tracking latency, cost, and diagnostics—all on your machine. Visit gepazilla.com to meet the mascot and launch the console.
```bash
cp .env.example .env.local   # fill in required keys
pnpm install
pnpm dev
```

The marketing page lives at http://localhost:3000. Launch the console at http://localhost:3000/optimizer to work with datasets and scorers. The "Open console (BYO API key)" button on the homepage routes to the same place.
| Variable | Description |
|---|---|
| `AI_GATEWAY_API_KEY` | Required for GEPA runs, scorer previews, and `/api/models` discovery. Supply your own AI Gateway credential. |
How to obtain a key: GEPAzilla expects a Vercel AI Gateway token. Create or select a Vercel project, enable AI Gateway, and generate a gateway API key. Copy that value into `.env.local` before running `pnpm dev`.
Copy `.env.example` to `.env.local` and provide values before starting the dev server; alternatively, export the keys in your shell before running `pnpm dev`.
GEPAzilla ships with a starter dataset so you can experiment instantly. If you’d like a larger fixture, import `data/sample-dataset.json` through the dataset menu.
- Training rows drive candidate exploration and feed reflection. Edit them inline, duplicate tricky cases, and move hold-outs to validation from each row’s overflow menu.
- Validation rows stay read-only during reflection and act as the generalisation check. They appear in the same table with a muted flag at the bottom.
- Dataset tools currently support copying or pasting JSON payloads from the clipboard; use the “Use” pill on each row to toggle Training/Validation.
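For orientation, here is a hypothetical row shape in TypeScript. The field names are assumptions for illustration only; the authoritative format is whatever `data/sample-dataset.json` and the dataset import/export tools actually use.

```ts
// Hypothetical dataset row shape — the field names are illustrative, not the
// project's actual schema; check data/sample-dataset.json for the real format.
interface DatasetRow {
  input: string;                    // the task input sent to the task model
  expected: string;                 // reference output consumed by scorers
  split: "training" | "validation"; // mirrors the per-row "Use" pill
}

const rows: DatasetRow[] = [
  {
    input: "Summarise the release notes in one sentence.",
    expected: "A single-sentence summary of the release notes.",
    split: "training",
  },
];
```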
- Add scorers from the Scoring Criteria panel. Deterministic plugins (exact match, regex, length) run instantly in the browser, while async plugins (LLM judges) execute during optimizer runs.
- Multiple instances of each scorer type are supported. Set weights to control how they contribute to the aggregate correctness objective.
- The Results tab now surfaces a Scorer diagnostics card summarising failure rates, averages, and the most common notes per scorer so you can tune signal quality quickly.
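To make the weighting concrete, here is a hedged TypeScript sketch of two deterministic scorers feeding a weighted aggregate. The interface and field names are assumptions for illustration; the real plugin contract is described in the Scorer Authoring Guide and the scorer registry.

```ts
// Illustrative only — not the project's actual scorer interface.
type ScoreResult = { score: number; note?: string };

interface DeterministicScorer {
  name: string;
  weight: number; // contribution to the aggregate correctness objective
  score(output: string, expected: string): ScoreResult;
}

// An exact-match scorer and a regex scorer, weighted 2:1.
const exactMatch: DeterministicScorer = {
  name: "exact-match",
  weight: 2,
  score: (output, expected) => ({
    score: output.trim() === expected.trim() ? 1 : 0,
  }),
};

const endsWithPeriod: DeterministicScorer = {
  name: "regex: ends with period",
  weight: 1,
  score: (output) => ({
    score: /\.\s*$/.test(output) ? 1 : 0,
    note: "checks terminal punctuation only",
  }),
};

// Weighted average, as the per-scorer weights in the UI imply.
function aggregate(
  scorers: DeterministicScorer[],
  output: string,
  expected: string,
): number {
  const totalWeight = scorers.reduce((sum, s) => sum + s.weight, 0);
  const weightedSum = scorers.reduce(
    (sum, s) => sum + s.weight * s.score(output, expected).score,
    0,
  );
  return weightedSum / totalWeight;
}
```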
The optimizer UI mirrors the GEPA paper’s reflection loop while keeping the UX approachable. Here’s how we expect contributors to exercise it:
- Prime GEPAzilla – Set the task and reflection models (BYO API key) in the Run dock, choose your reflection batch, and confirm the skip-perfect toggle matches your dataset. The header pill (“Meet GEPAzilla”) links to an in-app guide if you forget what each field does.
- Shape the dataset – Use the Training/Validation toggles to curate high-signal examples. GEPAzilla’s reflective dataset generator samples underperforming rows; keeping validation rows pristine makes the Pareto gate meaningful.
- Tune the scoring stack – Combine deterministic plugins (latency, regex, exact match) with LLM judges. Weighting and optional duplication let you emphasize the metrics that matter. The `/api/scorers/test` endpoint powers the “Preview scorer” button so you can sanity-check configuration before a run (see the sketch after this list).
- Run and iterate – Start the run (`⌘/Ctrl + Enter` works). Watch the Logs tab for scorer notes and telemetry, then pivot to Results to inspect candidate prompts, iteration trends, and scorer diagnostics. Apply the preferred prompt directly from Results without leaving the page.
- Reflect and repeat – If a scorer misbehaves or the dataset falls short, adjust and run again. All telemetry remains local; GEPAzilla never exfiltrates run data.
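For contributors poking at the preview flow directly, here is a hedged sketch of calling `/api/scorers/test` yourself. The request body shape is an assumption (the UI normally builds it for you); check the route handler for the actual contract.

```ts
// Assumed request/response shape for the scorer preview endpoint — the real
// contract is defined by the /api/scorers/test route handler in this repo.
async function previewScorer(): Promise<unknown> {
  const response = await fetch("/api/scorers/test", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      scorer: { type: "regex", pattern: "\\.$", weight: 1 }, // hypothetical fields
      output: "GEPAzilla keeps the strongest prompt.",
      expected: "GEPAzilla keeps the strongest prompt.",
    }),
  });
  if (!response.ok) throw new Error(`Preview failed: ${response.status}`);
  return response.json(); // e.g. a score plus any scorer notes
}
```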
The “How it works” drawer (top-right) walks through the same flow with annotated screenshots for new contributors.
The optimizer records span-level diagnostics to power the run dock, but everything stays in your browser. Telemetry events are never sent to a remote service. Advanced debugging flags such as `DEBUG_TELEMETRY` and `NEXT_PUBLIC_DEBUG_TELEMETRY` are documented in `CONTRIBUTING.md`.
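`CONTRIBUTING.md` is the source of truth for those flags. As a rough sketch of the usual Next.js pattern (an assumption about this codebase, not a quote from it), a debug gate might look like the following; note that `NEXT_PUBLIC_`-prefixed variables are inlined into browser bundles by Next.js, while unprefixed ones stay server-side.

```ts
// Assumed pattern only — the flag names come from this README, but how
// GEPAzilla actually consumes them is documented in CONTRIBUTING.md.
const telemetryDebugEnabled =
  process.env.DEBUG_TELEMETRY === "true" ||           // server-side (Node) runs
  process.env.NEXT_PUBLIC_DEBUG_TELEMETRY === "true"; // client-side console

export function debugTelemetry(event: string, payload: unknown): void {
  if (telemetryDebugEnabled) {
    console.debug(`[telemetry] ${event}`, payload);
  }
}
```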
| Command | Purpose |
|---|---|
| `pnpm dev` | Start the Turbopack dev server |
| `pnpm lint` | Run ESLint across the project |
| `pnpm exec tsc --noEmit` | Type-check the codebase without emit |
| `pnpm test` | Run the Vitest suite once |
| `pnpm test -- --run --coverage` | Run Vitest with v8 coverage |
- Start with CONTRIBUTING.md for environment setup, commit checks, and PR etiquette.
- The Scorer Authoring Guide explains how to create new scorer plugins and hook them into the registry and UI.
- CI executes `pnpm lint`, `pnpm exec tsc --noEmit`, and `pnpm test -- --run` on every push and pull request (see `.github/workflows/ci.yml`).
The application ships as a standard Next.js project. To produce a production build:
```bash
pnpm install
pnpm build
pnpm start
```

Set `AI_GATEWAY_API_KEY` in the runtime environment before launching either `pnpm build` or `pnpm start`. Optional debug flags should remain unset (or `false`) in production unless you are actively debugging locally.
- Dependency licenses are tracked in `docs/THIRD_PARTY_LICENSES.json`.
- Static asset sources are listed in `docs/ASSET_ATTRIBUTIONS.md`.
GEPAzilla is released under the MIT License.
All community activity is governed by our Code of Conduct.
