Evals for LLMs to learn and benchmark their Svelte skills.
This repository is split into two main concepts:
- **Evals** live in `evals/<name>` and include a prompt plus a runnable project and tests.
- **Experiments** live in `experiments/<name>.ts` and configure how evals are executed with `@vercel/agent-eval`.

## Setup

```bash
pnpm install
cp .env.example .env
```
Fill in `AI_GATEWAY_API_KEY` plus either `VERCEL_TOKEN` or `VERCEL_OIDC_TOKEN`.

## `@vercel/agent-eval`

Experiments are defined in `experiments/*.ts`. The CLI expects the `evals/` folder to be a sibling of `experiments/`.
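For reference, the sibling layout the CLI expects looks roughly like this; the sketch uses only paths already mentioned in this README:

```text
evals/
  <name>/                 # prompt + runnable project + tests
experiments/
  <name>.ts               # experiment configuration
results/
  <experiment-name>/
    <timestamp>/          # written by the CLI after each run
```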
```bash
# Run a single experiment by name (experiments/basic.ts)
npx @vercel/agent-eval basic

# Or run by path
npx @vercel/agent-eval experiments/basic.ts

# Run every experiment in the repository
npx @vercel/agent-eval
```
Results are written to `results/<experiment-name>/<timestamp>/`.
## `@vercel/agent-eval-playground`

The agent-eval CLI exposes a playground command that launches `@vercel/agent-eval-playground` under the hood:
```bash
npx @vercel/agent-eval-playground --results-dir ./results --evals-dir ./evals --port 3000
```
Open the URL it prints (default: http://localhost:3000) to browse results.
## Adding a new eval

Use the script in `scripts/add-eval.ts` to scaffold a new eval:
```bash
pnpm run add-eval
```
The script will:
- create `evals/<your-eval-name>/` from `assets/default-project/`
- create `evals/<your-eval-name>/PROMPT.md`

Afterward, edit `EVAL.ts` and any tests inside the new eval to define success criteria.
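For illustration, a success criterion can be expressed as an ordinary test inside the eval's project. Below is a minimal sketch, assuming the scaffolded project runs Vitest with `@testing-library/svelte`; the component name and file path are hypothetical, so adapt it to whatever your `PROMPT.md` asks the agent to build.

```ts
// evals/my-eval/src/lib/Counter.test.ts — hypothetical file and component
import { describe, expect, it } from 'vitest';
import { fireEvent, render, screen } from '@testing-library/svelte';
import Counter from './Counter.svelte';

describe('Counter', () => {
  it('increments the count when the button is clicked', async () => {
    render(Counter);

    const button = screen.getByRole('button');
    await fireEvent.click(button);

    // The hypothetical component is expected to render the updated count.
    expect(button.textContent).toContain('1');
  });
});
```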
## Adding a new experiment

Create a new file in `experiments/` (for example, `experiments/my-experiment.ts`) and export an `ExperimentConfig` using the shared helper:

```ts
import { experiment } from '../shared/experiment-base.ts';

export default experiment({
  evals: ['my-eval'],
  runs: 2,
  editPrompt(prompt) {
    return `${prompt}\n\nExtra instructions...`;
  },
});
```
Then run it by name:

```bash
npx @vercel/agent-eval my-experiment
```
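As a further illustration, the same helper can drive a broader experiment. This sketch reuses only the options shown above (`evals`, `runs`, `editPrompt`); the file name and eval names are hypothetical:

```ts
// experiments/runes-only.ts — hypothetical experiment
import { experiment } from '../shared/experiment-base.ts';

export default experiment({
  // Run several evals in one experiment; these names are placeholders.
  evals: ['counter', 'todo-list'],
  runs: 3,
  editPrompt(prompt) {
    // Append a shared instruction to every eval's prompt.
    return `${prompt}\n\nPrefer Svelte 5 runes over stores.`;
  },
});
```

Because experiments are addressed by file name, this one would run as `npx @vercel/agent-eval runes-only`.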