AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration using the Vercel AI Gateway. Automatically discovers and runs all tests in the tests/ directory, verifying LLM-generated Svelte components against test suites.
To install dependencies:
bun install
Configure your API keys in .env:
Run bun run vercel:link and link the benchmark to a project that has AI Gateway enabled.
You'll need at least one API key for the providers you want to test:
VERCEL_OIDC_TOKEN: The OIDC token for the Vercel AI Gateway
To run the benchmark:
bun run index.ts
The benchmark features an interactive CLI that will prompt you for configuration:
Model Selection: Choose one or more models from the Vercel AI Gateway
MCP Integration: Choose your MCP configuration
  Remote server URL: https://mcp.svelte.dev/mcp
  Local command: npx -y @sveltejs/mcp
TestComponent Tool: Enable/disable the testing tool for models
After configuration, the benchmark will:
Discover and run all tests in the tests/ directory
Results are saved to the results/ directory with timestamped filenames:
results/result-2024-12-07-14-30-45.json - Full execution trace with all test results
results/result-2024-12-07-14-30-45.html - Interactive HTML report with expandable test sections
The HTML report includes expandable test sections and a badge showing the MCP status.
To regenerate an HTML report from a JSON file:
# Regenerate most recent result
bun run generate-report.ts
# Regenerate specific result
bun run generate-report.ts results/result-2024-12-07-14-30-45.json
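The timestamped naming and the "most recent result" lookup can be sketched as follows. This is a minimal illustration, not the project's code: the helper names resultFilename and mostRecentResult are hypothetical, and the use of local time (rather than UTC) for the stamp is an assumption.

```typescript
// Hypothetical helpers; names and the local-time choice are assumptions.

/** Zero-pad a number to two digits. */
const pad = (n: number): string => String(n).padStart(2, "0");

/** Build a name like "result-2024-12-07-14-30-45.json" from a local Date. */
function resultFilename(date: Date, ext: "json" | "html" = "json"): string {
  const stamp = [
    String(date.getFullYear()),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
    pad(date.getSeconds()),
  ].join("-");
  return `result-${stamp}.${ext}`;
}

/** Pick the newest result JSON from a list of file names, if any. */
function mostRecentResult(files: string[]): string | undefined {
  const results = files
    .filter((f) => /^result-\d{4}(-\d{2}){5}\.json$/.test(f))
    .sort(); // zero-padded stamps sort chronologically as plain strings
  return results[results.length - 1];
}
```

Because every field in the stamp is zero-padded and ordered from year down to second, a plain string sort is enough to find the latest file without parsing dates.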
Each test in the tests/ directory should have:
tests/
  {test-name}/
    Reference.svelte - Reference implementation (known-good solution)
    test.ts          - Vitest test file (imports "./Component.svelte")
    prompt.md        - Prompt for the AI agent
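As a rough sketch of how this layout could be checked, the function below takes file paths relative to tests/ (for example from a directory listing) and returns the test names that contain all three expected files. The name findCompleteTests is hypothetical and the benchmark's actual discovery logic may differ.

```typescript
// Files the layout above expects in every test directory.
const REQUIRED_FILES = ["Reference.svelte", "test.ts", "prompt.md"];

/**
 * Hypothetical helper: given paths relative to tests/, such as
 * "counter/test.ts", return the names of tests that have every required file.
 */
function findCompleteTests(relativePaths: string[]): string[] {
  const filesByTest = new Map<string, Set<string>>();
  for (const path of relativePaths) {
    const [testName, fileName, ...rest] = path.split("/");
    // Only consider paths of the exact form {test-name}/{file}.
    if (!testName || !fileName || rest.length > 0) continue;
    if (!filesByTest.has(testName)) filesByTest.set(testName, new Set());
    filesByTest.get(testName)!.add(fileName);
  }
  return Array.from(filesByTest.entries())
    .filter(([, files]) => REQUIRED_FILES.every((f) => files.has(f)))
    .map(([name]) => name)
    .sort();
}
```

Keeping the check as a pure function over a list of paths makes it easy to exercise without touching the file system.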
The benchmark gives the AI agent the prompt from prompt.md and runs test.ts against the component it generates.
To verify that all reference implementations pass their tests:
bun run verify-tests
This copies each Reference.svelte to Component.svelte temporarily and runs the tests.
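The copy, run, and clean-up cycle described above might look like the sketch below. It is an illustration under assumptions: verifyReference is a hypothetical name, and runTests is a synchronous stand-in callback where the real script presumably invokes Vitest. The demo at the end uses a throwaway directory and a stub runner.

```typescript
import { copyFileSync, existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/**
 * Hypothetical helper: copy Reference.svelte to Component.svelte, run the
 * supplied test callback, and always remove the temporary copy afterwards.
 */
function verifyReference(testDir: string, runTests: (dir: string) => boolean): boolean {
  const component = join(testDir, "Component.svelte");
  copyFileSync(join(testDir, "Reference.svelte"), component);
  try {
    return runTests(testDir);
  } finally {
    // Clean up even if the test callback throws.
    rmSync(component, { force: true });
  }
}

// Demo in a throwaway directory; the stub "passes" if the copy exists.
const demoDir = mkdtempSync(join(tmpdir(), "verify-tests-"));
writeFileSync(join(demoDir, "Reference.svelte"), "<p>reference</p>");
const demoPassed = verifyReference(demoDir, (dir) =>
  existsSync(join(dir, "Component.svelte")),
);
const copyRemoved = !existsSync(join(demoDir, "Component.svelte"));
console.log({ demoPassed, copyRemoved });
```

The try/finally ensures the temporary Component.svelte never lingers in a test directory, even when a test run fails partway through.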
The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:
Remote server URL: https://mcp.svelte.dev/mcp
Local command: npx -y @sveltejs/mcp
MCP status, transport type, and server configuration are recorded in the JSON metadata and displayed as a badge in the HTML report.
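One way to model this choice is a small discriminated union, sketched below. The type McpConfig, its field names, the "none" variant, and the badge wording are all assumptions for illustration; the benchmark's actual metadata shape is not shown here. Treating the npx command as a stdio transport is likewise an assumption based on MCP's standard transports.

```typescript
// Hypothetical shape for the MCP choice; field names are assumptions.
type McpConfig =
  | { kind: "none" }
  | { kind: "http"; url: string }
  | { kind: "stdio"; command: string };

/** Produce a short badge label describing the MCP configuration. */
function mcpBadge(config: McpConfig): string {
  switch (config.kind) {
    case "none":
      return "MCP: disabled";
    case "http":
      return `MCP: HTTP (${config.url})`;
    case "stdio":
      return `MCP: stdio (${config.command})`;
  }
}

// The two server options listed above, expressed in this shape.
const httpConfig: McpConfig = { kind: "http", url: "https://mcp.svelte.dev/mcp" };
const stdioConfig: McpConfig = { kind: "stdio", command: "npx -y @sveltejs/mcp" };
```

With a union like this, the compiler flags any transport that the badge function forgets to handle.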
Exit codes:
0: All tests passed
1: One or more tests failed
See AGENTS.md for detailed documentation.
This project was created using bun init in bun v1.3.3. Bun is a fast all-in-one JavaScript runtime.