This repository contains benchmarks for evaluating AI models on Svelte and SvelteKit.
You can view the evaluation reports at https://icycodes.github.io/svelte-benchmark/.
tasks/: Contains the benchmark tasks, each with its own instructions.
jobs/: Stores the results of benchmark runs.
site/: A Next.js application that visualizes benchmark results.
This benchmark is evaluated using the Harbor framework and the Pochi agent.
You can run the evaluation with the Harbor CLI. The following example runs the Codex agent on Daytona:
harbor run \
--agent codex \
--model "gpt-5.2-codex" \
--env daytona \
--path ./tasks \
--n-attempts 1 \
--max-retries 5 \
--n-concurrent 5 \
--retry-include RuntimeError \
--retry-include DaytonaError \
--retry-include AgentTimeoutError
Before starting the evaluation, set the environment variables your chosen agent requires.
For example, if you use Pochi, export POCHI_API_KEY.
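A pre-flight check can catch a missing key before a long run starts. The sketch below is a hypothetical Bash helper: require_env is not part of Harbor or Pochi, and it only assumes the variable names you pass to it, such as POCHI_API_KEY.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Harbor or Pochi): returns non-zero
# and prints an error for every named variable that is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is Bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before launching an evaluation with Pochi:
#   export POCHI_API_KEY="<your key>"
#   require_env POCHI_API_KEY || exit 1
```

Calling the helper at the top of a run script makes a missing key fail immediately instead of partway through a batch of tasks.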
Evaluations run locally with Docker by default, or on Daytona.io if you pass --env daytona.
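Because Docker is the default, switching backends only means adding or omitting --env daytona. The sketch below is a hypothetical wrapper (run_eval and DRY_RUN are illustrative names, not Harbor features) built from a subset of the flags in the example above; the retry and concurrency flags are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs on Docker (Harbor's default) when no backend is
# given, and adds --env daytona when the first argument is "daytona".
# Set DRY_RUN=1 to print the command instead of executing it.
run_eval() {
  local backend="${1:-docker}"
  local -a cmd=(harbor run --agent codex --model gpt-5.2-codex --path ./tasks --n-attempts 1)
  if [ "$backend" = "daytona" ]; then
    cmd+=(--env daytona)   # omitting --env leaves Harbor on local Docker
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example usage:
#   run_eval            # local Docker
#   run_eval daytona    # Daytona.io
```

Building the command as a Bash array keeps each flag a separate argument, so values with spaces would survive without extra quoting.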
When running on Daytona, note that Daytona restricts some network access for Tier 1 and Tier 2 users. If you encounter network issues, refer to Daytona's network limits documentation.
Generated by Zealt