Svelte Benchmark

icycodes

Svelte and SvelteKit Benchmark

This repository contains benchmarks for evaluating AI models on Svelte and SvelteKit.

You can view the evaluation reports at https://icycodes.github.io/svelte-benchmark/.

Project Structure

tasks/: Contains the benchmark tasks, each with its own instructions.
jobs/: Stores the results of benchmark runs.
site/: A Next.js application to visualize benchmark results.

Getting Started

This benchmark is evaluated using the Harbor framework and the Pochi agent.

Running Evaluation

Run the evaluation with the Harbor CLI. The example below runs the tasks in ./tasks on Daytona, using the codex agent with the gpt-5.2-codex model:

harbor run \
  --agent codex \
  --model "gpt-5.2-codex" \
  --env daytona \
  --path ./tasks \
  --n-attempts 1 \
  --max-retries 5 \
  --n-concurrent 5 \
  --retry-include RuntimeError \
  --retry-include DaytonaError \
  --retry-include AgentTimeoutError

Evaluation Details

Before starting an evaluation, set the environment variables that your chosen agent requires. For example, if you use Pochi, export POCHI_API_KEY.
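
A missing key usually only shows up once a run has already started, so it can help to check for it up front. The sketch below is one way to do that; require_env is a hypothetical helper written for this README, not part of Harbor or Pochi, and POCHI_API_KEY is the only variable name taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harbor or Pochi): returns non-zero when
# any of the named environment variables is unset, empty, or not exported.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv prints the value of an exported variable, or nothing.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (placeholder value; substitute your own key):
#   export POCHI_API_KEY="<your-key>"
#   require_env POCHI_API_KEY || exit 1
#   ...then invoke harbor run as shown under Running Evaluation.
```

Because the check goes through printenv, a variable that is set but not exported is also reported as missing, which matches what a child process such as the agent would see.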

Evaluations run locally with Docker by default. To run them on Daytona.io instead, pass --env daytona.

When running on Daytona, note that Daytona blocks some network access for tier 1 and tier 2 users. If you encounter network issues, refer to Daytona network limits.
