Collect human preference datasets for less-resourced languages and specific sectors,
while raising awareness about model diversity, bias, and environmental impact.
Built by the French government, now growing into new languages and sectors.
🇫🇷 French platform · 🇩🇰 Danish platform
flowchart LR
U["Ask"] --> A["Compare"] --> V["🗳️ Vote"] --> R["Reveal"]
R --> L["Leaderboard"]
R --> T["Rare data for model training"]
R --> M["🗺️ Use case mapping"]
R --> E1["💡 Model diversity"]
R --> E2["⚖️ Bias awareness"]
R --> E3["🌱 Env. impact"]
style U fill:#f0f4ff,stroke:#3558a2
style A fill:#f0f4ff,stroke:#3558a2
style V fill:#f0f4ff,stroke:#3558a2
style R fill:#f0f4ff,stroke:#3558a2
style E1 fill:#e8f5e9,stroke:#388e3c
style E2 fill:#e8f5e9,stroke:#388e3c
style E3 fill:#e8f5e9,stroke:#388e3c
style L fill:#fff3e0,stroke:#e65100
style T fill:#fff3e0,stroke:#e65100
style M fill:#fff3e0,stroke:#e65100
🟦 User journey 🟩 Awareness value 🟧 Dataset value
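The diagram shows each vote feeding a leaderboard. This page does not say how Compar:IA aggregates votes. The sketch below is a minimal illustration that assumes a simple online Elo update, a common choice for pairwise arenas and not necessarily the project's method. The function name, the `(model_a, model_b, outcome)` vote format, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from pairwise votes (illustrative sketch).

    Each vote is a hypothetical (model_a, model_b, outcome) tuple, where
    outcome is 1.0 if model_a won, 0.0 if model_b won, and 0.5 for a tie.
    """
    ratings = defaultdict(lambda: base)
    for a, b, outcome in votes:
        # Expected score of model_a under the logistic Elo model
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        delta = k * (outcome - expected_a)
        # Zero-sum update: what model_a gains, model_b loses
        ratings[a] += delta
        ratings[b] -= delta
    # Sort models from highest to lowest rating
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]
for name, rating in elo_leaderboard(votes):
    print(f"{name}: {rating:.1f}")
```

Online Elo depends on the order in which votes arrive. Fitting a Bradley-Terry model over all votes at once avoids that, at the cost of a batch optimization.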
Launched in October 2024 by DINUM and the French Ministry of Culture to address the lack of French-language preference data for LLM training and evaluation. Since launch: 600,000+ prompts, 250,000+ preference votes, and 300,000+ visitors. This makes it one of the largest non-English human preference datasets available. All data is published openly on Hugging Face.
We published a preprint that takes a deep dive into the project's strategy in France.
Compar:IA has appeared on the France 2 evening news and is used in classrooms to teach students about AI models, bias, and environmental impact.
Languages. Most LLMs underperform outside English, and Compar:IA collects the preference data needed to close this gap. It is already live in French and Danish, with launches planned in Sweden, Estonia, and Lithuania.
Sectors. Generic benchmarks miss domain-specific needs; a sector arena reveals which models handle specialised language best. Examples include healthcare, legal, education, public administration, and agriculture.
🏢 Organisations. Run your own arena, evaluate models on your real-world tasks, and contribute data back to the commons. This suits governments, universities, hospitals, companies, NGOs, and more.
💡 Raise awareness. Teach citizens and professionals about model diversity, bias, and environmental cost. Already used in schools and training sessions.
Generate rare datasets. Produce instruction and preference data in less-resourced languages.
Downstream reuse. Data feeds into new model training, leaderboards, use case mappings, and other research topics.
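One common way to reuse pairwise votes for model training is to turn them into (prompt, chosen, rejected) triples, the input format of preference-optimization methods such as DPO. The sketch below is a minimal illustration of that conversion. The record fields (`prompt`, `response_a`, `response_b`, `winner`) and the names `Vote`, `PreferencePair`, and `to_preference_pairs` are hypothetical and do not reflect the schema of the published datasets. Ties are dropped because they carry no preference direction.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Literal


@dataclass
class Vote:
    # Hypothetical record layout; not the schema of the published datasets.
    prompt: str
    response_a: str
    response_b: str
    winner: Literal["a", "b", "tie"]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def to_preference_pairs(votes: Iterable[Vote]) -> Iterator[PreferencePair]:
    """Yield (prompt, chosen, rejected) pairs from votes, skipping ties."""
    for v in votes:
        if v.winner == "a":
            yield PreferencePair(v.prompt, v.response_a, v.response_b)
        elif v.winner == "b":
            yield PreferencePair(v.prompt, v.response_b, v.response_a)
        # Ties are skipped: they do not say which answer is better.


votes = [
    Vote("What is the capital of Denmark?", "Copenhagen.", "Aarhus.", "a"),
    Vote("What is 2 + 2?", "5", "4", "b"),
    Vote("Say hello.", "Hello!", "Hi!", "tie"),
]
for pair in to_preference_pairs(votes):
    print(pair)
```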
The platform is fully open source, self-hostable, and customizable: choose your models, translate the interface, adapt prompt suggestions, add your logo. We can host it for you or help you set it up yourself.
Whatever your situation, reach out first and we'll figure out the best path together.
Contact: [email protected]
Compar:IA is a digital common. Whether you can offer funding, code, translations, or simply ideas, there is a place for you.
💰 Financially. Compar:IA has been funded by DINUM and the French Ministry of Culture, with European support from ALT-EDIC. We are actively looking for new partners and funders to sustain the infrastructure, expand to new languages, and keep the project independent. [email protected]
💻 In code. The entire platform is open source, and we welcome contributions of all sizes: bug fixes, new features, translations, documentation. Come build with us in the GitHub repository.
💬 In discussions. Share your ideas, flag issues, or just ask questions on GitHub Discussions. We want to hear from you.
Any other way. Partnerships, academic collaborations, media coverage, spreading the word: every contribution matters. Reach out and let's talk.
Full technical roadmap on GitHub
The platform is fully open source and self-hostable. The quickest way to get running:
cp .env.example .env # Configure environment
make install # Install all dependencies
make dev # Start backend + frontend
For the full setup guide (Docker, manual setup, testing, database, models, i18n, architecture), see CONTRIBUTING.md.