A toolkit for building sentence-based Anki decks from Tatoeba data.
The project takes a list of target words, or the top N words from a frequency list, finds useful example sentences, enriches each card with translations, difficulty metadata, and Google Text-to-Speech audio, then exports a ready-to-import .apkg file.
This repository is built around one config-driven pipeline.
Each generated deck can include:
[sound:...] tags
.apkg package that can be imported into Anki

Use the Nix shell to install required packages:
nix develop
Edit apps/deck-cli/deck.config.jsonc, then run:
cd apps/deck-cli
bun run build
By default, generated files are written under output/ from paths configured in apps/deck-cli/deck.config.jsonc.
The pipeline is controlled by apps/deck-cli/deck.config.jsonc.
The checked-in config references apps/deck-cli/deck.config.schema.json, so editors with JSON Schema support should provide validation and completion.
To run with another config file:
DECK_CONFIG_PATH=/absolute/or/relative/path.jsonc bun run src/index.ts
The default CLI entrypoint is apps/deck-cli/src/index.ts.
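As a rough illustration of the override described above, the sketch below shows one way an entrypoint could choose between DECK_CONFIG_PATH and the default config file. The helper name resolveConfigPath, the fallback to deck.config.jsonc in the working directory, and the POSIX-only path handling are assumptions made for this example, not a description of the actual code in src/index.ts.

```typescript
// Hypothetical helper: the real CLI's resolution logic may differ.
const DEFAULT_CONFIG_FILE = "deck.config.jsonc";

type EnvVars = Record<string, string | undefined>;

// Returns the config path to load, preferring DECK_CONFIG_PATH when it is set.
function resolveConfigPath(env: EnvVars, cwd: string): string {
  const override = env.DECK_CONFIG_PATH?.trim();
  const chosen = override ? override : DEFAULT_CONFIG_FILE;

  // Absolute paths are used as-is (POSIX-style check only).
  if (chosen.startsWith("/")) {
    return chosen;
  }

  // Relative paths are resolved against the working directory.
  const base = cwd.replace(/\/+$/, "");
  return `${base}/${chosen}`;
}
```

In a real entrypoint this would typically be called with process.env and the current working directory.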
Available pass names:
retrieve - fetch matching Tatoeba sentence rows into the CSV
enrich-translations - add word and n-gram translation metadata
enrich-translation-alternatives - fill missing translation alternatives where available
enrich-difficulty - score and sort cards by difficulty
enrich-audio - generate Google TTS audio and timestamp metadata
build-apkg - build the Anki package

You can remove passes from apps/deck-cli/deck.config.jsonc when iterating on a specific stage. For example, after retrieving sentences once, you can rerun only enrichment or packaging passes against the existing CSV.
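To make the idea of running a subset of passes concrete, here is a minimal sketch of how a configured pass list could be validated and ordered. The pass names come from the list above; the planPasses function, and the assumption that passes always run in the listed order regardless of how they appear in the config, are hypothetical and may not match the CLI's behavior.

```typescript
// Pass names as documented above; the planner itself is a hypothetical sketch.
const PASS_ORDER = [
  "retrieve",
  "enrich-translations",
  "enrich-translation-alternatives",
  "enrich-difficulty",
  "enrich-audio",
  "build-apkg",
] as const;

type PassName = (typeof PASS_ORDER)[number];

function isPassName(value: string): value is PassName {
  return (PASS_ORDER as readonly string[]).includes(value);
}

// Rejects unknown names and returns the selected passes in pipeline order.
function planPasses(configured: string[]): PassName[] {
  const unknown = configured.filter((name) => !isPassName(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown pass name(s): ${unknown.join(", ")}`);
  }
  return PASS_ORDER.filter((name) => configured.includes(name));
}
```

For example, planPasses(["build-apkg", "enrich-difficulty"]) yields the difficulty pass followed by the packaging pass, which corresponds to rerunning only the later stages against an existing CSV.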
Argos Translate runs locally through the FastAPI service in apps/argos-translate-service/.
Start it from the deck CLI directory:
cd apps/deck-cli
bun run argos:start
Then set:
"translation": {
"provider": "argos",
"sourceLanguage": "de",
"targetLanguage": "en",
"argos": {
"translateUrl": "http://127.0.0.1:8000/translate",
"cachePath": "../../output/argos-translate-cache.json",
"alternatives": 2
}
}
The service host and port can be overridden with:
ARGOS_HOST=127.0.0.1
ARGOS_PORT=8000
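If you override the host or port, the translateUrl in the config has to point at the same address. The sketch below derives that URL from the two variables. It assumes that 127.0.0.1 and 8000 are the defaults (they are the values shown above) and that the endpoint path is /translate as in the example config; argosTranslateUrl is a hypothetical helper, not part of the CLI.

```typescript
// Hypothetical helper: builds the Argos translate URL from ARGOS_HOST / ARGOS_PORT.
// The fallback values are assumed defaults taken from the example above.
function argosTranslateUrl(env: Record<string, string | undefined>): string {
  const host = env.ARGOS_HOST || "127.0.0.1";
  const port = Number(env.ARGOS_PORT || "8000");

  // Reject values that cannot be a TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid ARGOS_PORT: ${env.ARGOS_PORT}`);
  }

  return `http://${host}:${port}/translate`;
}
```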
Google Translate uses either an API key, an access token, or local Application Default Credentials.
Set the provider to google:
"translation": {
"provider": "google",
"sourceLanguage": "de",
"targetLanguage": "en",
"argos": {
"translateUrl": "http://127.0.0.1:8000/translate",
"cachePath": "../../output/argos-translate-cache.json",
"alternatives": 2
},
"google": {
"translateUrl": "https://translation.googleapis.com/language/translate/v2",
"cachePath": "../../output/google-translate-cache.json",
"accessToken": null,
"apiKey": null,
"quotaProject": null
}
}
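The three credential options map onto HTTP requests differently: an API key is sent as the key query parameter, while an access token (explicit or obtained from Application Default Credentials) is sent as a Bearer token, optionally with an x-goog-user-project header naming the quota project. The sketch below illustrates that mapping. The precedence order (API key, then accessToken, then an ADC token) and the names buildGoogleRequestTarget and adcToken are assumptions for this example; the CLI may resolve credentials differently.

```typescript
// Mirrors the "google" block of the config shown above.
interface GoogleTranslateConfig {
  translateUrl: string;
  accessToken: string | null;
  apiKey: string | null;
  quotaProject: string | null;
}

interface RequestTarget {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper; adcToken stands for a token obtained elsewhere from
// Application Default Credentials.
function buildGoogleRequestTarget(
  config: GoogleTranslateConfig,
  adcToken: string | null = null,
): RequestTarget {
  // Assumed precedence: API key first.
  if (config.apiKey) {
    const key = encodeURIComponent(config.apiKey);
    return { url: `${config.translateUrl}?key=${key}`, headers: {} };
  }

  // Otherwise fall back to an OAuth2 bearer token.
  const token = config.accessToken || adcToken;
  if (!token) {
    throw new Error("No Google credentials available");
  }

  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  if (config.quotaProject) {
    headers["x-goog-user-project"] = config.quotaProject;
  }
  return { url: config.translateUrl, headers };
}
```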
The audio pass uses Google Cloud Text-to-Speech and requires OAuth2 credentials. API keys are not supported for this endpoint.
For local development, use Application Default Credentials:
gcloud auth application-default login
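Make sure the relevant APIs are enabled in your Google Cloud project: Cloud Text-to-Speech for the audio pass, and Cloud Translation if you use the google translation provider.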
ffmpeg must be available on PATH; the CLI transcodes generated speech to AAC before packaging it into Anki.
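For reference, an AAC transcode with ffmpeg can be expressed as a short argument list. The sketch below builds one; the function name buildAacTranscodeArgs, the 64 kbps default bitrate, and the specific flag selection are assumptions for illustration and are not taken from the CLI's actual invocation.

```typescript
// Hypothetical helper: builds ffmpeg arguments for an audio-only AAC transcode.
function buildAacTranscodeArgs(
  inputPath: string,
  outputPath: string,
  bitrateKbps: number = 64, // assumed default, not the CLI's setting
): string[] {
  return [
    "-y", // overwrite the output file if it already exists
    "-i", inputPath, // input audio file
    "-vn", // drop any video stream
    "-c:a", "aac", // encode audio with ffmpeg's AAC encoder
    "-b:a", `${bitrateKbps}k`, // target audio bitrate
    outputPath,
  ];
}
```

Under Bun, the resulting array could be passed to a process spawner, for example Bun.spawn(["ffmpeg", ...args]).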
The card UI lives in apps/card-template/ and is bundled into a single HTML artifact:
cd apps/deck-cli
bun run template:build
The generated artifact is written to apps/card-template/dist/index.html. The APKG build uses this template automatically.
bun run config:schema - regenerate apps/deck-cli/deck.config.schema.json
bun run sentenceRetrieval:update - update local Tatoeba language metadata
bun run wordFrequencies:words - update local frequency word data

This project is feature-complete for its original goal: generating personal Anki sentence decks from Tatoeba with translation hints, difficulty ordering, audio, and a bundled card UI.
Future work is expected to be limited to maintenance, source updates, and small quality fixes.
Don't hesitate to send a PR, and feel free to reach out if you have any questions :)
MIT
See LICENSE