A fully local RAG (Retrieval-Augmented Generation) chat application built with Tauri, Rust, and Svelte. All processing happens on your machine - no API keys or cloud services required.
Supported document types: .txt, .md, and .pdf files.

| Dependency | Version | Installation |
|---|---|---|
| Rust | Latest stable | `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh` |
| Node.js | v18+ | Download from the website or use `nvm install 18` |
Platform-specific dependencies:

macOS:

xcode-select --install

Linux (Debian/Ubuntu):

sudo apt update
sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file \
  libssl-dev libayatana-appindicator3-dev librsvg2-dev pkg-config
git clone https://github.com/ariyan-hawez/doc-chat-app.git
cd doc-chat-app
# Install Node.js dependencies
npm install
# Build Rust dependencies (first run takes ~5-10 min)
cd src-tauri && cargo build --release && cd ..
The app requires a llamafile model for LLM inference. Download one from Mozilla's releases:
| Model | Size | RAM Required | Notes |
|---|---|---|---|
| Phi-3-mini-4k-instruct.Q4_K_M.llamafile | ~2.3GB | 4GB | Fast, good for most use cases |
| Llama-3.2-3B-Instruct.Q4_K_M.llamafile | ~2GB | 6GB | Good balance of quality/speed |
| Mistral-7B-Instruct-v0.2.Q4_K_M.llamafile | ~4GB | 8GB | Higher quality, slower |
# Create directory and download (example with Phi-3)
mkdir -p llamafile
curl -L -o llamafile/model.llamafile \
"https://huggingface.co/Mozilla/Phi-3-mini-4k-instruct-llamafile/resolve/main/Phi-3-mini-4k-instruct.Q4_K_M.llamafile"
# Make executable
chmod +x llamafile/model.llamafile
Note: The llamafile/ directory is gitignored. Each developer needs to download their own model.
Place documents in the data/ directory:
mkdir -p data
cp ~/Documents/my-notes.md data/
# Supports: .txt, .md, .pdf
# Development mode (with hot-reload)
npm run tauri dev
# Production build
npm run tauri build
On first run, the app will download the embedding model into .fastembed_cache/ and index the documents in data/.

Note: The .fastembed_cache/ directory is gitignored. The embedding model is auto-downloaded on first run.
Uses llamafile for fully local LLM inference with an OpenAI-compatible API.
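As a rough sketch of what a request to that API looks like (the `/v1/chat/completions` route is the standard OpenAI path exposed by llamafile; the use of `reqwest` and the exact fields here are illustrative, not the app's actual client code):

```rust
// Sketch only: assumes the llamafile server is already running on
// localhost:8080 and serving the standard OpenAI chat completions route.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let body = json!({
        "model": "local", // llamafile accepts any model name
        "messages": [
            { "role": "system", "content": "Answer using only the provided context." },
            { "role": "user", "content": "What do my notes say about Tauri?" }
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    });

    let resp = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{resp}");
    Ok(())
}
```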
Local embeddings are generated via fastembed with the BGE-Small-EN-v1.5 model.
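A minimal sketch of that embedding step, assuming the builder-style `InitOptions` from recent fastembed releases (the exact constructor differs between crate versions, so treat this as illustrative):

```rust
// Sketch only: the fastembed constructor API varies by version.
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

fn main() -> anyhow::Result<()> {
    // Downloads BGE-Small-EN-v1.5 into the local cache on first use.
    let model = TextEmbedding::try_new(
        InitOptions::new(EmbeddingModel::BGESmallENV15).with_show_download_progress(true),
    )?;

    // Each text chunk becomes a 384-dimensional vector.
    let embeddings = model.embed(vec!["Tauri pairs a Rust core with a web frontend."], None)?;
    println!("dims = {}", embeddings[0].len()); // 384
    Ok(())
}
```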
The RAG (Retrieval-Augmented Generation) engine combines document retrieval with LLM generation.
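The names below (`Chunk`, `build_prompt`) are illustrative rather than the engine's real types; they only show the shape of the retrieve-then-generate step, where retrieved chunks are stitched into a context block ahead of the user's question:

```rust
// Illustrative sketch of the RAG prompt assembly.
struct Chunk {
    source: String, // originating file, e.g. "data/my-notes.md"
    text: String,   // the chunk's content
}

fn build_prompt(question: &str, chunks: &[Chunk]) -> String {
    let context = chunks
        .iter()
        .map(|c| format!("[{}]\n{}", c.source, c.text))
        .collect::<Vec<_>>()
        .join("\n\n");

    format!(
        "Answer the question using only the context below.\n\n\
         Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
}
```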
Edit src-tauri/src/config.rs:
// RAG Settings
chunk_size: 500, // Characters per chunk
chunk_overlap: 50, // Overlap between chunks
top_k: 4, // Chunks to retrieve
// LLM Settings
temperature: 0.7, // Response creativity (0-1)
max_tokens: 1024, // Max response length
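To see how `chunk_size` and `chunk_overlap` interact, here is a hypothetical character-based splitter with the same semantics (the engine's actual splitter may differ):

```rust
// Hypothetical splitter matching the chunk_size/chunk_overlap semantics above:
// consecutive windows of `size` characters, each sharing `overlap` characters
// with the previous window.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap; // step back by the overlap
    }
    chunks
}
// With size = 500 and overlap = 50, a 1,000-character document yields
// chunks covering 0..500, 450..950, and 900..1000.
```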
┌─────────────────────────────────────────────────────────────────┐
│ Svelte Frontend │
│ (src/lib/components/) │
└────────────────────────────┬────────────────────────────────────┘
│ Tauri IPC
┌────────────────────────────▼────────────────────────────────────┐
│ Rust Backend │
│ (src-tauri/src/) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Commands │ │ RAG Engine │ │ Llamafile Client │ │
│ │ (commands/) │──│ (rag/) │──│ (llamafile/) │ │
│ └──────────────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │
│ ┌──────────────┐ ┌──────▼───────┐ ┌──────────▼───────────┐ │
│ │ Ingest │ │ VectorStore │ │ HTTP to Llamafile │ │
│ │ (ingest/) │ │(vectorstore/)│ │ localhost:8080 │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────────────────────────────────────▼─────────────┐
│ Llamafile Server │
│ (llamafile/model.llamafile) │
│ OpenAI-compatible API on localhost:8080 │
└─────────────────────────────────────────────────────────────────┘
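To make the IPC boundary concrete, a Tauri command wired into this architecture might look like the sketch below; the command name `ask` and the `RagEngine` placeholder are illustrative, not the app's real API in commands/:

```rust
// Hypothetical command illustrating the Svelte -> Tauri IPC -> Rust path.
use tauri::State;

struct RagEngine; // placeholder for the real engine in rag/

impl RagEngine {
    fn answer(&self, question: &str) -> String {
        format!("(answer to: {question})")
    }
}

#[tauri::command]
fn ask(question: String, engine: State<'_, RagEngine>) -> Result<String, String> {
    Ok(engine.answer(&question))
}

fn main() {
    tauri::Builder::default()
        .manage(RagEngine)
        .invoke_handler(tauri::generate_handler![ask])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}
```

On the Svelte side, a command like this is reached through Tauri's `invoke` function, which the wrapper in src/lib/api/ is responsible for.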
User Question
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Embed │────▶│ Search │────▶│ Retrieve │
│ Question │ │ VectorStore │ │ Top-K │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
┌─────────────────────────────────────────┘
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Build │────▶│ Send to │────▶│ Stream │
│ Prompt │ │ Llamafile │ │ Response │
└─────────────┘ └─────────────┘ └─────────────┘
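The "Search VectorStore / Retrieve Top-K" steps above amount to a nearest-neighbour ranking over the stored embeddings. A hypothetical brute-force version of that ranking is shown below; the real vectorstore/ module persists and queries SQLite, which is not shown here:

```rust
// Hypothetical brute-force top-k search by cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + f32::EPSILON)
}

/// Returns the indices of the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```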
Document Files (data/)
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Load │────▶│ Chunk │────▶│ Embed │
│ Documents │ │ Text │ │ Chunks │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
▼
┌─────────────┐
│ Store in │
│ VectorStore │
└─────────────┘
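The final "Store in VectorStore" step persists each chunk and its embedding in SQLite. A rough sketch with rusqlite follows; the table name and schema are assumptions for illustration, not the app's actual layout:

```rust
// Hypothetical persistence step for the ingestion flow above: embeddings are
// stored as little-endian f32 BLOBs alongside the chunk text.
use rusqlite::{params, Connection};

fn store_chunk(
    conn: &Connection,
    source: &str,
    text: &str,
    embedding: &[f32],
) -> rusqlite::Result<()> {
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (
             id INTEGER PRIMARY KEY,
             source TEXT NOT NULL,
             text TEXT NOT NULL,
             embedding BLOB NOT NULL
         )",
        [],
    )?;

    // Serialize the vector as raw little-endian f32 bytes.
    let blob: Vec<u8> = embedding.iter().flat_map(|f| f.to_le_bytes()).collect();
    conn.execute(
        "INSERT INTO chunks (source, text, embedding) VALUES (?1, ?2, ?3)",
        params![source, text, blob],
    )?;
    Ok(())
}
```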
ragchat/
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── commands/ # Tauri IPC commands
│ │ ├── embedding/ # fastembed integration
│ │ ├── ingest/ # Document loading & chunking
│ │ ├── llamafile/ # LLM server management
│ │ ├── rag/ # RAG engine & prompts
│ │ └── vectorstore/ # SQLite vector storage
│ └── Cargo.toml
├── src/ # Svelte frontend
│ ├── lib/
│ │ ├── api/ # Tauri API wrapper
│ │ ├── components/ # UI components
│ │ └── stores/ # Svelte stores
│ └── App.svelte
├── data/ # Your documents go here
├── llamafile/ # LLM model (not in git)
└── docs/ # Additional documentation
| Model | RAM | GPU |
|---|---|---|
| phi-3-mini | 4GB | Optional |
| llama-3.2-3b | 6GB | Optional |
| mistral-7b | 8GB | Recommended |
# Ensure executable
chmod +x llamafile/model.llamafile
# Check if port 8080 is in use
lsof -i :8080
# Test manually
./llamafile/model.llamafile --help
The first query loads the model into memory (~10-30 seconds). Subsequent queries are faster.
# macOS: Install Xcode tools
xcode-select --install
# Linux: Install missing dependencies
sudo apt install pkg-config libssl-dev
Large files like .fastembed_cache/ and llamafile/ are gitignored. If you accidentally committed them:
git rm --cached -r src-tauri/.fastembed_cache/
git rm --cached -r llamafile/
git commit -m "Remove large files"
See docs/setup.md for more detailed troubleshooting.