A fully local RAG (Retrieval-Augmented Generation) chat application built with Tauri, Rust, and Svelte. All processing happens on your machine - no API keys or cloud services required.
Supported document types: .txt, .md, and .pdf files.

| Dependency | Version | Installation |
|---|---|---|
| Rust | Latest stable | `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh` |
| Node.js | v18+ | Download from the website or use `nvm install 18` |
Platform-specific dependencies:

macOS:

xcode-select --install

Linux (Debian/Ubuntu):

sudo apt update
sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file \
  libssl-dev libayatana-appindicator3-dev librsvg2-dev pkg-config
git clone https://github.com/ariyan-hawez/doc-chat-app.git
cd doc-chat-app
# Install Node.js dependencies
npm install
# Build Rust dependencies (first run takes ~5-10 min)
cd src-tauri && cargo build --release && cd ..
The app requires a llamafile model for LLM inference. Download one from Mozilla's releases:
| Model | Size | RAM Required | Notes |
|---|---|---|---|
| Phi-3-mini-4k-instruct.Q4_K_M.llamafile | ~2.3GB | 4GB | Fast, good for most use cases |
| Llama-3.2-3B-Instruct.Q4_K_M.llamafile | ~2GB | 6GB | Good balance of quality/speed |
| Mistral-7B-Instruct-v0.2.Q4_K_M.llamafile | ~4GB | 8GB | Higher quality, slower |
# Create directory and download (example with Phi-3)
mkdir -p llamafile
curl -L -o llamafile/model.llamafile \
"https://huggingface.co/Mozilla/Phi-3-mini-4k-instruct-llamafile/resolve/main/Phi-3-mini-4k-instruct.Q4_K_M.llamafile"
# Make executable
chmod +x llamafile/model.llamafile
Note: The llamafile/ directory is gitignored. Each developer needs to download their own model.
Place documents in the data/ directory:
mkdir -p data
cp ~/Documents/my-notes.md data/
# Supports: .txt, .md, .pdf
# Development mode (with hot-reload)
npm run tauri dev
# Production build
npm run tauri build
On first run, the app will download the embedding model into .fastembed_cache/ and index the documents in data/.

Note: The .fastembed_cache/ directory is gitignored. The embedding model is auto-downloaded on first run.
Uses llamafile for fully local LLM inference with an OpenAI-compatible API.
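As a rough sketch of what a request to that API looks like (the `/v1/chat/completions` route is the standard OpenAI path exposed by llamafile; the use of `reqwest` and the exact fields here are illustrative, not the app's actual client code):

```rust
// Sketch only: assumes the llamafile server is already running on
// localhost:8080 and serving the standard OpenAI chat completions route.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let body = json!({
        "model": "local", // llamafile accepts any model name
        "messages": [
            { "role": "system", "content": "Answer using only the provided context." },
            { "role": "user", "content": "What do my notes say about Tauri?" }
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    });

    let resp = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{resp}");
    Ok(())
}
```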
Local embeddings are generated via fastembed with the BGE-Small-EN-v1.5 model.
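A minimal sketch of that embedding step, assuming the builder-style `InitOptions` from recent fastembed releases (the exact constructor differs between crate versions, so treat this as illustrative):

```rust
// Sketch only: the fastembed constructor API varies by version.
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

fn main() -> anyhow::Result<()> {
    // Downloads BGE-Small-EN-v1.5 into the local cache on first use.
    let model = TextEmbedding::try_new(
        InitOptions::new(EmbeddingModel::BGESmallENV15).with_show_download_progress(true),
    )?;

    // Each text chunk becomes a 384-dimensional vector.
    let embeddings = model.embed(vec!["Tauri pairs a Rust core with a web frontend."], None)?;
    println!("dims = {}", embeddings[0].len()); // 384
    Ok(())
}
```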
The RAG (Retrieval-Augmented Generation) engine combines document retrieval with LLM generation.
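The names below (`Chunk`, `build_prompt`) are illustrative rather than the engine's real types; they only show the shape of the retrieve-then-generate step, where retrieved chunks are stitched into a context block ahead of the user's question:

```rust
// Illustrative sketch of the RAG prompt assembly.
struct Chunk {
    source: String, // originating file, e.g. "data/my-notes.md"
    text: String,   // the chunk's content
}

fn build_prompt(question: &str, chunks: &[Chunk]) -> String {
    let context = chunks
        .iter()
        .map(|c| format!("[{}]\n{}", c.source, c.text))
        .collect::<Vec<_>>()
        .join("\n\n");

    format!(
        "Answer the question using only the context below.\n\n\
         Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
}
```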
Edit src-tauri/src/config.rs:
// RAG Settings
chunk_size: 500, // Characters per chunk
chunk_overlap: 50, // Overlap between chunks
top_k: 4, // Chunks to retrieve
// LLM Settings
temperature: 0.7, // Response creativity (0-1)
max_tokens: 1024, // Max response length
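To see how `chunk_size` and `chunk_overlap` interact, here is a hypothetical character-based splitter with the same semantics (the engine's actual splitter may differ):

```rust
// Hypothetical splitter matching the chunk_size/chunk_overlap semantics above:
// consecutive windows of `size` characters, each sharing `overlap` characters
// with the previous window.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap; // step back by the overlap
    }
    chunks
}
// With size = 500 and overlap = 50, a 1,000-character document yields
// chunks covering 0..500, 450..950, and 900..1000.
```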
┌─────────────────────────────────────────────────────────────────┐
│ Svelte Frontend │
│ (src/lib/components/) │
└────────────────────────────┬────────────────────────────────────┘
│ Tauri IPC
┌────────────────────────────▼────────────────────────────────────┐
│ Rust Backend │
│ (src-tauri/src/) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Commands │ │ RAG Engine │ │ Llamafile Client │ │
│ │ (commands/) │──│ (rag/) │──│ (llamafile/) │ │
│ └──────────────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │
│ ┌──────────────┐ ┌──────▼───────┐ ┌──────────▼───────────┐ │
│ │ Ingest │ │ VectorStore │ │ HTTP to Llamafile │ │
│ │ (ingest/) │ │(vectorstore/)│ │ localhost:8080 │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────────────────────────────────────▼─────────────┐
│ Llamafile Server │
│ (llamafile/model.llamafile) │
│ OpenAI-compatible API on localhost:8080 │
└─────────────────────────────────────────────────────────────────┘
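To make the IPC boundary concrete, a Tauri command wired into this architecture might look like the sketch below; the command name `ask` and the `RagEngine` placeholder are illustrative, not the app's real API in commands/:

```rust
// Hypothetical command illustrating the Svelte -> Tauri IPC -> Rust path.
use tauri::State;

struct RagEngine; // placeholder for the real engine in rag/

impl RagEngine {
    fn answer(&self, question: &str) -> String {
        format!("(answer to: {question})")
    }
}

#[tauri::command]
fn ask(question: String, engine: State<'_, RagEngine>) -> Result<String, String> {
    Ok(engine.answer(&question))
}

fn main() {
    tauri::Builder::default()
        .manage(RagEngine)
        .invoke_handler(tauri::generate_handler![ask])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}
```

On the Svelte side, a command like this is reached through Tauri's `invoke` function, which the wrapper in src/lib/api/ is responsible for.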
User Question
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Embed │────▶│ Search │────▶│ Retrieve │
│ Question │ │ VectorStore │ │ Top-K │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
┌─────────────────────────────────────────┘
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Build │────▶│ Send to │────▶│ Stream │
│ Prompt │ │ Llamafile │ │ Response │
└─────────────┘ └─────────────┘ └─────────────┘
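The "Search VectorStore / Retrieve Top-K" steps above amount to a nearest-neighbour ranking over the stored embeddings. A hypothetical brute-force version of that ranking is shown below; the real vectorstore/ module persists and queries SQLite, which is not shown here:

```rust
// Hypothetical brute-force top-k search by cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + f32::EPSILON)
}

/// Returns the indices of the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```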
Document Files (data/)
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Load │────▶│ Chunk │────▶│ Embed │
│ Documents │ │ Text │ │ Chunks │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
▼
┌─────────────┐
│ Store in │
│ VectorStore │
└─────────────┘
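The final "Store in VectorStore" step persists each chunk and its embedding in SQLite. A rough sketch with rusqlite follows; the table name and schema are assumptions for illustration, not the app's actual layout:

```rust
// Hypothetical persistence step for the ingestion flow above: embeddings are
// stored as little-endian f32 BLOBs alongside the chunk text.
use rusqlite::{params, Connection};

fn store_chunk(
    conn: &Connection,
    source: &str,
    text: &str,
    embedding: &[f32],
) -> rusqlite::Result<()> {
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (
             id INTEGER PRIMARY KEY,
             source TEXT NOT NULL,
             text TEXT NOT NULL,
             embedding BLOB NOT NULL
         )",
        [],
    )?;

    // Serialize the vector as raw little-endian f32 bytes.
    let blob: Vec<u8> = embedding.iter().flat_map(|f| f.to_le_bytes()).collect();
    conn.execute(
        "INSERT INTO chunks (source, text, embedding) VALUES (?1, ?2, ?3)",
        params![source, text, blob],
    )?;
    Ok(())
}
```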
ragchat/
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── commands/ # Tauri IPC commands
│ │ ├── embedding/ # fastembed integration
│ │ ├── ingest/ # Document loading & chunking
│ │ ├── llamafile/ # LLM server management
│ │ ├── rag/ # RAG engine & prompts
│ │ └── vectorstore/ # SQLite vector storage
│ └── Cargo.toml
├── src/ # Svelte frontend
│ ├── lib/
│ │ ├── api/ # Tauri API wrapper
│ │ ├── components/ # UI components
│ │ └── stores/ # Svelte stores
│ └── App.svelte
├── data/ # Your documents go here
├── llamafile/ # LLM model (not in git)
└── docs/ # Additional documentation
| Model | RAM | GPU |
|---|---|---|
| phi-3-mini | 4GB | Optional |
| llama-3.2-3b | 6GB | Optional |
| mistral-7b | 8GB | Recommended |
# Ensure executable
chmod +x llamafile/model.llamafile
# Check if port 8080 is in use
lsof -i :8080
# Test manually
./llamafile/model.llamafile --help
The first query loads the model into memory (~10-30 seconds). Subsequent queries are faster.
# macOS: Install Xcode tools
xcode-select --install
# Linux: Install missing dependencies
sudo apt install pkg-config libssl-dev
Large files like .fastembed_cache/ and llamafile/ are gitignored. If you accidentally committed them:
git rm --cached -r src-tauri/.fastembed_cache/
git rm --cached -r llamafile/
git commit -m "Remove large files"
See docs/setup.md for more detailed troubleshooting.