WebLLM Chat

An AI app that runs LLMs entirely in the browser, powered by Svelte 5 and Skeleton 3. It also includes a configurable RAG subsystem with PDF, DOCX, and TXT support plus drag-and-drop upload. Best of all, it leaks no data to any backend - it's completely private.

๐ŸŒ Live Demo: https://randomtask2000.github.io/WebLLMChat/

A privacy-first AI assistant with powerful document analysis that runs entirely in your browser, built with SvelteKit, Skeleton UI, and WebLLM. Features include RAG (Retrieval-Augmented Generation), multiple themes, and persistent chat history.

Features

  • 🤖 WebLLM Integration: Run large language models directly in the browser
  • 📄 RAG Support: Upload and search through documents (DOCX, PDF, TXT, MD, CSV)
  • 🎨 Multiple Themes: Switch between Skeleton, Wintry, Modern, and Crimson themes
  • 💬 Chat History: Persistent chat sessions stored in browser localStorage
  • 🔄 Model Management: Download and switch between different LLM models
  • ⚡ Responsive Design: Works on desktop and mobile devices
  • 🧪 Fully Tested: Comprehensive unit and integration tests
  • 📱 PWA Ready: Can be installed as a Progressive Web App

Available Models

  • Llama-3.2-3B-Instruct (Context7): Default model with extended 128k context (2GB VRAM)
  • TinyLlama-1.1B: Fastest loading, minimal VRAM (512MB, 2k context)
  • Llama-3.2-1B-Instruct: Small model with 128k context (1GB VRAM)
  • Llama-3.2-3B-Instruct: Standard 128k context version (2GB VRAM)
  • Llama-3.1-8B-Instruct: Highest quality responses (5GB VRAM, 128k context)
  • Qwen2.5-7B-Instruct: Excellent coding capabilities (4GB VRAM, 128k context)
  • Phi-3.5-mini-instruct: Microsoft's efficient model (2GB VRAM, 128k context)

Technology Stack

  • Framework: SvelteKit with TypeScript
  • Styling: TailwindCSS with Skeleton UI components
  • LLM Engine: WebLLM (runs models in the browser via WebGPU; see the usage sketch after this list)
  • Document Processing: PDF.js for PDF parsing
  • Testing: Vitest for unit tests, Playwright for integration tests
  • Build: Vite for fast development and optimized production builds
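
WebLLM exposes an OpenAI-compatible API. As rough orientation only - this repo wraps the engine in src/lib/utils/webllm.ts, and the model ID below is illustrative - a minimal streaming chat with @mlc-ai/web-llm looks like this:

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The model ID is illustrative; see the app's model list for the IDs it actually ships.
const engine = await CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text) // download/compile progress
});

// OpenAI-style streaming completion: tokens arrive as async chunks.
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  stream: true
});

let reply = "";
for await (const chunk of stream) {
  reply += chunk.choices[0]?.delta?.content ?? "";
}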

Development

Prerequisites

  • Node.js 18+
  • Modern browser with WebGPU support (Chrome 113+, Edge 113+) - see the detection snippet below
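
You can feature-detect WebGPU before attempting to load a model; a generic snippet (not code from this repo):

// Returns true only if the browser exposes WebGPU and can hand out an adapter.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as any).gpu; // cast avoids requiring @webgpu/types
  if (!gpu) return false;
  const adapter = await gpu.requestAdapter(); // null when no usable GPU is available
  return adapter !== null;
}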

Setup

# Install dependencies
npm install

# Start development server
npm run dev

# Run tests
npm run test

# Run integration tests
npm run test:integration

# Type checking
npm run check

# Build for production
npm run build

# Preview production build
npm run preview

Project Structure

src/
├── lib/
│   ├── components/          # Svelte components
│   │   ├── ChatInterface.svelte
│   │   ├── ChatMessage.svelte
│   │   ├── DocumentManager.svelte
│   │   ├── DragDropZone.svelte
│   │   ├── FeatureToggle.svelte
│   │   ├── FileUpload.svelte
│   │   ├── MobileLayout.svelte
│   │   ├── ModelDropdown.svelte
│   │   ├── ModelManager.svelte
│   │   ├── RAGContext.svelte
│   │   ├── Sidebar.svelte
│   │   ├── ThemeSwitcher.svelte
│   │   └── WelcomeGuide.svelte
│   ├── config/              # Configuration files
│   │   └── features.ts
│   ├── services/            # Service layer
│   │   ├── embedding-service.ts
│   │   ├── rag-service.ts
│   │   └── vector-store.ts
│   ├── stores/              # Svelte stores for state management
│   │   ├── chat.ts
│   │   ├── documents.ts
│   │   ├── models.ts
│   │   └── theme.ts
│   ├── types/               # TypeScript type definitions
│   │   ├── index.ts
│   │   └── rag.ts
│   └── utils/               # Utility functions
│       ├── document-processor.ts
│       ├── mobile.ts
│       ├── model-loading.ts
│       ├── timeFormat.ts
│       ├── tokenCount.ts
│       └── webllm.ts
├── routes/                  # SvelteKit routes
│   ├── +layout.svelte
│   ├── +layout.ts
│   └── +page.svelte
├── app.css                  # Global styles
└── app.html                 # Main HTML template

tests/
├── unit/                    # Unit tests
│   ├── document-processor-advanced.test.ts
│   ├── document-processor.test.ts
│   └── stores.test.ts
└── integration/             # Playwright integration tests
    └── chat-flow.test.ts

static/                      # Static assets
├── _headers
├── favicon.ico
└── manifest.json

Usage

  1. First Launch: The app automatically loads a small 1B-parameter model for immediate use
  2. Model Management: Click "Models" to download larger, more capable models
  3. Chat: Type messages in the input field and press Enter to send
  4. Document Upload: Click "Documents" to upload files for RAG functionality
  5. Theme Switching: Use the theme picker to change the visual style
  6. Chat History: Previous conversations are saved automatically and can be restored (see the persistence sketch below)
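
For intuition, the persistence in step 6 might look roughly like this hypothetical sketch (the real logic lives in src/lib/stores/chat.ts; the storage key is made up):

import { writable } from "svelte/store";

interface ChatSession {
  id: string;
  title: string;
  messages: { role: string; content: string }[];
}

const STORAGE_KEY = "chat-sessions"; // hypothetical key

// Rehydrate saved sessions on startup (browser-only), then mirror every change back.
const saved: ChatSession[] = JSON.parse(localStorage.getItem(STORAGE_KEY) ?? "[]");
export const sessions = writable<ChatSession[]>(saved);
sessions.subscribe((value) => localStorage.setItem(STORAGE_KEY, JSON.stringify(value)));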

RAG (Retrieval-Augmented Generation)

The app includes a powerful client-side RAG system that enhances AI responses with your uploaded documents.

Supported Document Formats

  • 📕 PDF files - Full text extraction with metadata (title, author, page count)
  • 📘 Word documents (.docx) - Preserves document structure (headings, lists, paragraphs)
  • 📝 Text files (.txt) - Plain text processing
  • 📋 Markdown files (.md) - Markdown content processing

How to Use RAG

  1. Upload Documents:

    • Click the + button in the chat input area,
    • Drag and drop files directly into the chat area, or
    • Use the 📁 Documents button in the top bar
  2. Ask Questions: The AI automatically searches your documents and uses relevant content to answer

  3. View RAG Context: Click the RAG Context button (right sidebar) to see:

    • Uploaded documents with chunk counts
    • Search results from your last query
    • RAG settings and configuration

RAG Settings

Access settings in the RAG Context panel (a chunking sketch follows this list):

  • Chunk Size (50-1000 tokens): Smaller chunks find specific facts better
  • Overlap Size (0-200 tokens): Overlap between chunks for better context
  • Search Accuracy (0-100%):
    • Low (0-30%): Fuzzy matching, more results
    • Medium (40-60%): Balanced approach
    • High (70-100%): Exact matching only
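
To make the Chunk Size / Overlap Size interplay concrete, here is a minimal sketch assuming naive whitespace tokenization (the real implementation in src/lib/utils/document-processor.ts may differ):

interface ChunkOptions {
  chunkSize: number; // max tokens per chunk (50-1000)
  overlap: number;   // tokens shared between neighboring chunks (0-200)
}

// Slide a window of chunkSize tokens, advancing by (chunkSize - overlap) each step.
function chunkText(text: string, opts: ChunkOptions): string[] {
  const tokens = text.split(/\s+/).filter(Boolean); // naive whitespace "tokens"
  const step = Math.max(1, opts.chunkSize - opts.overlap);
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + opts.chunkSize).join(" "));
    if (start + opts.chunkSize >= tokens.length) break; // final window reached the end
  }
  return chunks;
}

A larger overlap means a fact straddling a chunk boundary still appears intact in at least one chunk, at the cost of more chunks to embed and search.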

Advanced RAG Commands

Use the /find command to search for exact sentences (a minimal matching sketch follows these examples):

/find [term]
/find tree
/find any term

Or use natural language:

  • "Find sentences containing [term]"
  • "Show me the exact sentence with [term]"
  • "Quote the sentence about Jacob"
  • "Find where it says about Lehi"

Visual Indicators

  • Token Badge: Shows when RAG context is used in responses
  • Source Citations: Responses end with "📚 Source: [filename]"
  • Search Status: "🔍 Searching through X documents..." appears during search
  • Processing Status: Shows file type icons (📕 PDF, 📘 DOCX, 📄 Text) during upload

Advanced Features

  • Smart Chunking: Documents are intelligently split while preserving structure (headings, paragraphs)
  • Metadata Extraction: PDFs extract title, author, page count automatically
  • Structure Preservation: DOCX files maintain heading hierarchy and lists
  • Page Tracking: PDF chunks remember their source page numbers

For more detailed RAG usage instructions, see RAG_USAGE_GUIDE.md

Browser Compatibility

Requires a modern browser with WebGPU support:

  • Chrome/Chromium 113+
  • Edge 113+
  • Safari 16.4+ (experimental)
  • Firefox (behind flags)

Technical Stack

Core Framework

  • SvelteKit - Full-stack framework for building web applications
  • Svelte 5 - Reactive UI framework with compile-time optimizations
  • TypeScript - Type-safe JavaScript for better developer experience

AI & Machine Learning

  • WebLLM - In-browser LLM inference engine powered by WebGPU
  • WebGPU - Next-generation web graphics API for GPU acceleration

UI & Styling

Document Processing

  • PDF.js - PDF rendering and text extraction
  • Mammoth.js - DOCX to HTML conversion

RAG System

  • Custom TF-IDF Implementation - Lightweight text embeddings (sketched below)
  • IndexedDB - Browser-based vector storage
  • Web Workers - Background processing for embeddings
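
The README doesn't spell out the embedding math, but a bare-bones TF-IDF vectorizer over a fixed vocabulary works like this (an illustrative sketch, not the actual code in src/lib/services/embedding-service.ts):

// TF-IDF: term frequency within a chunk, scaled by inverse document frequency.
function tfidfVector(
  chunk: string,
  vocab: string[],              // fixed vocabulary; one vector dimension per term
  docFreq: Map<string, number>, // number of chunks containing each term
  totalChunks: number
): number[] {
  const tokens = chunk.toLowerCase().split(/\W+/).filter(Boolean);
  const tf = new Map<string, number>();
  for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
  return vocab.map((term) => {
    const freq = (tf.get(term) ?? 0) / Math.max(tokens.length, 1);
    const idf = Math.log(totalChunks / (1 + (docFreq.get(term) ?? 0)));
    return freq * idf;
  });
}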

Build Tools

  • Vite - Fast development server and optimized production builds

Testing

  • Vitest - Unit testing
  • Playwright - Integration testing

Mobile Support

  • Capacitor - Native mobile packaging (configured in capacitor.config.ts)

Architecture Overview

The application follows a client-side architecture where all processing happens in the browser:

  1. LLM Inference: WebLLM loads and runs language models directly in the browser using WebGPU
  2. Document Processing: Files are processed client-side for privacy
  3. RAG Pipeline: Documents → Chunking → Embeddings → Vector Store → Semantic Search (similarity search sketched below)
  4. State Management: Svelte stores for reactive state management
  5. Persistent Storage: IndexedDB for documents, embeddings, and chat history
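
Step 3's semantic search boils down to cosine similarity between the query vector and each stored chunk vector. A minimal illustration (the repo's vector-store.ts may differ):

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1); // guard against zero vectors
}

// Rank all chunks against the query and keep the top k.
function topK(query: number[], chunks: { id: string; vector: number[] }[], k: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}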

System Architecture Diagram

graph TB
    User([User]) --> UI[Chat Interface]
    UI --> ChatStore[Chat Store]
    UI --> DocStore[Document Store]
    UI --> ModelStore[Model Store]
    
    subgraph "Frontend Layer"
        UI
        ChatStore
        DocStore
        ModelStore
        ThemeStore[Theme Store]
    end
    
    subgraph "Service Layer"
        ModelManager[Model Manager]
        RAGService[RAG Service]
        EmbeddingService[Embedding Service]
        VectorStore[Vector Store]
        DocProcessor[Document Processor]
    end
    
    subgraph "WebLLM Engine"
        WebLLMCore[WebLLM Core]
        WebGPU[WebGPU Runtime]
        ModelCache[Model Cache]
    end
    
    subgraph "Storage Layer"
        IndexedDB[(IndexedDB)]
        LocalStorage[(Local Storage)]
        BrowserCache[(Browser Cache)]
    end
    
    subgraph "Document Processing"
        PDFParser[PDF.js Parser]
        DOCXParser[Mammoth.js Parser]
        TextParser[Text Parser]
        MDParser[Markdown Parser]
    end
    
    ChatStore --> ModelManager
    DocStore --> RAGService
    ModelStore --> ModelManager
    
    ModelManager --> WebLLMCore
    RAGService --> EmbeddingService
    RAGService --> VectorStore
    RAGService --> DocProcessor
    
    DocProcessor --> PDFParser
    DocProcessor --> DOCXParser
    DocProcessor --> TextParser
    DocProcessor --> MDParser
    
    WebLLMCore --> WebGPU
    WebLLMCore --> ModelCache
    
    ModelManager --> IndexedDB
    VectorStore --> IndexedDB
    RAGService --> IndexedDB
    ChatStore --> LocalStorage
    ThemeStore --> LocalStorage
    ModelCache --> BrowserCache
    
    style User fill:#e1f5fe
    style WebGPU fill:#fff3e0
    style IndexedDB fill:#f3e5f5
    style LocalStorage fill:#f3e5f5
    style BrowserCache fill:#f3e5f5

UML Class Diagram

classDiagram
    class ChatInterface {
        -messages: ChatMessage[]
        -inputText: string
        -isLoading: boolean
        +sendMessage(text: string): void
        +clearChat(): void
        +handleFileUpload(files: File[]): void
    }
    
    class ModelManager {
        -currentModel: string
        -availableModels: ModelInfo[]
        -loadingProgress: number
        +loadModel(modelId: string): Promise~void~
        +switchModel(modelId: string): void
        +getModelInfo(): ModelInfo
    }
    
    class RAGService {
        -vectorStore: VectorStore
        -embeddingService: EmbeddingService
        +addDocument(doc: Document): Promise~void~
        +search(query: string): Promise~SearchResult[]~
        +updateSettings(settings: RAGSettings): void
    }
    
    class VectorStore {
        -embeddings: Map~string, Vector~
        -documents: Map~string, DocumentChunk~
        +addEmbedding(id: string, vector: Vector): void
        +findSimilar(query: Vector, k: number): SearchResult[]
        +clear(): void
    }
    
    class EmbeddingService {
        -model: EmbeddingModel
        +generateEmbedding(text: string): Promise~Vector~
        +batchEmbed(texts: string[]): Promise~Vector[]~
    }
    
    class DocumentProcessor {
        +processFile(file: File): Promise~ProcessedDocument~
        +extractText(buffer: ArrayBuffer, type: string): Promise~string~
        +chunkDocument(text: string, options: ChunkOptions): DocumentChunk[]
    }
    
    class WebLLMCore {
        -engine: MLCEngine
        -config: EngineConfig
        +initializeEngine(): Promise~void~
        +chat(messages: Message[]): Promise~string~
        +streamChat(messages: Message[]): AsyncGenerator~string~
    }
    
    class ChatStore {
        -sessions: ChatSession[]
        -currentSession: ChatSession
        +addMessage(message: ChatMessage): void
        +createSession(): ChatSession
        +loadSession(id: string): void
        +saveToLocalStorage(): void
    }
    
    class DocumentStore {
        -documents: Document[]
        -chunks: DocumentChunk[]
        +addDocument(doc: Document): void
        +removeDocument(id: string): void
        +getChunksByDocument(docId: string): DocumentChunk[]
    }
    
    class ThemeStore {
        -currentTheme: string
        -availableThemes: Theme[]
        +setTheme(theme: string): void
        +getTheme(): Theme
    }
    
    ChatInterface --> ChatStore : uses
    ChatInterface --> ModelManager : uses
    ChatInterface --> RAGService : uses
    ChatInterface --> DocumentProcessor : uses file upload
    
    ModelManager --> WebLLMCore : manages
    
    RAGService --> VectorStore : stores vectors
    RAGService --> EmbeddingService : generates embeddings
    RAGService --> DocumentStore : retrieves documents
    
    DocumentProcessor --> DocumentStore : stores processed docs
    
    ChatStore --> WebLLMCore : sends messages
    
    VectorStore --> IndexedDB : persists data
    DocumentStore --> IndexedDB : persists data
    ChatStore --> LocalStorage : persists sessions
    ThemeStore --> LocalStorage : persists theme
    
    class IndexedDB {
        <<external>>
        +put(store: string, data: any): Promise~void~
        +get(store: string, key: string): Promise~any~
        +delete(store: string, key: string): Promise~void~
    }
    
    class LocalStorage {
        <<external>>
        +setItem(key: string, value: string): void
        +getItem(key: string): string
        +removeItem(key: string): void
    }

LLM Access Sequence Diagram

sequenceDiagram
    participant User
    participant ChatInterface
    participant ChatStore
    participant ModelManager
    participant WebLLMCore
    participant WebGPU
    participant BrowserCache
    
    User->>ChatInterface: Types message and sends
    ChatInterface->>ChatStore: addMessage(userMessage)
    ChatStore->>ChatStore: Save to LocalStorage
    ChatInterface->>ModelManager: checkModelLoaded()
    
    alt Model not loaded
        ModelManager->>WebLLMCore: initializeEngine()
        WebLLMCore->>BrowserCache: Check for cached model
        alt Model cached
            BrowserCache-->>WebLLMCore: Return cached model
        else Model not cached
            WebLLMCore->>WebLLMCore: Download model from CDN
            WebLLMCore->>BrowserCache: Cache model
        end
        WebLLMCore->>WebGPU: Load model to GPU
        WebGPU-->>WebLLMCore: Model ready
        WebLLMCore-->>ModelManager: Engine initialized
    end
    
    ChatInterface->>ChatStore: getConversationHistory()
    ChatStore-->>ChatInterface: Return messages[]
    ChatInterface->>ModelManager: generateResponse(messages)
    ModelManager->>WebLLMCore: streamChat(messages)
    WebLLMCore->>WebGPU: Process tokens
    
    loop Streaming response
        WebGPU-->>WebLLMCore: Generated tokens
        WebLLMCore-->>ModelManager: Token stream
        ModelManager-->>ChatInterface: Update response
        ChatInterface->>ChatInterface: Display streaming text
    end
    
    ChatInterface->>ChatStore: addMessage(aiResponse)
    ChatStore->>ChatStore: Save to LocalStorage
    ChatInterface-->>User: Display complete response

RAG Access Sequence Diagram

sequenceDiagram
    participant User
    participant ChatInterface
    participant DocumentProcessor
    participant RAGService
    participant EmbeddingService
    participant VectorStore
    participant DocumentStore
    participant IndexedDB
    
    Note over User,ChatInterface: Document Upload Flow
    User->>ChatInterface: Uploads document
    ChatInterface->>DocumentProcessor: processFile(file)
    
    alt PDF Document
        DocumentProcessor->>DocumentProcessor: PDF.js extraction
    else DOCX Document
        DocumentProcessor->>DocumentProcessor: Mammoth.js conversion
    else Text/Markdown
        DocumentProcessor->>DocumentProcessor: Direct text processing
    end
    
    DocumentProcessor->>DocumentProcessor: chunkDocument(text, options)
    DocumentProcessor->>DocumentStore: addDocument(processedDoc)
    DocumentStore->>IndexedDB: persistDocument()
    
    DocumentProcessor->>RAGService: indexDocument(chunks)
    
    loop For each chunk
        RAGService->>EmbeddingService: generateEmbedding(chunkText)
        EmbeddingService->>EmbeddingService: TF-IDF calculation
        EmbeddingService-->>RAGService: Return vector
        RAGService->>VectorStore: addEmbedding(chunkId, vector)
    end
    
    VectorStore->>IndexedDB: persistEmbeddings()
    RAGService-->>ChatInterface: Document indexed
    ChatInterface-->>User: Show upload success
    
    Note over User,ChatInterface: Query with RAG Flow
    User->>ChatInterface: Sends question
    ChatInterface->>RAGService: search(userQuery)
    RAGService->>EmbeddingService: generateEmbedding(query)
    EmbeddingService-->>RAGService: Query vector
    
    RAGService->>VectorStore: findSimilar(queryVector, k)
    VectorStore->>VectorStore: Calculate cosine similarity
    VectorStore-->>RAGService: Top k chunks
    
    RAGService->>DocumentStore: getChunkDetails(chunkIds)
    DocumentStore->>IndexedDB: fetchChunks()
    IndexedDB-->>DocumentStore: Chunk data
    DocumentStore-->>RAGService: Enriched chunks
    
    RAGService->>RAGService: Format context
    RAGService-->>ChatInterface: SearchResults + Context
    
    ChatInterface->>ChatInterface: Augment prompt with context
    ChatInterface->>ModelManager: generateResponse(augmentedPrompt)
    Note right of ModelManager: Continue with LLM flow
    
    ChatInterface-->>User: Display response with sources

Performance Notes

  • First model load may take 1-5 minutes depending on internet speed
  • Models are cached locally after first download
  • Larger models provide better quality but require more VRAM
  • The app continues working offline after initial model download

App Naming and Rebranding

To customize the app name and branding for your own deployment, you'll need to update several configuration files:

Primary Configuration

src/app.config.json (Line 2) - Main app title source

{
  "title": "Your App Name",
  "description": "Your app description"
}

Additional Files to Update

  1. Mobile App Configuration - capacitor.config.ts (Line 5)

    appName: 'Your App Name',
    
  2. Progressive Web App Manifest - static/manifest.json (Lines 2-3)

    {
      "name": "Your App Name",
      "short_name": "Your App Name",
      "description": "Your app description"
    }
    
  3. Package Configuration - package.json (Line 2)

    "name": "your-app-name"
    

Notes

  • The primary src/app.config.json configuration automatically updates most UI references through the {appConfig.title} system (see the sketch after these notes)
  • Components that display the app name (ChatInterface, WelcomeGuide, MobileLayout) will automatically use the updated title
  • Remember to also update any documentation, README files, and deployment configurations with your new app name
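
For context, consuming the config in a component could look like the following sketch; the actual import path and wiring in this repo may differ:

<script lang="ts">
  // Vite resolves JSON imports natively; the relative path here is illustrative.
  import appConfig from "../app.config.json";
</script>

<svelte:head>
  <title>{appConfig.title}</title>
</svelte:head>

<h1>{appConfig.title}</h1>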

License

MIT License - feel free to use this project as a starting point for your own applications.
