BetaWordList
BetaWordList is a modern, cross-platform vocabulary analysis tool built with Tauri, Svelte, and Rust. It is designed for linguists, researchers, and anyone who needs to build word lists from large Chinese text corpora efficiently and interactively.
Overview
BetaWordList enables you to:
- Load pre-trained NLP models for Chinese word segmentation and POS tagging
- Batch analyze multiple text files with real-time progress feedback
- Explore results in a powerful, interactive table with advanced filtering and sorting
- Export filtered results to CSV for further analysis
Features
- One-Click Model Loading
No need to configure model paths; just click "Load Model" and go!
- Batch File Analysis
Select and analyze multiple .txt files at once.
- Real-Time Progress
See which file is being processed and overall progress.
- Interactive Results Table
- Column sorting: Click any column header to sort (ascending/descending/none)
- Fixed columns: "Word" and "POS" always visible
- Responsive layout: Prevents column overlap
- Hover tooltips: See full metric names and values
- Advanced Filtering
- By word length (e.g., only 2-character words)
- By POS tag
- By metric value with operators (>, >=, <, <=, =)
- CSV Export
- Download all filtered results as a CSV file
- Smart file naming: wordlist_results_{timestamp}.csv
- User Experience
- Data statistics: original, filtered, and current page counts
- Clear sorting/filtering state indicators
- One-click reset for all filters
- Fully responsive for desktop and laptop screens
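The Advanced Filtering and CSV Export features above can be illustrated with a minimal Rust sketch. It is not taken from the BetaWordList source: the names `WordEntry`, `Op`, `Filter`, `apply`, `to_csv`, and `export_file_name` are hypothetical, the `{timestamp}` placeholder is assumed to be Unix seconds, and this README does not say whether the app filters in the Svelte frontend or the Rust backend.

```rust
use std::collections::BTreeMap;

/// One row of the results table (hypothetical shape, not the app's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct WordEntry {
    pub word: String,
    pub pos: String,
    /// Metric name -> value; BTreeMap keeps a stable iteration order.
    pub metrics: BTreeMap<String, f64>,
}

/// The comparison operators listed under Advanced Filtering.
#[derive(Debug, Clone, Copy)]
pub enum Op { Gt, Ge, Lt, Le, Eq }

impl Op {
    fn test(self, lhs: f64, rhs: f64) -> bool {
        match self {
            Op::Gt => lhs > rhs,
            Op::Ge => lhs >= rhs,
            Op::Lt => lhs < rhs,
            Op::Le => lhs <= rhs,
            Op::Eq => lhs == rhs, // exact floating-point comparison
        }
    }
}

/// Every criterion is optional; `None` means "do not filter on this".
#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub word_len: Option<usize>,
    pub pos: Option<String>,
    pub metric: Option<(String, Op, f64)>,
}

impl Filter {
    pub fn matches(&self, e: &WordEntry) -> bool {
        // Count Unicode scalar values, not bytes, so a two-character
        // Chinese word has length 2.
        let len_ok = self.word_len.map_or(true, |n| e.word.chars().count() == n);
        let pos_ok = self.pos.as_ref().map_or(true, |p| &e.pos == p);
        // An entry that lacks the requested metric does not match.
        let metric_ok = self.metric.as_ref().map_or(true, |(name, op, rhs)| {
            e.metrics.get(name).map_or(false, |v| op.test(*v, *rhs))
        });
        len_ok && pos_ok && metric_ok
    }
}

/// Keep only the entries that satisfy every active criterion.
pub fn apply<'a>(entries: &'a [WordEntry], f: &Filter) -> Vec<&'a WordEntry> {
    entries.iter().filter(|e| f.matches(e)).collect()
}

/// Quote a field if it contains a comma, quote, or line break.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Serialize entries as CSV with "Word" and "POS" first, then the given metrics.
pub fn to_csv(entries: &[&WordEntry], metric_names: &[&str]) -> String {
    let mut out = String::from("Word,POS");
    for m in metric_names {
        out.push(',');
        out.push_str(&csv_field(m));
    }
    out.push('\n');
    for e in entries {
        out.push_str(&csv_field(&e.word));
        out.push(',');
        out.push_str(&csv_field(&e.pos));
        for m in metric_names {
            out.push(',');
            // A missing metric is written as an empty cell.
            if let Some(v) = e.metrics.get(*m) {
                out.push_str(&v.to_string());
            }
        }
        out.push('\n');
    }
    out
}

/// Build the export name; the timestamp format is an assumption (Unix seconds).
pub fn export_file_name(timestamp: u64) -> String {
    format!("wordlist_results_{}.csv", timestamp)
}
```

In this sketch, "only 2-character words with a metric of at least 10" would be a `Filter` with `word_len: Some(2)` and `metric: Some((name, Op::Ge, 10.0))`; the result of `apply` can be passed straight to `to_csv`. Counting characters rather than bytes matters for Chinese text, where each character takes several bytes in UTF-8.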
Tech Stack
- Frontend: Svelte, TailwindCSS, Lucide Icons
- Backend: Rust, Tauri, LTP NLP library
- File System: Secure, permission-based access via Tauri plugins
Getting Started
Install dependencies:
bun install
Run the app in development:
bun run tauri dev
Build for production:
bun run tauri build
TODO
Contributing
Pull requests, issues, and suggestions are welcome! Please open an issue or PR if you have ideas or bug reports.
License
MIT