
Gaming Community Analytics Tracker

Gaming community sentiment analytics — FastAPI + Svelte dashboard tracking topic trends across news, forum, and video sources

CommunityPulse

Gaming Community Sentiment Dashboard

CommunityPulse aggregates sentiment and trends across multiple platforms — YouTube, community forums, news sites, and search trends — into a single dashboard. Players and content creators see what the community is talking about without manually browsing every source.

This project showcases cross-platform data ingestion, NLP-driven topic discovery, real-time sentiment tracking, and a responsive Svelte 5 frontend backed by a FastAPI async API.


Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | SvelteKit 2 + Svelte 5 (runes: $state, $derived, $effect) |
| Backend | FastAPI (Python 3.11+), async SQLAlchemy 2, Pydantic v2 |
| Database | PostgreSQL 17 |
| Cache | Valkey 8 (Redis-compatible) |
| NLP | BERTopic + Sentence Transformers + cardiffnlp RoBERTa sentiment |
| Toxicity | Detoxify (Unitary) |
| AI Summaries | Google Gemini 1.5 Flash (optional) |
| Scheduling | APScheduler (async) |
| Container | Docker Compose (6 services) |

Architecture

Current State

6-container Docker Compose deployment:

  • SvelteKit frontend — SSR + client hydration, Svelte 5 rune-based stores, CSS custom properties theming
  • FastAPI backend — async request handling, Pydantic validation, structured error responses
  • NLP worker — isolated container running BERTopic + sentiment + toxicity models (keeps API memory footprint low)
  • PostgreSQL — relational store for posts, sentiment results, topic aggregations, dead letter queue
  • Valkey — cache layer for aggregation snapshots, quota tracking, rate limiting
  • APScheduler — 6-hour ingestion cycles across all sources, staggered NLP + aggregation passes

Planned End-State

Key additions in the roadmap:

  • WebSocket push for live topic updates (replacing polling)
  • Kubernetes deployment with horizontal pod autoscaling on the NLP worker
  • Event-driven ingestion via message queue (replacing scheduled polling)
  • User accounts with saved dashboards and alert thresholds
  • Historical trend graphs with time-series analysis

Feature Spotlights

NLP Pipeline

The NLP pipeline processes ingested posts through three stages:

  1. Topic Detection — BERTopic clusters posts into dynamic topics using seed-guided vocabularies. Topics get human-readable names via Gemini API or a fallback dictionary.
  2. Sentiment Analysis — cardiffnlp/twitter-roberta-base-sentiment-latest scores each post as positive/neutral/negative with confidence. Results are stored with a configurable TTL for re-analysis.
  3. Toxicity Filtering — Detoxify screens posts before storage. Toxic content is flagged and excluded from aggregations.
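The three stages above can be sketched as a single orchestration step. The function, field names, and toxicity threshold below are illustrative stand-ins, not the project's actual APIs; the real pipeline wires in BERTopic, RoBERTa, and Detoxify where the callables are injected here:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AnalyzedPost:
    text: str
    topic: str
    sentiment: str       # "positive" / "neutral" / "negative"
    confidence: float
    toxic: bool          # flagged posts are excluded from aggregations

def process_post(
    text: str,
    detect_topic: Callable[[str], str],
    score_sentiment: Callable[[str], tuple[str, float]],
    toxicity_score: Callable[[str], float],
    toxicity_threshold: float = 0.7,  # illustrative cutoff, not the project's value
) -> AnalyzedPost:
    """Run one post through toxicity screening, sentiment, and topic detection."""
    toxic = toxicity_score(text) >= toxicity_threshold
    label, conf = score_sentiment(text)
    topic = detect_topic(text)
    return AnalyzedPost(text, topic, label, conf, toxic)
```

With stub models substituted for the callables, the same orchestration is unit-testable without loading any transformer weights — which is also how the repo's `tests/` mock the NLP layer.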

Resilience: A circuit breaker wraps each model call — after N failures the circuit opens, and a dead letter queue captures posts for retry on the next scheduler pass.

Multi-Source Ingestion

Eight data source adapters feed the pipeline:

| Source | Method | Rate Strategy |
| --- | --- | --- |
| YouTube | Data API v3 | Daily quota tracking (9k/10k budget) |
| Official News Publisher | RSS | Locale-aware, 6-hour cycle |
| TierSite | Web scraping | Polite crawling with backoff |
| GuideSite | Web scraping | Polite crawling with backoff |
| News Source A | RSS feed | 30-item window per cycle |
| News Source B | RSS feed | 30-item window per cycle |
| Reddit | Public JSON API | No auth needed, 50-post window |
| Google Trends | pytrends | 60s inter-request delay, 12-hour cycle |
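The YouTube row's quota budgeting can be sketched as a simple counter that refuses work past a soft ceiling. The class and the per-call cost constant are assumptions for illustration (though 100 units per `search.list` call is the documented Data API v3 cost):

```python
class QuotaTracker:
    """Budget daily YouTube Data API units: spend against 9k of the 10k daily quota."""

    def __init__(self, daily_budget: int = 9_000):
        self.daily_budget = daily_budget
        self.used = 0

    def can_spend(self, units: int) -> bool:
        return self.used + units <= self.daily_budget

    def spend(self, units: int) -> None:
        if not self.can_spend(units):
            raise RuntimeError("YouTube quota budget exhausted; wait for daily reset")
        self.used += units

SEARCH_COST = 100  # Data API v3 search.list cost in quota units
```

Keeping the budget below the hard 10k limit leaves headroom for manual triggers via `/api/ingestion/trigger` without risking a hard quota error mid-cycle.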

Each adapter implements a common DataSourceAdapter interface. The IngestionService handles deduplication (external ID upsert) and forwards clean posts to the NLP queue.
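A sketch of what that shared interface and external-ID upsert might look like — the `Protocol` methods and the dict-backed store are illustrative, not the project's real classes (the actual upsert happens against PostgreSQL):

```python
from typing import Iterable, Protocol

class RawPost:
    def __init__(self, external_id: str, source: str, text: str):
        self.external_id = external_id
        self.source = source
        self.text = text

class DataSourceAdapter(Protocol):
    """Shape shared by all eight adapters; method names here are assumed."""
    source_name: str
    def fetch(self) -> Iterable[RawPost]: ...

def ingest(adapter: DataSourceAdapter, store: dict[str, RawPost]) -> list[RawPost]:
    """Upsert by external ID: re-seen posts are refreshed, new ones queued for NLP."""
    new_posts = []
    for post in adapter.fetch():
        if post.external_id not in store:
            new_posts.append(post)  # only unseen posts go to the NLP queue
        store[post.external_id] = post  # upsert either way
    return new_posts
```

Because every adapter satisfies the same protocol, adding a ninth source means writing one `fetch()` implementation; dedup and queueing stay in the shared service.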


Project Structure

gaming-community-analytics-tracker/
├── backend/
│   ├── app/
│   │   ├── api/routes/            # REST endpoints (dashboard, ingestion, feedback)
│   │   ├── dashboard/             # Aggregation, explanation generation, patch tracking
│   │   ├── ingestion/
│   │   │   ├── adapters/          # 8 data source adapters
│   │   │   ├── service.py         # Dedup + upsert orchestration
│   │   │   └── scheduler.py       # APScheduler job configuration
│   │   ├── models/                # SQLAlchemy async models
│   │   ├── nlp/
│   │   │   ├── topics.py          # BERTopic seed-guided clustering
│   │   │   ├── sentiment.py       # RoBERTa sentiment scoring
│   │   │   ├── toxicity.py        # Detoxify toxicity detection
│   │   │   ├── circuit_breaker.py # Failure isolation
│   │   │   └── dead_letter.py     # Retry queue for failed analyses
│   │   └── services/              # Digest generation, topic naming
│   ├── scripts/                   # Seed data, migrations
│   └── tests/                     # Pytest (async fixtures, mock NLP)
├── frontend/
│   ├── src/
│   │   ├── lib/
│   │   │   ├── components/        # TopicCard, SentimentBar, PatchPulse, etc.
│   │   │   ├── stores/            # Svelte 5 rune stores ($state, $derived)
│   │   │   └── i18n/              # Internationalization (English MVP)
│   │   └── routes/                # SvelteKit pages (dashboard, digest, patch-pulse)
│   └── e2e/                       # Playwright E2E tests
├── database/
│   ├── ddl/                       # Schema definitions
│   └── dml/                       # Seed data, migrations
├── docker-compose.yml             # 6-service orchestration
└── docs/                          # Architecture diagrams

API Endpoints

Dashboard

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/dashboard/trending | GET | Trending topics with sentiment |
| /api/dashboard/topics | GET | All topics list |
| /api/dashboard/topics/{slug} | GET | Single topic details |
| /api/dashboard/sources | GET | Source distribution |
| /api/dashboard/patch-pulse | GET | Current patch sentiment |
| /api/dashboard/aggregate | POST | Trigger aggregation |
| /api/dashboard/digest/summary | POST | AI digest summary |

Feedback

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/feedback/vote | POST | Submit vote (thumbs up/down) |
| /api/feedback/report | POST | Report inaccurate topic |
| /api/feedback/general | POST | Submit general feedback |

Ingestion

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/ingestion/trigger | POST | Trigger ingestion by platform |
| /api/ingestion/status | GET | All source statuses |
| /api/ingestion/quota | GET | YouTube API quota usage |
| /api/ingestion/nlp-stats | GET | NLP processing statistics |
| /api/ingestion/nlp-sentiment | POST | Trigger sentiment analysis |

Health

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/health | GET | Health check with DB/cache status |
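The health check's structured payload can be sketched as a small composition over dependency probes; the field names and values here are assumptions about the response shape, not the actual schema:

```python
def health_payload(db_ok: bool, cache_ok: bool) -> dict:
    """Compose a /api/health-style response from DB and cache probe results."""
    return {
        "status": "ok" if db_ok and cache_ok else "degraded",
        "database": "up" if db_ok else "down",
        "cache": "up" if cache_ok else "down",
    }
```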

Key Technical Decisions

| Decision | Optimized For | Trade-off |
| --- | --- | --- |
| Isolated NLP worker container | API memory stability (~200MB vs ~2GB with models loaded) | Extra container orchestration complexity |
| Circuit breaker + DLQ | Graceful degradation when models fail | Eventual consistency — posts analyzed on retry, not immediately |
| BERTopic with seed topics | Consistent topic categories across runs | Less dynamic than fully unsupervised clustering |
| Valkey cache for aggregations | Sub-50ms dashboard loads on cached data | Stale reads between aggregation cycles (up to 6 hours) |
| Session-based anonymous feedback | Privacy-first — no user accounts required for MVP | Limited per-user analytics |
| APScheduler in-process | Zero additional infrastructure for scheduling | Single point of failure — moves to message queue in roadmap |
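The Valkey aggregation cache follows a classic cache-aside pattern. This sketch substitutes a timestamped dict for Valkey; the key name and TTL wiring are illustrative:

```python
import time

class SnapshotCache:
    """Cache-aside for dashboard aggregation snapshots (dict stand-in for Valkey)."""

    def __init__(self, ttl_seconds: float = 6 * 3600):  # matches the 6-hour cycle
        self.ttl = ttl_seconds
        self._store: dict = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: str, compute):
        entry = self._store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]  # fast path: serve the cached snapshot
        value = compute()    # miss or expired: run the expensive aggregation
        self._store[key] = (time.monotonic(), value)
        return value
```

This is the source of the table's trade-off: reads between aggregation cycles come straight from the cache, so dashboard data can lag the database by up to one TTL.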

Security

  • Environment-variable-only secrets (never committed, validated at startup)
  • SQLAlchemy ORM with parameterized queries (SQL injection prevention)
  • Pydantic validation on all request/response boundaries
  • CORS restricted to frontend origin
  • Structured error responses (no stack trace leakage)
  • Toxicity filtering before any content reaches the dashboard
  • Session IDs for anonymous feedback rate limiting
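The last point — session-keyed rate limiting — can be sketched as a sliding-window counter. The limit and window values are assumptions, and in production the hit lists would live in Valkey rather than process memory:

```python
import time
from collections import defaultdict

class SessionRateLimiter:
    """Sliding-window limiter keyed by anonymous session ID."""

    def __init__(self, limit: int = 10, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits: dict = defaultdict(list)  # session_id -> hit timestamps

    def allow(self, session_id: str) -> bool:
        now = time.monotonic()
        # Keep only hits inside the current window.
        hits = [t for t in self._hits[session_id] if now - t < self.window]
        self._hits[session_id] = hits
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```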

Vision

CommunityPulse started as a way to answer a simple question: what is the gaming community actually talking about right now? The answer required pulling data from fragmented sources, applying NLP at scale, and surfacing results through an intuitive dashboard.

The technical approach prioritizes reliability and observability. Every model call is wrapped in a circuit breaker. Failed analyses land in a dead letter queue with automatic retry. The NLP worker runs in isolation so a model crash never takes down the API. Ingestion adapters share a common interface, making it straightforward to add new data sources.

Looking ahead, the roadmap moves from scheduled polling to event-driven ingestion, adds WebSocket push for real-time updates, and introduces user accounts with customizable alert thresholds. The architecture is designed to scale horizontally — the NLP worker is the natural first candidate for pod autoscaling under load.


License

MIT License - See LICENSE file for details.
