
Soccer Tracker De Project

End-To-End Data Engineering Project. Made to learn some common data engineering practices.


āš ļø LEARNING PROJECT: This is a personal learning project in the field of data engineering. I understand the architecture might not be the most optimal as this project. I made this to practise and learn. Feedback and suggestions are highly welcomed!

Football Statistics Tracker 📊⚽

An end-to-end data engineering pipeline that collects, processes, and analyzes match results, standings, weather data, and Reddit data for the top 5 European football leagues, and summarizes matchdays using Gemini. Data sources include the football-data.org API, the Open-Meteo API, PRAW (Reddit API), Maps...

Introduction

This project demonstrates a complete data pipeline for football (soccer) results, from data extraction to visualization. It implements some data engineering practices including data lakes, transformation layers, and Infrastructure as Code (IaC) with Terraform.

Features

  • Automated Data Collection: Scheduled data fetching from multiple APIs using Google Cloud Functions
  • Multi-layer Data Architecture: Raw data stored in GCS, processed data in BigQuery, and user-facing data in Firestore
  • Weather Integration: Match statistics enriched with weather data at match time
  • Social Media (Reddit) Data: Reddit comments for fan sentiment
  • Infrastructure as Code: Cloud Functions and Pub/Sub subscriptions and topics defined and deployed with Terraform
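
To illustrate how a scheduled, Pub/Sub-triggered Cloud Function might read its instructions, here is a minimal sketch. It assumes the background-function style signature `(event, context)`, where the Pub/Sub payload arrives base64-encoded under `event["data"]`. The function names, the message fields `league` and `matchday`, and the object path layout are hypothetical and not taken from this repo.

```python
import base64
import json


def decode_pubsub_event(event: dict) -> dict:
    """Decode the base64-encoded JSON payload of a Pub/Sub-triggered event."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


def fetch_match_data(event: dict, context: object = None) -> str:
    """Hypothetical entry point: work out which league and matchday to fetch."""
    message = decode_pubsub_event(event)
    league = message["league"]      # hypothetical field name
    matchday = message["matchday"]  # hypothetical field name
    # A real function would call the API here and write the response to GCS.
    # This sketch only returns the (assumed) object path for the raw JSON.
    return f"raw/matches/{league}/matchday={matchday}.json"


if __name__ == "__main__":
    payload = base64.b64encode(
        json.dumps({"league": "PL", "matchday": 12}).encode("utf-8")
    ).decode("ascii")
    print(fetch_match_data({"data": payload}))
```

Keeping the decoding in its own small function makes it easy to test locally with pytest, without deploying anything.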

Architecture

The pipeline has the following stages:

  1. Data Ingestion: Cloud Functions trigger on schedule to fetch data
  2. Storage Layers: Raw data (JSON) → External BQ tables (Parquet) → Processed data in BQ → Firestore
  3. Validation: Very simple validation and data quality checks with Dataplex
  4. Summarization: Creation of short summaries in Markdown with Gemini 2.0 Flash
  5. Visualization: Web app for insights
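
As a rough illustration of the step between the raw layer and the processed layer, the sketch below flattens one nested match object into a flat row and applies a very simple validity check. The field names (`id`, `utcDate`, `homeTeam.name`, `awayTeam.name`, `score.fullTime.home`, `score.fullTime.away`) are an assumption about the shape of the raw JSON, and `flatten_match` and `is_valid` are hypothetical helpers. In this project the actual transformations run in Dataform and the quality checks in Dataplex, so this only shows the idea in plain Python.

```python
from typing import Any


def flatten_match(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one nested match object into a flat row for a warehouse table."""
    full_time = raw.get("score", {}).get("fullTime", {})
    return {
        "match_id": raw["id"],
        "utc_date": raw["utcDate"],
        "home_team": raw["homeTeam"]["name"],
        "away_team": raw["awayTeam"]["name"],
        # Goals stay None for matches that have not been played yet.
        "home_goals": full_time.get("home"),
        "away_goals": full_time.get("away"),
    }


def is_valid(row: dict[str, Any]) -> bool:
    """Very simple check: team names present, goals are non-negative ints or None."""
    if not row["home_team"] or not row["away_team"]:
        return False
    for key in ("home_goals", "away_goals"):
        goals = row[key]
        if goals is not None and (not isinstance(goals, int) or goals < 0):
            return False
    return True
```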

Data Sources

  • Football-data.org: Match data, team data, and standings
  • Open-Meteo API: Historical weather data
  • Reddit (via PRAW): Fan comments and sentiment
  • Maps SDK: Location of stadiums
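
For the weather source, a request to the Open-Meteo historical archive endpoint can be sketched as below. The hourly variables `temperature_2m` and `precipitation` are my choice for illustration, and the helper names are hypothetical. The sketch also assumes the response lists hourly timestamps as strings like `2024-05-01T19:00` under `hourly.time`, with one value per timestamp for each requested variable. No network call is made here; the first function only builds the URL.

```python
from urllib.parse import urlencode

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(lat: float, lon: float, match_date: str) -> str:
    """Build an archive request URL for a single day at the stadium location."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": match_date,  # ISO date, e.g. "2024-05-01"
        "end_date": match_date,
        "hourly": "temperature_2m,precipitation",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def weather_at_kickoff(response: dict, kickoff_hour: str) -> dict:
    """Pick the hourly values whose timestamp matches the kickoff hour."""
    hourly = response["hourly"]
    idx = hourly["time"].index(kickoff_hour)  # raises ValueError if missing
    return {
        "temperature_2m": hourly["temperature_2m"][idx],
        "precipitation": hourly["precipitation"][idx],
    }
```

Requesting only the match day keeps the response small, which helps when staying within free tiers.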

Technology Stack

| Category | Technologies |
| --- | --- |
| Cloud Platform | Google Cloud Platform (GCP) |
| Infrastructure as Code | Terraform |
| Programming Languages | Python, TypeScript (Svelte) |
| Data Storage | Cloud Storage, BigQuery, Firestore |
| Data Quality | Dataplex |
| Data Transformation | Dataform |
| Serverless Computing | Cloud Functions |
| Event-Driven Architecture | Pub/Sub |
| API Consumption | Football-data.org, Open-Meteo, Reddit API, Google Maps |
| CI/CD | GitHub Actions |
| Package Management | uv, pyproject.toml |
| Code Quality | Ruff, Bandit, Mypy |
| Testing | pytest |
| Web Framework | Svelte, ShadCN UI Components |
| Hosting | Firebase App Hosting |
| LLM | Google Gemini 2.0 Flash |

Project Structure

soccer-tracker-DE-project/
├── README.md
├── .gitignore
├── pyproject.toml
├── Github/workflows/                  # CI/CD in GitHub Actions
│   ├── cd.yml
│   └── ci.yml
├── terraform/                         # IaC definitions
│   ├── main.tf
│   ├── variables.tf
│   ├── pubsub.tf
│   └── cloud_functions.tf
├── cloud_functions/
│   ├── league_data/                   # League and Teams data extraction and load
│   ├── discord_utils/                 # Package for sending Discord notifications using webhooks
│   ├── match_data/                    # Match data extraction and load
│   ├── weather_data/                  # Weather data extraction and load
│   ├── reddit_data/                   # Reddit data extraction and load
│   ├── standings_data/                # Standings data extraction and load for each matchday
│   ├── data_validation/               # Data validation using Dataplex
│   ├── serving_layer/                 # Load data to Firestore
│   └── generate_summaries/            # Generate match summaries with Gemini
├── soccer_tracker_ui/                 # Svelte web app in Firebase
│   ├── src/
│   │   ├── lib/                       # Reusable components
│   │   │   ├── components/            # UI components from [shadcn](https://next.shadcn-svelte.com/)
│   │   │   ├── firebase.ts            # Firebase/Firestore connection
│   │   │   └── stores/                # Svelte stores for state management
│   │   ├── routes/                    # Page components
│   ├── package.json                   # Dependencies and scripts
│   ├── svelte.config.js               # Svelte configuration
│   ├── vite.config.js                 # Vite bundler config
└── tests/                             # Test suite for Cloud Functions with Pytest

Additional Documentation

Web app

The project includes a Svelte web app for visualizing match results, weather data, and match summaries.

The app includes:

  • Match Results
  • Match summaries using an LLM (Gemini 2.0 Flash)
  • Weather data during matches
  • Comments from Reddit
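
To give an idea of how the items above could be combined into one user-facing record for the serving layer, here is a minimal sketch of a document builder. The document shape, the key names, and the `build_match_document` helper are all hypothetical; the actual Firestore schema used by the app may look quite different. Each comment is assumed to be a dict with a `body` and a numeric `score`.

```python
from typing import Any, Optional


def build_match_document(
    row: dict[str, Any],
    weather: Optional[dict[str, Any]],
    summary_markdown: Optional[str],
    comments: list[dict[str, Any]],
    max_comments: int = 5,
) -> dict[str, Any]:
    """Assemble one user-facing match document from the processed pieces."""
    # Keep only the highest-scored comments so the document stays small.
    top_comments = sorted(comments, key=lambda c: c["score"], reverse=True)
    return {
        "match_id": row["match_id"],
        "home_team": row["home_team"],
        "away_team": row["away_team"],
        "score": {"home": row["home_goals"], "away": row["away_goals"]},
        "weather": weather,                    # None if no weather was found
        "summary_markdown": summary_markdown,  # None if no summary exists yet
        "top_comments": top_comments[:max_comments],
    }
```

Putting everything a page needs into one document means the web app can render a match with a single read.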

āš ļø DISCLAIMER: I know this data probably does not have much real value as it is not real-time and the statistics are not that deep ( I wanted to stay within free tiers of APIs).

I got the idea to make this project from this repo by digitalghost-dev
