An end-to-end data engineering pipeline that collects, processes, and analyzes football match results, standings, weather data, and Reddit data from the top 5 European leagues, and summarizes matchdays using Gemini. Data sources include the football-data.org API, the Open-Meteo API, PRAW (Reddit API), Maps...
This project demonstrates a complete data pipeline for football (soccer) results, from data extraction to visualization. It applies data engineering practices including a data lake, transformation layers, and Infrastructure as Code (IaC) with Terraform.
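As a rough illustration of the data lake idea, extracted files are often written to Cloud Storage under date-partitioned object names so that each run can be located and reprocessed on its own. The sketch below assumes a hypothetical `raw/<source>/<date>/<league>.json` layout; the function name and the layout are illustrative and not taken from this repository.

```python
from datetime import date


def raw_object_path(source: str, league_code: str, match_date: date) -> str:
    """Build a date-partitioned object name for the raw layer of a data lake.

    The layout raw/<source>/<YYYY-MM-DD>/<league>.json is a hypothetical
    convention, not necessarily the one this project uses.
    """
    return f"raw/{source}/{match_date.isoformat()}/{league_code}.json"


# Example: raw match data for one league on one day.
print(raw_object_path("match_data", "PL", date(2025, 3, 1)))
# prints "raw/match_data/2025-03-01/PL.json"
```

Partitioning by date keeps reruns idempotent: writing the same day again overwrites one object instead of appending duplicates.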
The pipeline follows this architecture:
| Category | Technologies |
|---|---|
| Cloud Platform | Google Cloud Platform (GCP) |
| Infrastructure as Code | Terraform |
| Programming Languages | Python, TypeScript (Svelte) |
| Data Storage | Cloud Storage, BigQuery, Firestore |
| Data Quality | Dataplex |
| Data Transformation | Dataform |
| Serverless Computing | Cloud Functions |
| Event-Driven Architecture | Pub/Sub |
| API Consumption | Football-data.org, Open-Meteo, Reddit API, Google Maps |
| CI/CD | GitHub Actions |
| Package Management | uv, pyproject.toml |
| Code Quality | Ruff, Bandit, Mypy |
| Testing | pytest |
| Web Framework | Svelte, ShadCN UI Components |
| Hosting | Firebase App Hosting |
| LLM | Google Gemini 2.0 Flash |
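To make the Cloud Functions and Pub/Sub pairing concrete: Pub/Sub delivers the message body base64-encoded, so a triggered function usually decodes it before acting on it. The sketch below is a minimal example using only the standard library. It assumes the event exposes the payload under a top-level `data` key (as in 1st gen background functions; 2nd gen CloudEvent functions nest it under `message`), and the payload fields `league` and `matchday` are hypothetical.

```python
import base64
import json


def decode_pubsub_message(event: dict) -> dict:
    """Decode the JSON payload of a Pub/Sub message passed to a function.

    Assumes the base64-encoded body sits under event["data"].
    """
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


# Simulate what a publisher would send (payload fields are hypothetical).
payload = {"league": "PL", "matchday": 28}
event = {"data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")}

print(decode_pubsub_message(event))
```

Passing a small JSON payload between functions this way lets each step stay independent: one function publishes what it finished, and the next decides what to do with it.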
soccer-tracker-DE-project/
├── README.md
├── .gitignore
├── pyproject.toml
├── Github/workflows/ # CI/CD in Github Actions
│   ├── cd.yml
│   └── ci.yml
├── terraform/ # IaC definitions
│   ├── main.tf
│   ├── variables.tf
│   ├── pubsub.tf
│   └── cloud_functions.tf
├── cloud_functions/
│   ├── league_data/ # League and Teams data extraction and load
│   ├── discord_utils/ # Package for sending Discord notifications using webhooks
│   ├── match_data/ # Match data extraction and load
│   ├── weather_data/ # Weather data extraction and load
│   ├── reddit_data/ # Reddit data extraction and load
│   ├── standings_data/ # Standings data extraction and load for each matchday
│   ├── data_validation/ # Data validation using Dataplex
│   ├── serving_layer/ # Load data to firestore
│   └── generate_summaries/ # Generate match summaries with Gemini
├── soccer_tracker_ui/ # Svelte web app in Firebase
│   ├── src/
│   │   ├── lib/ # Reusable components
│   │   │   ├── components/ # UI components from [shadcn](https://next.shadcn-svelte.com/)
│   │   │   ├── firebase.ts # Firebase/Firestore connection
│   │   │   └── stores/ # Svelte stores for state management
│   │   └── routes/ # Page components
│   ├── package.json # Dependencies and scripts
│   ├── svelte.config.js # Svelte configuration
│   └── vite.config.js # Vite bundler config
└── tests/ # Test suite for Cloud Functions with Pytest
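As an example of what an extraction step such as `weather_data/` might do, the sketch below builds a request URL for Open-Meteo without sending it. It assumes the public forecast endpoint and the hourly variables `temperature_2m` and `precipitation`; the function name and the coordinates are illustrative, and the project may well query a different endpoint or set of variables.

```python
from urllib.parse import urlencode

# Assumption: the public Open-Meteo forecast endpoint.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def build_weather_url(latitude: float, longitude: float) -> str:
    """Build an Open-Meteo request URL for hourly weather at a location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m,precipitation",
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{OPEN_METEO_URL}?{urlencode(params, safe=',')}"


# Example coordinates for illustration only.
print(build_weather_url(51.556, -0.2796))
```

Keeping URL construction separate from the HTTP call makes the function easy to cover with pytest, since no network access is needed to check the parameters.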
The project includes a Svelte web app for visualizing match results, weather data, and match summaries.
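For a web app like this, a serving layer typically merges the pieces the UI needs into one flat document per match, so a page can be rendered from a single read. The sketch below shows one possible shape; every field name and the sample values are hypothetical and do not describe the actual Firestore schema.

```python
def build_match_document(match: dict, weather: dict, summary: str) -> dict:
    """Merge one match with its weather and summary into a single document.

    All field names are hypothetical; they only illustrate a flat shape
    that is convenient for a UI to read in one request.
    """
    return {
        "home_team": match["home_team"],
        "away_team": match["away_team"],
        "score": f'{match["home_goals"]}-{match["away_goals"]}',
        "temperature_c": weather["temperature_c"],
        "summary": summary,
    }


# Made-up sample data for illustration.
doc = build_match_document(
    {"home_team": "Home FC", "away_team": "Away FC", "home_goals": 2, "away_goals": 1},
    {"temperature_c": 12.5},
    "Home FC won a close game.",
)
print(doc["score"])
```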
App includes:
I got the idea for this project from this repo by digitalghost-dev.