Shop Analytics
An e-commerce analytics platform that combines machine learning, modern web technologies, and cloud-native infrastructure to provide predictive discounting insights and product recommendations.
Overview
Shop Analytics is a full-stack application that analyzes Amazon product data to provide intelligent insights for e-commerce optimization. The platform features:
- Predictive Discounting: AI-powered recommendations for optimal discount percentages
- Product Similarity: Machine learning-based product recommendations
- Real-time Analytics: Interactive dashboard with live data visualization
- Cloud-Native Architecture: Scalable AWS-based infrastructure
Technologies Included
Backend:
- FastAPI - Modern Python web framework for building APIs
- scikit-learn - Machine learning library for predictive models
- pandas & numpy - Data manipulation and numerical computing
- uvicorn - ASGI server for FastAPI
Frontend:
- SvelteKit - Full-stack web framework
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- shadcn/ui - Modern component library
- TanStack Table - Powerful data table component
- Chart.js - Interactive charts and visualizations
Infrastructure:
- AWS ECS Fargate - Containerized application hosting
- AWS S3 - Static file storage and hosting
- AWS CloudFront - Global content delivery network
- AWS ECR - Container image registry
- Terraform - Infrastructure as Code
- GitHub Actions - CI/CD pipeline
Prerequisites
- Python 3.12+
- Node.js 20+
- AWS CLI (for deployment)
- Terraform 1.5+ (for infrastructure)
- Docker (for containerization)
Setup & Build
Clone
git clone https://github.com/your-username/ML.ShopAnalytics.git
cd ML.ShopAnalytics
Configuration
Backend Configuration
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Frontend Configuration
cd frontend
npm install
AWS Configuration (for deployment)
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region
Build
Train ML Models
# Preprocess data
python src/data/preprocess.py
# Train predictive discounting model
python src/predictive_discounting/predictive_discounting.py
# Train similarity recommendation model
python src/similarity_recommendation/similarity_recommendation.py
Build Frontend
cd frontend
npm run build
Run
Local Development
Backend
# From project root
python run_api.py
# Or with uvicorn directly
uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
Frontend
cd frontend
npm run dev
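Access the application
- Backend API: http://localhost:8000 (the port passed to uvicorn above)
- Frontend: the local URL printed by npm run dev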
Via CI/CD
The application is automatically deployed to AWS when changes are pushed to the main branch. The GitHub Actions workflow:
- Trains ML Models - Preprocesses data and trains predictive models
- Builds & Pushes Images - Creates Docker images and pushes to ECR
- Deploys Infrastructure - Uses Terraform to manage AWS resources
- Updates Services - Deploys new versions to ECS Fargate
Endpoints
Health Check
GET /health
- Application health status
Products
GET /api/v1/products
- List products with pagination and search
GET /api/v1/products/{product_id}
- Get specific product details
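The list endpoint is described as supporting pagination and search. The following is a minimal sketch of that behavior over an in-memory list, not the project's actual handler. The parameter names search, page, and page_size, and the choice to match on product_name, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Page:
    items: list          # products on the requested page
    total: int           # number of matches before slicing
    page: int
    page_size: int


def list_products(
    products: list,
    search: Optional[str] = None,   # hypothetical query parameter
    page: int = 1,                  # hypothetical query parameter, 1-based
    page_size: int = 20,            # hypothetical query parameter
) -> Page:
    """Filter by a case-insensitive name substring, then slice out one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    if search:
        needle = search.lower()
        products = [p for p in products if needle in p["product_name"].lower()]
    start = (page - 1) * page_size
    return Page(products[start:start + page_size], len(products), page, page_size)


# Toy catalog for illustration only.
catalog: list[dict[str, Any]] = [
    {"product_id": "p1", "product_name": "Wireless Headphones"},
    {"product_id": "p2", "product_name": "Wired Headphones"},
    {"product_id": "p3", "product_name": "USB-C Cable"},
]
result = list_products(catalog, search="headphones", page=1, page_size=1)
print(result.total, [p["product_id"] for p in result.items])
```

Returning the total match count alongside the page lets a table component such as the one in the frontend compute how many pages exist without a second request.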
Predictive Discounting
POST /api/v1/predictive-discounting/predict-discount
- Get discount recommendations
Request Body:
{
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_rating_avg": 4.5,
  "product_description": "High-quality wireless headphones"
}
Response:
{
  "best_discount_pct": 0.15,
  "best_predicted_rating_count": 1250,
  "confidence_score": 0.87
}
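As a rough illustration of calling this endpoint from Python, the sketch below builds the POST request with the standard library only. The base URL http://localhost:8000 assumes the local uvicorn command from the Run section, and the helper name build_predict_discount_request is hypothetical. The request is constructed but not sent, so the snippet runs without a backend.

```python
import json
from urllib.request import Request


def build_predict_discount_request(base_url: str, product: dict) -> Request:
    """Build (but do not send) the JSON POST request for a discount prediction."""
    body = json.dumps(product).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/predictive-discounting/predict-discount",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_discount_request(
    "http://localhost:8000",  # assumed local development address
    {
        "product_category": "Electronics",
        "product_price_actual": 299.99,
        "product_rating_avg": 4.5,
        "product_description": "High-quality wireless headphones",
    },
)
print(request.get_method(), request.full_url)

# Sending it requires a running backend:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     result = json.load(response)
#     print(result["best_discount_pct"])
```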
Similarity Recommendations
POST /api/v1/similarity/find-similar
- Find similar products
Request Body:
{
  "product_name": "Wireless Headphones",
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_discount_pct": 0.1,
  "product_rating_avg": 4.5,
  "product_rating_count": 1200,
  "product_description": "Premium wireless headphones",
  "n_recommendations": 5
}
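A client can check a payload before sending it. The sketch below models the request body above as a dataclass with basic range checks. The class name SimilarityRequest, the default for n_recommendations, and every validation range are assumptions; the real API may enforce different limits.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimilarityRequest:
    """Client-side mirror of the find-similar request body (hypothetical)."""
    product_name: str
    product_category: str
    product_price_actual: float
    product_discount_pct: float
    product_rating_avg: float
    product_rating_count: int
    product_description: str
    n_recommendations: int = 5  # assumed default

    def __post_init__(self) -> None:
        # Assumed ranges, chosen to match the example values in this README.
        if self.product_price_actual < 0:
            raise ValueError("product_price_actual must be non-negative")
        if not 0.0 <= self.product_discount_pct <= 1.0:
            raise ValueError("product_discount_pct must be a fraction in [0, 1]")
        if not 0.0 <= self.product_rating_avg <= 5.0:
            raise ValueError("product_rating_avg must be in [0, 5]")
        if self.n_recommendations < 1:
            raise ValueError("n_recommendations must be at least 1")


payload = SimilarityRequest(
    product_name="Wireless Headphones",
    product_category="Electronics",
    product_price_actual=299.99,
    product_discount_pct=0.1,
    product_rating_avg=4.5,
    product_rating_count=1200,
    product_description="Premium wireless headphones",
)
print(json.dumps(asdict(payload), indent=2))
```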
ML
Technologies Used
- scikit-learn - Primary ML framework
- Random Forest Regressor - For predictive discounting
- TF-IDF Vectorization - Text feature extraction
- Custom Transformers - Feature engineering and preprocessing
- Joblib - Model serialization and caching
Predictive Discounting Model
The predictive discounting system uses a machine learning pipeline with the following parts:
Feature Engineering
- Text processing of product descriptions using TF-IDF
- Category encoding with OneHotEncoder
- Price and rating normalization
- Custom transformers for category splitting and weight scaling
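The feature engineering steps above can be sketched with standard scikit-learn components. This is an illustration under assumptions, not the project's pipeline: the column names are borrowed from the API request bodies, the target is assumed to be the rating count, the rows are toy data, and the custom category-splitting and weight-scaling transformers are left out.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocess = ColumnTransformer([
    # TfidfVectorizer needs a 1-D column of strings, so the column is
    # given as a plain name rather than a list.
    ("description", TfidfVectorizer(), "product_description"),
    ("category", OneHotEncoder(handle_unknown="ignore"), ["product_category"]),
    ("numeric", StandardScaler(),
     ["product_price_actual", "product_rating_avg", "product_discount_pct"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Toy rows for illustration only; not taken from the project's dataset.
train = pd.DataFrame({
    "product_description": [
        "wireless headphones with noise cancelling",
        "braided usb-c charging cable",
        "stainless steel kitchen knife",
        "wired headphones with microphone",
    ],
    "product_category": ["Electronics", "Electronics", "Home", "Electronics"],
    "product_price_actual": [299.99, 12.99, 45.00, 59.99],
    "product_rating_avg": [4.5, 4.1, 4.7, 3.9],
    "product_discount_pct": [0.15, 0.30, 0.10, 0.25],
})
rating_counts = [1250, 5400, 310, 870]  # assumed target column

model.fit(train, rating_counts)

# Score the first product again with a different discount.
new_product = train.iloc[[0]].assign(product_discount_pct=0.20)
prediction = model.predict(new_product)
print(prediction)
```

Keeping preprocessing inside the Pipeline means the same transformations are applied at training and prediction time, which is the consistency benefit a pipeline-based approach is meant to provide.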
Model Architecture
- Random Forest Regressor for robust predictions
- Pipeline-based approach for consistent preprocessing
- k-nearest neighbors with MultiLabelBinarizer for similarity recommendations
Training Process
- Uses historical Amazon product data
- Predicts optimal discount percentages based on product characteristics
- Estimates expected rating count improvements
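The response fields best_discount_pct and best_predicted_rating_count suggest one plausible way to turn a rating-count model into a discount recommendation: score a grid of candidate discounts and keep the best one. The sketch below shows that idea only. The function names, the candidate grid, and the toy predictor are all hypothetical, and the project's actual selection logic is not confirmed by this README.

```python
from typing import Callable, Iterable


def best_discount(
    product: dict,
    predict_rating_count: Callable[[dict], float],
    candidates: Iterable[float],
) -> dict:
    """Try each candidate discount and keep the one with the highest prediction."""
    best_pct, best_count = None, float("-inf")
    for pct in candidates:
        count = predict_rating_count({**product, "product_discount_pct": pct})
        if count > best_count:
            best_pct, best_count = pct, count
    if best_pct is None:
        raise ValueError("candidates must not be empty")
    return {"best_discount_pct": best_pct, "best_predicted_rating_count": best_count}


def toy_predictor(features: dict) -> float:
    # Stand-in for a trained model: a curve that peaks at a 15% discount.
    pct = features["product_discount_pct"]
    return 1250 - 20000 * (pct - 0.15) ** 2


product = {"product_category": "Electronics", "product_price_actual": 299.99}
candidates = [i / 100 for i in range(0, 55, 5)]  # 0.00, 0.05, ..., 0.50
result = best_discount(product, toy_predictor, candidates)
print(result)
```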
Similarity Recommendation Model
The similarity system produces product recommendations in three stages:
Feature Extraction
- Multi-label binarization for categories
- Text similarity using TF-IDF
- Numerical feature scaling
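To make multi-label binarization concrete, here is a pure-Python sketch of what a tool like scikit-learn's MultiLabelBinarizer does: each product gets one 0/1 column per known category label. The "|" separator and the example category strings are assumptions; the README does not state how categories are delimited in the dataset.

```python
def fit_labels(category_strings: list, sep: str = "|") -> list:
    """Collect the sorted set of labels seen across all category strings."""
    labels = set()
    for value in category_strings:
        labels.update(part.strip() for part in value.split(sep) if part.strip())
    return sorted(labels)


def binarize(category_string: str, labels: list, sep: str = "|") -> list:
    """Return a 0/1 indicator per known label for one product."""
    present = {part.strip() for part in category_string.split(sep)}
    return [1 if label in present else 0 for label in labels]


# Hypothetical category strings with an assumed "|" separator.
categories = ["Electronics|Headphones", "Electronics|Cables", "Home|Kitchen"]
labels = fit_labels(categories)
matrix = [binarize(c, labels) for c in categories]

print(labels)
for row in matrix:
    print(row)
```

Products that share a label share a 1 in that column, so the overlap between two rows gives a simple signal of category similarity.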
Similarity Calculation
- Cosine similarity for text features
- Euclidean distance for numerical features
- Weighted combination of multiple similarity metrics
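The three bullets above can be sketched in a few lines of standard-library Python. The mapping from Euclidean distance to a similarity, 1 / (1 + d), and the 0.6 text weight are assumptions picked for illustration; the README does not give the project's weights or normalization.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a: list, b: list) -> float:
    """Map a distance d in [0, inf) to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + math.dist(a, b))


def combined_similarity(
    text_a: list, text_b: list,
    num_a: list, num_b: list,
    text_weight: float = 0.6,  # assumed weight
) -> float:
    """Weighted sum of the text and numerical similarities."""
    return (
        text_weight * cosine_similarity(text_a, text_b)
        + (1.0 - text_weight) * euclidean_similarity(num_a, num_b)
    )


# Toy TF-IDF-like vectors and scaled numerical features.
score = combined_similarity(
    text_a=[0.5, 0.0, 0.8], text_b=[0.4, 0.1, 0.9],
    num_a=[0.2, 1.1], num_b=[0.3, 0.9],
)
print(round(score, 3))
```

Numerical features should be scaled before the distance is taken; otherwise a large-valued column such as price would dominate the Euclidean term.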
Recommendation Engine
- Finds products with similar characteristics
- Ranks by similarity score
- Returns top N recommendations
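The ranking step can be illustrated with heapq.nlargest, which selects the N best-scoring items without sorting the whole catalog. The function name top_n, the exclude parameter (used to drop the query product itself), and the scores are hypothetical.

```python
import heapq
from typing import Optional


def top_n(scores: dict, n: int, exclude: Optional[str] = None) -> list:
    """Return the n highest-scoring (product_id, score) pairs, best first."""
    candidates = ((pid, s) for pid, s in scores.items() if pid != exclude)
    return heapq.nlargest(n, candidates, key=lambda item: item[1])


# Hypothetical similarity scores against a query product "p1".
scores = {"p1": 1.0, "p2": 0.82, "p3": 0.40, "p4": 0.91}
print(top_n(scores, n=2, exclude="p1"))
```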
Architecture
AWS Infrastructure
The application is deployed on AWS using a modern, scalable architecture:
Compute Layer
- ECS Fargate - Serverless container orchestration
- Application Load Balancer - Traffic distribution and SSL termination
- Auto Scaling - Automatic scaling based on demand
Storage Layer
- S3 - Static frontend hosting and data storage
- ECR - Container image registry
- CloudWatch Logs - Centralized logging
Network Layer
- CloudFront - Global CDN for frontend and API
- Route 53 - DNS management
- VPC - Network isolation and security
Security
- IAM Roles - Least privilege access control
- Security Groups - Network-level security
- WAF - Web application firewall (optional)
Application Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   API Gateway   │    │   Backend       │
│   (SvelteKit)   │◄──►│   (CloudFront)  │◄──►│   (FastAPI)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Bucket     │    │   ALB           │    │   ML Models     │
│   (Static Host) │    │   (Load Bal.)   │    │   (Joblib)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Data Flow
- User Request → CloudFront → ALB → ECS Fargate
- API Processing → FastAPI → ML Models → Response
- Static Assets → S3 → CloudFront → User
Outlook & Improvements
Possible Enhancements
Advanced ML Features
- Real-time model retraining with new data
- A/B testing framework for discount strategies
- Personalized recommendations based on user behavior
- Time-series analysis for seasonal trends
Performance Optimizations
- Redis caching for frequently accessed data
- Database integration (PostgreSQL/RDS)
- GraphQL API for more efficient data fetching
- CDN optimization for global performance
User Experience
- Real-time notifications for price changes
- Advanced filtering and sorting options
- Export functionality for reports
- Mobile-responsive design improvements
Infrastructure Enhancements
- Multi-region deployment for better latency
- Blue-green deployment strategy
- Enhanced monitoring and alerting
- Cost optimization and resource management
- Amazon SageMaker for ML training
Analytics & Reporting
- Advanced dashboard with more metrics
- Custom report generation
- Data visualization improvements
- Integration with external analytics tools
Technical Debt
- Implement comprehensive unit and integration tests
- Add API rate limiting and authentication
- Improve error handling and logging
- Optimize ML model performance and accuracy
- Enhance security measures and compliance
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For support and questions, please open an issue in the GitHub repository or contact the development team.