ML.ShopAnalytics

A minimalist Python and cloud ML project that trains on Amazon sales and review data to recommend prices and discounts that boost ratings and sales, and to surface actionable visual insights. It runs end-to-end on AWS CloudFront, S3, ALB, and Fargate, with a Svelte frontend.

Shop Analytics

A comprehensive e-commerce analytics platform that combines machine learning with modern web technologies and cloud-native infrastructure to provide predictive discounting insights and product recommendations.

Overview

Shop Analytics is a full-stack application that analyzes Amazon product data to provide intelligent insights for e-commerce optimization. The platform features:

  • Predictive Discounting: AI-powered recommendations for optimal discount percentages
  • Product Similarity: Machine learning-based product recommendations
  • Real-time Analytics: Interactive dashboard with live data visualization
  • Cloud-Native Architecture: Scalable AWS-based infrastructure

Technologies Included

Backend:

  • FastAPI - Modern Python web framework for building APIs
  • scikit-learn - Machine learning library for predictive models
  • pandas & numpy - Data manipulation and numerical computing
  • uvicorn - ASGI server for FastAPI

Frontend:

  • SvelteKit - Full-stack web framework
  • TypeScript - Type-safe JavaScript
  • Tailwind CSS - Utility-first CSS framework
  • shadcn/ui - Modern component library
  • TanStack Table - Powerful data table component
  • Chart.js - Interactive charts and visualizations

Infrastructure:

  • AWS ECS Fargate - Containerized application hosting
  • AWS S3 - Static file storage and hosting
  • AWS CloudFront - Global content delivery network
  • AWS ECR - Container image registry
  • Terraform - Infrastructure as Code
  • GitHub Actions - CI/CD pipeline

Prerequisites

  • Python 3.12+
  • Node.js 20+
  • AWS CLI (for deployment)
  • Terraform 1.5+ (for infrastructure)
  • Docker (for containerization)

Setup & Build

Clone

git clone https://github.com/your-username/ML.ShopAnalytics.git
cd ML.ShopAnalytics

Configuration

  1. Backend Configuration

    # Create virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
       
    # Install dependencies
    pip install -r requirements.txt
    
  2. Frontend Configuration

    cd frontend
    npm install
    
  3. AWS Configuration (for deployment)

    aws configure
    # Enter your AWS Access Key ID, Secret Access Key, and region
    

Build

  1. Train ML Models

    # Preprocess data
    python src/data/preprocess.py
       
    # Train predictive discounting model
    python src/predictive_discounting/predictive_discounting.py
    
    # Train similarity recommendation model
    python src/similarity_recommendation/similarity_recommendation.py
    
  2. Build Frontend

    cd frontend
    npm run build
    

Run

Local Development

  1. Backend

    # From project root
    python run_api.py
    # Or with uvicorn directly
    uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
    
  2. Frontend

    cd frontend
    npm run dev
    
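  3. Access the application

    • API: http://localhost:8000 (the host and port passed to uvicorn above)
    • Frontend: the local URL that npm run dev prints on startup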

Via CI/CD

The application is automatically deployed to AWS when changes are pushed to the main branch. The GitHub Actions workflow:

  1. Trains ML Models - Preprocesses data and trains predictive models
  2. Builds & Pushes Images - Creates Docker images and pushes to ECR
  3. Deploys Infrastructure - Uses Terraform to manage AWS resources
  4. Updates Services - Deploys new versions to ECS Fargate

Endpoints

Health Check

  • GET /health - Application health status

Products

  • GET /api/v1/products - List products with pagination and search
  • GET /api/v1/products/{product_id} - Get specific product details

Predictive Discounting

  • POST /api/v1/predictive-discounting/predict-discount - Get discount recommendations

Request Body:

{
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_rating_avg": 4.5,
  "product_description": "High-quality wireless headphones"
}

Response:

{
  "best_discount_pct": 0.15,
  "best_predicted_rating_count": 1250,
  "confidence_score": 0.87
}
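
As a rough illustration of calling this endpoint from Python, the sketch below builds and sends the request with the standard library only. The base URL assumes the local uvicorn server from the Run section, and the helper names `build_predict_request` and `predict_discount` are hypothetical, not part of the project.

```python
import json
from urllib import request

# Assumption: the API is running locally on the port used in the Run section.
BASE_URL = "http://localhost:8000"
PREDICT_PATH = "/api/v1/predictive-discounting/predict-discount"


def build_predict_request(base_url: str, payload: dict) -> request.Request:
    """Build (but do not send) the JSON POST request for the endpoint."""
    return request.Request(
        url=f"{base_url}{PREDICT_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_discount(base_url: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with request.urlopen(build_predict_request(base_url, payload), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running backend; the payload mirrors the request body above.
    result = predict_discount(BASE_URL, {
        "product_category": "Electronics",
        "product_price_actual": 299.99,
        "product_rating_avg": 4.5,
        "product_description": "High-quality wireless headphones",
    })
    print(result["best_discount_pct"], result["best_predicted_rating_count"])
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.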

Similarity Recommendations

  • POST /api/v1/similarity/find-similar - Find similar products

Request Body:

{
  "product_name": "Wireless Headphones",
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_discount_pct": 0.1,
  "product_rating_avg": 4.5,
  "product_rating_count": 1200,
  "product_description": "Premium wireless headphones",
  "n_recommendations": 5
}
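
A client may want to check a request like this before sending it. The sketch below models the body as a dataclass; the field names come from the example above, while the class name `SimilarityRequest`, the default of 5 recommendations, and the validation ranges (discount as a fraction between 0 and 1, rating between 0 and 5) are assumptions inferred from the sample values, not documented API rules.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimilarityRequest:
    """Hypothetical client-side model of the find-similar request body."""
    product_name: str
    product_category: str
    product_price_actual: float
    product_discount_pct: float
    product_rating_avg: float
    product_rating_count: int
    product_description: str
    n_recommendations: int = 5

    def __post_init__(self) -> None:
        # Assumed ranges, based only on the sample values in this README.
        if not 0.0 <= self.product_discount_pct <= 1.0:
            raise ValueError("product_discount_pct must be a fraction between 0 and 1")
        if not 0.0 <= self.product_rating_avg <= 5.0:
            raise ValueError("product_rating_avg must be between 0 and 5")
        if self.n_recommendations < 1:
            raise ValueError("n_recommendations must be at least 1")

    def to_payload(self) -> dict:
        """Return a plain dict ready to be serialized as the JSON body."""
        return asdict(self)
```

The resulting dict can be passed to `json.dumps` and posted to /api/v1/similarity/find-similar.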

ML

Technologies Used

  • scikit-learn - Primary ML framework
  • Random Forest Regressor - For predictive discounting
  • TF-IDF Vectorization - Text feature extraction
  • Custom Transformers - Feature engineering and preprocessing
  • Joblib - Model serialization and caching

Predictive Discounting Model

The predictive discounting system is built as a machine learning pipeline with three parts:

  1. Feature Engineering

    • Text processing of product descriptions using TF-IDF
    • Category encoding with OneHotEncoder
    • Price and rating normalization
    • Custom transformers for category splitting and weight scaling
  2. Model Architecture

    • Random Forest Regressor for robust predictions
    • Pipeline-based approach for consistent preprocessing
    • K-nearest neighbors with MultiLabelBinarizer (used by the similarity recommendation model)
  3. Training Process

    • Uses historical Amazon product data
    • Predicts optimal discount percentages based on product characteristics
    • Estimates expected rating count improvements
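
The pipeline described above could look roughly like the following scikit-learn sketch. It is not the project's actual code: it assumes the model predicts rating count from product features plus a discount column, and that the best discount is found by scoring a list of candidate discounts. The function names, the four-row toy dataset, and the hyperparameters are made up for illustration, and the custom category-splitting and weight-scaling transformers are omitted.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC_COLUMNS = ["product_price_actual", "product_rating_avg", "product_discount_pct"]
FEATURE_COLUMNS = ["product_description", "product_category", *NUMERIC_COLUMNS]


def build_pipeline() -> Pipeline:
    """Preprocessing and model in one Pipeline, so training and inference share steps."""
    features = ColumnTransformer([
        # A single column name (not a list) hands TfidfVectorizer the 1-D text it expects.
        ("text", TfidfVectorizer(), "product_description"),
        ("category", OneHotEncoder(handle_unknown="ignore"), ["product_category"]),
        ("numeric", StandardScaler(), NUMERIC_COLUMNS),
    ])
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    return Pipeline([("features", features), ("model", model)])


def best_discount(pipeline: Pipeline, product: dict, candidates: list) -> tuple:
    """Score the product once per candidate discount; return (discount, predicted count)."""
    rows = pd.DataFrame(
        [{**product, "product_discount_pct": d} for d in candidates]
    )[FEATURE_COLUMNS]
    predicted = pipeline.predict(rows)
    best = int(predicted.argmax())
    return candidates[best], float(predicted[best])


def train_toy_pipeline() -> Pipeline:
    """Fit on four made-up rows, purely to make the sketch runnable."""
    train = pd.DataFrame({
        "product_description": ["wireless headphones", "usb cable", "kitchen towel", "desk lamp"],
        "product_category": ["Electronics", "Electronics", "Home", "Home"],
        "product_price_actual": [299.99, 49.99, 19.99, 89.99],
        "product_rating_avg": [4.5, 4.0, 3.8, 4.2],
        "product_discount_pct": [0.10, 0.30, 0.05, 0.20],
    })
    rating_counts = [1200, 3000, 150, 800]  # made-up training target
    return build_pipeline().fit(train[FEATURE_COLUMNS], rating_counts)
```

Wrapping preprocessing and the regressor in a single Pipeline means one object can be serialized with Joblib and reused at inference time without repeating the feature engineering by hand.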

Similarity Recommendation Model

The similarity system produces product recommendations in three steps:

  1. Feature Extraction

    • Multi-label binarization for categories
    • Text similarity using TF-IDF
    • Numerical feature scaling
  2. Similarity Calculation

    • Cosine similarity for text features
    • Euclidean distance for numerical features
    • Weighted combination of multiple similarity metrics
  3. Recommendation Engine

    • Finds products with similar characteristics
    • Ranks by similarity score
    • Returns top N recommendations
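
The weighted combination of similarity metrics can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: raw term counts stand in for TF-IDF vectors, numerical features are assumed to be scaled already, and the `text_weight` parameter, the distance-to-similarity mapping, and the item layout (`description`, `numeric`) are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of raw term counts (a simple stand-in for TF-IDF vectors)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def numeric_similarity(x: list, y: list) -> float:
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + math.dist(x, y))


def rank_similar(query: dict, catalog: list, n: int, text_weight: float = 0.5) -> list:
    """Return the n catalog items with the highest weighted similarity to the query."""
    def score(item: dict) -> float:
        text = cosine_similarity(query["description"], item["description"])
        numeric = numeric_similarity(query["numeric"], item["numeric"])
        return text_weight * text + (1.0 - text_weight) * numeric

    return sorted(catalog, key=score, reverse=True)[:n]
```

Converting the Euclidean distance to a bounded similarity keeps both terms on a comparable scale, so the weight acts as a direct trade-off between text and numerical closeness.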

Architecture

AWS Infrastructure

The application is deployed on AWS using a modern, scalable architecture:

Compute Layer

  • ECS Fargate - Serverless container orchestration
  • Application Load Balancer - Traffic distribution and SSL termination
  • Auto Scaling - Automatic scaling based on demand

Storage Layer

  • S3 - Static frontend hosting and data storage
  • ECR - Container image registry
  • CloudWatch Logs - Centralized logging

Network Layer

  • CloudFront - Global CDN for frontend and API
  • Route 53 - DNS management
  • VPC - Network isolation and security

Security

  • IAM Roles - Least privilege access control
  • Security Groups - Network-level security
  • WAF - Web application firewall (optional)

Application Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   API Gateway   │    │   Backend       │
│   (SvelteKit)   │◄──►│   (CloudFront)  │◄──►│   (FastAPI)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Bucket     │    │   ALB           │    │   ML Models     │
│   (Static Host) │    │   (Load Bal.)   │    │   (Joblib)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Data Flow

  1. User Request → CloudFront → ALB → ECS Fargate
  2. API Processing → FastAPI → ML Models → Response
  3. Static Assets → S3 → CloudFront → User

Outlook & Improvements

Possible Enhancements

  1. Advanced ML Features

    • Real-time model retraining with new data
    • A/B testing framework for discount strategies
    • Personalized recommendations based on user behavior
    • Time-series analysis for seasonal trends
  2. Performance Optimizations

    • Redis caching for frequently accessed data
    • Database integration (PostgreSQL/RDS)
    • GraphQL API for more efficient data fetching
    • CDN optimization for global performance
  3. User Experience

    • Real-time notifications for price changes
    • Advanced filtering and sorting options
    • Export functionality for reports
    • Mobile-responsive design improvements
  4. Infrastructure Enhancements

    • Multi-region deployment for better latency
    • Blue-green deployment strategy
    • Enhanced monitoring and alerting
    • Cost optimization and resource management
    • Amazon SageMaker for ML training
  5. Analytics & Reporting

    • Advanced dashboard with more metrics
    • Custom report generation
    • Data visualization improvements
    • Integration with external analytics tools

Technical Debt

  • Implement comprehensive unit and integration tests
  • Add API rate limiting and authentication
  • Improve error handling and logging
  • Optimize ML model performance and accuracy
  • Enhance security measures and compliance

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support and questions, please open an issue in the GitHub repository or contact the development team.
