Ml.shopanalytics

genaray

A minimalist Python and cloud ML project that trains on Amazon sales and review data to recommend prices and discounts that boost ratings and sales, and to surface actionable visual insights. It runs end-to-end on AWS CloudFront, S3, ALB, and Fargate, with a Svelte frontend.

Shop Analytics

A comprehensive e-commerce analytics platform that combines machine learning, a modern web frontend, and cloud-native AWS infrastructure to provide predictive discounting insights and product recommendations.

Overview

Shop Analytics is a full-stack application that analyzes Amazon product data to recommend discounts and surface similar products. The platform features:

Predictive Discounting: AI-powered recommendations for optimal discount percentages
Product Similarity: Machine learning-based product recommendations
Real-time Analytics: Interactive dashboard with live data visualization
Cloud-Native Architecture: Scalable AWS-based infrastructure

Technologies Included

Backend:

FastAPI - Modern Python web framework for building APIs
scikit-learn - Machine learning library for predictive models
pandas & numpy - Data manipulation and numerical computing
uvicorn - ASGI server for FastAPI

Frontend:

SvelteKit - Full-stack web framework
TypeScript - Type-safe JavaScript
Tailwind CSS - Utility-first CSS framework
shadcn/ui - Modern component library
TanStack Table - Powerful data table component
Chart.js - Interactive charts and visualizations

Infrastructure:

AWS ECS Fargate - Containerized application hosting
AWS S3 - Static file storage and hosting
AWS CloudFront - Global content delivery network
AWS ECR - Container image registry
Terraform - Infrastructure as Code
GitHub Actions - CI/CD pipeline

Prerequisites

Python 3.12+
Node.js 20+
AWS CLI (for deployment)
Terraform 1.5+ (for infrastructure)
Docker (for containerization)

Setup & Build

Clone

```
git clone https://github.com/your-username/ML.ShopAnalytics.git
cd ML.ShopAnalytics
```

Configuration

Backend Configuration

```
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Frontend Configuration
```
cd frontend
npm install
```

AWS Configuration (for deployment)

```
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region
```

Build

Train ML Models

```
# Preprocess data
python src/data/preprocess.py

# Train predictive discounting model
python src/predictive_discounting/predictive_discounting.py

# Train similarity recommendation model
python src/similarity_recommendation/similarity_recommendation.py
```

Build Frontend
```
cd frontend
npm run build
```

Run

Local Development

Backend

```
# From project root
python run_api.py
# Or with uvicorn directly
uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
```

Frontend
```
cd frontend
npm run dev
```
Access the application
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

Via CI/CD

The application is automatically deployed to AWS when changes are pushed to the main branch. The GitHub Actions workflow:

Trains ML Models - Preprocesses data and trains predictive models
Builds & Pushes Images - Creates Docker images and pushes to ECR
Deploys Infrastructure - Uses Terraform to manage AWS resources
Updates Services - Deploys new versions to ECS Fargate

Endpoints

Health Check

GET /health - Application health status
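
The health endpoint lends itself to a scripted smoke test, for example after a deployment. The following is a minimal sketch using only the standard library. The function name `is_healthy`, the injectable `fetch` parameter, and the assumption that a healthy service answers with HTTP 200 are illustrative and not taken from the project.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, fetch=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    `fetch` is injectable so the check can be exercised without a running server.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

With the local backend running, `is_healthy("http://localhost:8000")` would be expected to return `True`.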

Products

GET /api/v1/products - List products with pagination and search
GET /api/v1/products/{product_id} - Get specific product details

Predictive Discounting

POST /api/v1/predictive-discounting/predict-discount - Get discount recommendations

Request Body:

```
{
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_rating_avg": 4.5,
  "product_description": "High-quality wireless headphones"
}
```

Response:

```
{
  "best_discount_pct": 0.15,
  "best_predicted_rating_count": 1250,
  "confidence_score": 0.87
}
```
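
To call this endpoint from a script, a small standard-library client is enough. This is a sketch, not part of the project: the helper names `build_predict_request` and `predict_discount` are hypothetical, and it assumes the API accepts and returns plain JSON without authentication.

```python
import json
import urllib.request

PREDICT_PATH = "/api/v1/predictive-discounting/predict-discount"


def build_predict_request(base_url: str, product: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a discount recommendation."""
    body = json.dumps(product).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_discount(base_url: str, product: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(base_url, product)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Against the local backend, `predict_discount("http://localhost:8000", {...})` with the request body shown above should return a dictionary with the three response fields.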

Similarity Recommendations

POST /api/v1/similarity/find-similar - Find similar products

Request Body:

```
{
  "product_name": "Wireless Headphones",
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_discount_pct": 0.1,
  "product_rating_avg": 4.5,
  "product_rating_count": 1200,
  "product_description": "Premium wireless headphones",
  "n_recommendations": 5
}
```
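
The request body above can be mirrored in a typed structure on the client side. The sketch below uses a standard-library dataclass; the class name `SimilarityRequest` is hypothetical, and the value ranges it checks (discount as a fraction in [0, 1], rating in [0, 5], at least one recommendation) are assumptions rather than the API's documented validation rules.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimilarityRequest:
    """Request body for POST /api/v1/similarity/find-similar."""

    product_name: str
    product_category: str
    product_price_actual: float
    product_discount_pct: float
    product_rating_avg: float
    product_rating_count: int
    product_description: str
    n_recommendations: int = 5

    def __post_init__(self) -> None:
        # Assumed ranges; the API's real validation rules may differ.
        if not 0.0 <= self.product_discount_pct <= 1.0:
            raise ValueError("product_discount_pct must be a fraction in [0, 1]")
        if not 0.0 <= self.product_rating_avg <= 5.0:
            raise ValueError("product_rating_avg must be in [0, 5]")
        if self.n_recommendations < 1:
            raise ValueError("n_recommendations must be at least 1")

    def to_json(self) -> str:
        """Serialize to a JSON body with the field names the endpoint uses."""
        return json.dumps(asdict(self))
```

Catching out-of-range values before the request is sent keeps obvious mistakes, such as passing `10` instead of `0.1` for a discount, from reaching the model.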

ML

Technologies Used

scikit-learn - Primary ML framework
Random Forest Regressor - For predictive discounting
TF-IDF Vectorization - Text feature extraction
Custom Transformers - Feature engineering and preprocessing
Joblib - Model serialization and caching

Predictive Discounting Model

The predictive discounting system uses a machine learning pipeline with three parts:

Feature Engineering
- Text processing of product descriptions using TF-IDF
- Category encoding with OneHotEncoder
- Price and rating normalization
- Custom transformers for category splitting and weight scaling
Model Architecture
- Random Forest Regressor for robust predictions
- Pipeline-based approach for consistent preprocessing
- k-nearest neighbors with MultiLabelBinarizer for similarity recommendations
Training Process
- Uses historical Amazon product data
- Predicts optimal discount percentages based on product characteristics
- Estimates expected rating count improvements
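
The steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual training script: it treats the rating count as the regression target and the discount as an input feature, then scans a grid of candidate discounts and keeps the one with the highest predicted rating count. The function names, the discount grid, and that selection rule are all hypothetical. The custom transformers and the `confidence_score` field are left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["product_price_actual", "product_rating_avg", "product_discount_pct"]


def build_pipeline(random_state: int = 0) -> Pipeline:
    """TF-IDF, one-hot encoding and scaling feeding a Random Forest regressor."""
    features = ColumnTransformer([
        # A string column name (not a list) passes 1-D text to the vectorizer.
        ("text", TfidfVectorizer(), "product_description"),
        ("category", OneHotEncoder(handle_unknown="ignore"), ["product_category"]),
        ("numeric", StandardScaler(), NUMERIC),
    ])
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    return Pipeline([("features", features), ("model", model)])


def recommend_discount(pipeline: Pipeline, product: dict, candidates=None) -> dict:
    """Score each candidate discount and return the one with the highest
    predicted rating count (an assumed selection rule)."""
    if candidates is None:
        candidates = np.linspace(0.0, 0.5, 11)  # 0%, 5%, ..., 50%
    grid = pd.DataFrame([{**product, "product_discount_pct": d} for d in candidates])
    predicted = pipeline.predict(grid)
    best = int(np.argmax(predicted))
    return {
        "best_discount_pct": float(candidates[best]),
        "best_predicted_rating_count": float(predicted[best]),
    }
```

Keeping preprocessing and the regressor in one `Pipeline` means the same transformations are applied at training and prediction time, which is the consistency benefit the pipeline-based approach is meant to provide.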

Similarity Recommendation Model

The similarity system produces product recommendations in three stages:

Feature Extraction
- Multi-label binarization for categories
- Text similarity using TF-IDF
- Numerical feature scaling
Similarity Calculation
- Cosine similarity for text features
- Euclidean distance for numerical features
- Weighted combination of multiple similarity metrics
Recommendation Engine
- Finds products with similar characteristics
- Ranks by similarity score
- Returns top N recommendations
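
To make the weighted combination concrete, here is a small standard-library sketch. It assumes each product is already reduced to a sparse TF-IDF-style text vector (`{term: weight}`) and a scaled numeric vector. The weights `0.6` and `0.4`, the conversion of Euclidean distance to a similarity via `1 / (1 + d)`, and all function names are illustrative choices, not values taken from the project.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(x: list, y: list) -> float:
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + math.dist(x, y))


def combined_similarity(query: dict, item: dict,
                        text_weight: float = 0.6, numeric_weight: float = 0.4) -> float:
    """Weighted sum of text (cosine) and numeric (Euclidean-based) similarity."""
    text = cosine_similarity(query["text"], item["text"])
    numeric = euclidean_similarity(query["numeric"], item["numeric"])
    return text_weight * text + numeric_weight * numeric


def top_n(query: dict, catalog: list, n: int = 5) -> list:
    """Return the n (score, name) pairs with the highest combined similarity."""
    scored = [(combined_similarity(query, item), item["name"]) for item in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Scaling the numeric features before computing distances matters here: without it, a large-valued feature such as price would dominate the Euclidean term.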

Architecture

AWS Infrastructure

The application is deployed on AWS using a modern, scalable architecture:

Compute Layer

ECS Fargate - Serverless container orchestration
Application Load Balancer - Traffic distribution and SSL termination
Auto Scaling - Automatic scaling based on demand

Storage Layer

S3 - Static frontend hosting and data storage
ECR - Container image registry
CloudWatch Logs - Centralized logging

Network Layer

CloudFront - Global CDN for frontend and API
Route 53 - DNS management
VPC - Network isolation and security

Security

IAM Roles - Least privilege access control
Security Groups - Network-level security
WAF - Web application firewall (optional)

Application Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   API Gateway   │    │   Backend       │
│   (SvelteKit)   │◄──►│   (CloudFront)  │◄──►│   (FastAPI)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Bucket     │    │   ALB           │    │   ML Models     │
│   (Static Host) │    │   (Load Bal.)   │    │   (Joblib)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Data Flow

User Request → CloudFront → ALB → ECS Fargate
API Processing → FastAPI → ML Models → Response
Static Assets → S3 → CloudFront → User
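
In the API processing path, the trained models are read from Joblib artifacts. One common pattern, sketched here as an assumption about how such a service might be organized, is to load each artifact once per process and reuse it across requests. The helper names `save_model` and `load_model` are hypothetical; `joblib.dump` and `joblib.load` are the library's standard calls.

```python
from functools import lru_cache
from pathlib import Path

import joblib


def save_model(model, path: Path) -> None:
    """Persist a trained model (or any picklable object) to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)


@lru_cache(maxsize=None)
def load_model(path: str):
    """Load a model artifact once per process; later calls reuse the cached object."""
    return joblib.load(path)
```

Caching the loaded model avoids re-reading the artifact on every request, at the cost of needing a process restart (or `load_model.cache_clear()`) to pick up a retrained model.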

Outlook & Improvements

Possible Enhancements

Advanced ML Features
- Real-time model retraining with new data
- A/B testing framework for discount strategies
- Personalized recommendations based on user behavior
- Time-series analysis for seasonal trends
Performance Optimizations
- Redis caching for frequently accessed data
- Database integration (PostgreSQL/RDS)
- GraphQL API for more efficient data fetching
- CDN optimization for global performance
User Experience
- Real-time notifications for price changes
- Advanced filtering and sorting options
- Export functionality for reports
- Mobile-responsive design improvements
Infrastructure Enhancements
- Multi-region deployment for better latency
- Blue-green deployment strategy
- Enhanced monitoring and alerting
- Cost optimization and resource management
- SageMaker for ML training
Analytics & Reporting
- Advanced dashboard with more metrics
- Custom report generation
- Data visualization improvements
- Integration with external analytics tools

Technical Debt

Implement comprehensive unit and integration tests
Add API rate limiting and authentication
Improve error handling and logging
Optimize ML model performance and accuracy
Enhance security measures and compliance

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support and questions, please open an issue in the GitHub repository or contact the development team.
