
Phishing Detection

Open-source phishing detection engine for real-time URL analysis. Detect malicious links, explain every verdict, and generate a security report instantly.

Paste a URL → get a trust score, verdict, and detailed report in real time.

Live demo: https://safesurf.xorwave.com

Quick Start

git clone https://github.com/abhizaik/phishing-detection.git
cd phishing-detection
make build && make up

Open the web UI at http://localhost:3000

Detailed setup guide: docs/setup.md

At a Glance

  • Live scan, instant results
  • 18 analyzers, 33 signals, fully explainable
  • HTTP API + Web UI + Chrome extension
  • Explainable scoring (no black-box ML)
  • Simple Docker setup

How It Compares

Feature SafeSurf VirusTotal Google Safe Browsing URLScan.io CheckPhish
Live crawl, instant results Partial Partial Partial
Explains every verdict Partial Partial Partial
Beginner-friendly interface Partial Partial Partial Partial
Credential form detection Partial
Follows redirect chains
Detailed technical insights Partial
Live page preview
Detection using AI/ML Partial
Known phishing database coverage Partial Partial Partial
Scan multiple URLs at once
Browser protection
Open source

Fast scanners such as Google Safe Browsing return a verdict from a database lookup, with no explanation and no live scan. Deep crawlers such as URLScan.io take much longer to return a result. SafeSurf bridges the gap: it analyzes each URL live, explains every signal in real time, and is open source.

Who This Is For

  • End users checking suspicious links
  • Developers integrating URL analysis
  • Security teams building detection pipelines
  • Researchers

API Example

Analyze a URL via HTTP:

curl "http://localhost:8080/api/v1/analyze?url=https://example.com"

Sample Response:


{
  "url": "https://example.com",
  "trust_score": 100,
  "verdict": "Safe",
  "reasons": {
    "good_reasons": [...]
  }
}

Full response schema → docs/api.md#example
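A minimal Go client for this endpoint might look like the sketch below. It assumes the backend is reachable at localhost:8080 as in the curl example, and it assumes every entry under "reasons" is a list of strings (the README only says each check emits a reason string; docs/api.md has the real schema). The names analyzeEndpoint, decodeAnalyzeResponse, and analyze are illustrative, not part of SafeSurf. The detail worth copying is query-escaping the target: a URL containing ? or & would otherwise be split across several query parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// AnalyzeResponse mirrors the fields in the sample response. Reasons is a
// generic map because only "good_reasons" appears in the sample; the string
// element type is an assumption, check docs/api.md for the real schema.
type AnalyzeResponse struct {
	URL        string              `json:"url"`
	TrustScore float64             `json:"trust_score"`
	Verdict    string              `json:"verdict"`
	Reasons    map[string][]string `json:"reasons"`
}

// analyzeEndpoint builds the request URL, query-escaping the target so that
// characters such as '&' and '?' stay inside the single "url" parameter.
func analyzeEndpoint(base, target string) string {
	q := url.Values{}
	q.Set("url", target)
	return base + "/api/v1/analyze?" + q.Encode()
}

// decodeAnalyzeResponse parses a response body that is already in memory.
func decodeAnalyzeResponse(body []byte) (AnalyzeResponse, error) {
	var r AnalyzeResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// analyze performs one GET against a running backend and decodes the result.
func analyze(client *http.Client, base, target string) (AnalyzeResponse, error) {
	resp, err := client.Get(analyzeEndpoint(base, target))
	if err != nil {
		return AnalyzeResponse{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return AnalyzeResponse{}, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var r AnalyzeResponse
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return AnalyzeResponse{}, err
	}
	return r, nil
}

func main() {
	// The 30 s timeout is an arbitrary choice for this sketch.
	client := &http.Client{Timeout: 30 * time.Second}
	r, err := analyze(client, "http://localhost:8080", "https://example.com/?ref=a&b=c")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s -> %s (trust score %.0f)\n", r.URL, r.Verdict, r.TrustScore)
	for kind, reasons := range r.Reasons {
		fmt.Println(kind, len(reasons))
	}
}
```

The same escaping applies on the command line: curl -G "http://localhost:8080/api/v1/analyze" --data-urlencode "url=https://example.com/?a=1&b=2" lets curl encode the target for you.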

Detection Engine

18 concurrent goroutines run across 7 signal categories, producing 33 individual signals. Every check emits a reason string — good, bad, or neutral — so the final score is always fully explainable. No black-box verdicts.

Score formula: finalScore = clamp(50 + (trustScore − riskScore) × 0.5)

Verdicts: Risky < 30 · Suspicious 30–64 · Safe ≥ 65

50 is the neutral baseline: a URL with no signals scores exactly 50 (Suspicious), the right default for an unknown URL. Trust signals pull the score up and risk signals pull it down, both weighted equally at 0.5× so that neither side dominates on its own. Both scores are individually clamped to 0–100 before the formula runs, which keeps the final score within 0–100 and prevents a single catastrophic signal from drowning out all other context.
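The formula and thresholds translate directly into code. This is a sketch, not the code in internal/analyzer; the function names are hypothetical, and treating a fractional score such as 64.5 as Suspicious is an assumption, since the README lists integer bands.

```go
package main

import "fmt"

// clamp limits v to the closed range [lo, hi].
func clamp(v, lo, hi float64) float64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// finalScore applies the documented formula. Both inputs are clamped to
// 0–100 first; the outer clamp is then only a safety net, because
// 50 ± 100×0.5 already spans exactly 0–100.
func finalScore(trustScore, riskScore float64) float64 {
	t := clamp(trustScore, 0, 100)
	r := clamp(riskScore, 0, 100)
	return clamp(50+(t-r)*0.5, 0, 100)
}

// verdict maps a score to the documented bands. Scores between 64 and 65
// fall into Suspicious here (assumption).
func verdict(score float64) string {
	switch {
	case score < 30:
		return "Risky"
	case score < 65:
		return "Suspicious"
	default:
		return "Safe"
	}
}

func main() {
	for _, c := range [][2]float64{{0, 0}, {60, 10}, {10, 80}, {100, 300}} {
		s := finalScore(c[0], c[1])
		fmt.Printf("trust=%v risk=%v -> %.1f %s\n", c[0], c[1], s, verdict(s))
	}
}
```

With no signals the result is exactly 50 (Suspicious), and trust 60 against risk 10 gives 50 + 25 = 75 (Safe). For trust 100 against a raw risk of 300, clamping risk to 100 first yields 50 rather than the 0 an unclamped formula would produce, which is the "no single signal drowns the rest" behavior described above.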

URL Signals (8 checks)

  1. Raw IP address as hostname (common evasion tactic)
  2. Punycode / IDN encoding (lookalike domain spoofing)
  3. URL shortener (hides the true destination)
  4. Excessive URL length (abnormally long URLs used to hide destination or confuse parsers)
  5. Excessive URL path depth (deeply nested paths used to obscure malicious endpoints)
  6. Phishing keywords in URL path (login, verify, secure, update…)
  7. Excessive subdomain count
  8. Non-ASCII Unicode characters in hostname (IDN homograph attack, e.g. аpple.com with Cyrillic а)
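Several of these URL checks need only the standard library. The sketch below measures them for a single URL; it is an illustration under assumptions, not SafeSurf's checks package. inspectURL and URLSignals are hypothetical names, the keyword list contains only the four examples named above, and turning raw measurements (length, depth, subdomain count) into reasons would need thresholds the README does not state.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
	"unicode/utf8"
)

// URLSignals holds raw measurements; thresholds are left to the caller.
type URLSignals struct {
	Length         int
	RawIPHost      bool
	Punycode       bool
	NonASCIIHost   bool
	SubdomainCount int
	PathDepth      int
	PathKeywords   []string
}

// Only the four keywords named in the README; the real list is longer.
var phishingKeywords = []string{"login", "verify", "secure", "update"}

func inspectURL(raw string) (URLSignals, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return URLSignals{}, err
	}
	host := strings.ToLower(u.Hostname())
	s := URLSignals{Length: len(raw)}

	// net.ParseIP accepts IPv4 and IPv6 literals (Hostname strips brackets).
	s.RawIPHost = net.ParseIP(host) != nil

	labels := strings.Split(host, ".")
	for _, label := range labels {
		if strings.HasPrefix(label, "xn--") { // ACE prefix of an encoded IDN label
			s.Punycode = true
		}
	}
	for i := 0; i < len(host); i++ {
		if host[i] >= utf8.RuneSelf { // bytes >= 0x80 belong to non-ASCII runes
			s.NonASCIIHost = true
			break
		}
	}

	// Naive count of labels left of the last two. Multi-part public suffixes
	// such as co.uk would need the Public Suffix List.
	if !s.RawIPHost && len(labels) > 2 {
		s.SubdomainCount = len(labels) - 2
	}

	for _, seg := range strings.Split(u.Path, "/") {
		if seg != "" {
			s.PathDepth++
		}
	}
	lowerPath := strings.ToLower(u.Path)
	for _, kw := range phishingKeywords {
		if strings.Contains(lowerPath, kw) {
			s.PathKeywords = append(s.PathKeywords, kw)
		}
	}
	return s, nil
}

func main() {
	for _, raw := range []string{
		"http://192.168.0.1/login",
		"https://secure.account.\u0430pple.com/verify/update/index.php",
	} {
		s, err := inspectURL(raw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s\n  %+v\n", raw, s)
	}
}
```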

HTTP / Network (4 checks, single HTTP request)

  1. Redirect chain hop count
  2. Cross-domain redirect (final destination differs from source domain)
  3. HSTS support
  4. HTTP status code
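All four HTTP checks can come out of one logical request by recording each hop in net/http's CheckRedirect hook. A sketch follows; checkHTTP, HTTPSignals, the 10-hop cap, and the 15 s timeout are assumptions, and the demo uses a local httptest server so it runs offline.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// HTTPSignals collects the four HTTP / Network measurements.
type HTTPSignals struct {
	Chain       []string // every URL requested, source first
	Hops        int      // redirects followed
	CrossDomain bool     // final hostname differs from the source hostname
	HSTS        bool     // Strict-Transport-Security header on the final response
	Status      int      // final status code
}

// maxHops is a hypothetical safety cap, not a documented SafeSurf value.
const maxHops = 10

// checkHTTP issues one logical GET and records the redirect chain.
func checkHTTP(target string) (HTTPSignals, error) {
	sig := HTTPSignals{Chain: []string{target}}
	client := &http.Client{
		Timeout: 15 * time.Second,
		// CheckRedirect runs before each redirect is followed.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) > maxHops {
				// Stop and return the last 3xx response unchanged.
				return http.ErrUseLastResponse
			}
			sig.Chain = append(sig.Chain, req.URL.String())
			return nil
		},
	}
	resp, err := client.Get(target)
	if err != nil {
		return sig, err
	}
	defer resp.Body.Close()

	src, err := url.Parse(target)
	if err != nil {
		return sig, err
	}
	sig.Hops = len(sig.Chain) - 1
	sig.Status = resp.StatusCode
	sig.HSTS = resp.Header.Get("Strict-Transport-Security") != ""
	// Exact hostname comparison; comparing registrable domains instead would
	// avoid flagging www.example.com -> example.com.
	sig.CrossDomain = !strings.EqualFold(src.Hostname(), resp.Request.URL.Hostname())
	return sig, nil
}

// demoServer serves /a -> /b -> /c, with /c sending an HSTS header.
func demoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/a", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/b", http.StatusFound)
	})
	mux.HandleFunc("/b", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/c", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/c", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Strict-Transport-Security", "max-age=31536000")
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	ts := demoServer()
	defer ts.Close()
	sig, err := checkHTTP(ts.URL + "/a")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", sig)
}
```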

DNS (3 checks)

  1. NS record validity
  2. MX record validity
  3. IP resolution

TLS / SSL (2 checks, single TLS handshake)

  1. TLS presence and hostname mismatch
  2. Certificate chain — validity, expiry, issuer, CT log status, known-bad fingerprints

Domain Intelligence (6 checks)

  1. Domain rank (position in top-1M global popularity list)
  2. TLD trust / risk / ICANN status
  3. Domain age via WHOIS (newly registered = high risk)
  4. DNSSEC (cryptographic DNS response integrity)
  5. Shannon entropy score (flags algorithmically generated domains)
  6. Typosquatting & combo-squatting across 500+ known brands
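Two of these checks are pure string math. The sketch below computes Shannon entropy, H = −Σ p·log2(p) over character frequencies, and uses Levenshtein edit distance plus substring matching to flag typosquats and combo-squats. The brand list and the distance threshold of 1 are hypothetical; the real engine covers 500+ brands and may use a different similarity measure.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the entropy of s in bits per character.
// Random-looking labels score higher than dictionary words.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein is the classic two-row edit distance over runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// squatCheck compares a registrable label (e.g. "paypa1" from paypa1.com)
// against brand names and returns the matched brand and kind, if any.
func squatCheck(label string, brands []string) (brand, kind string) {
	label = strings.ToLower(label)
	for _, b := range brands {
		if label == b {
			return "", "" // the brand's own name is not squatting
		}
	}
	for _, b := range brands {
		if levenshtein(label, b) <= 1 { // hypothetical threshold
			return b, "typosquat"
		}
	}
	for _, b := range brands {
		if strings.Contains(label, b) {
			return b, "combosquat"
		}
	}
	return "", ""
}

func main() {
	brands := []string{"paypal", "google", "apple"} // illustrative list
	for _, label := range []string{"google", "xk2qz9vbt7", "paypa1", "paypal-login"} {
		b, k := squatCheck(label, brands)
		fmt.Printf("%-14s entropy=%.2f squat=%q %s\n", label, shannonEntropy(label), b, k)
	}
}
```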

Content Analysis (8 checks)

  1. Login form on unranked or newly registered domain
  2. Payment form (credit card, CVV fields)
  3. Personal information form
  4. Hidden <iframe> (credential theft / clickjacking vector)
  5. Tracking pixels (1×1 hidden images)
  6. Brand name in page content vs. hosting domain
  7. Form submitting to an external domain
  8. Password field over unencrypted HTTP
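Once a page's forms have been extracted, several content checks reduce to URL comparisons. This sketch assumes a hypothetical Form struct (SafeSurf's extraction format is not documented here) and covers external form actions, password fields over HTTP, and card-like field names; the payment hint list is illustrative only.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Form is a hypothetical, already-extracted view of one <form> element
// (for example, collected by the headless Chrome step).
type Form struct {
	Action     string   // raw action attribute; "" submits to the page itself
	InputTypes []string // type attributes, e.g. "email", "password"
	InputNames []string // name attributes, e.g. "card_number"
}

// ContentSignals covers three of the content checks listed above.
type ContentSignals struct {
	ExternalAction   bool // form submits to a different host than the page
	PasswordOverHTTP bool // password field on, or posted to, plain HTTP
	PaymentFields    bool // card-like field names (heuristic)
}

// paymentHints is an illustrative list, not SafeSurf's.
var paymentHints = []string{"card", "cvv", "cvc", "expiry"}

func analyzeForm(pageURL string, f Form) (ContentSignals, error) {
	page, err := url.Parse(pageURL)
	if err != nil {
		return ContentSignals{}, err
	}
	action, err := url.Parse(f.Action)
	if err != nil {
		return ContentSignals{}, err
	}
	// Resolve relative actions such as "/session" or "" against the page URL.
	target := page.ResolveReference(action)

	var s ContentSignals
	s.ExternalAction = !strings.EqualFold(target.Hostname(), page.Hostname())

	hasPassword := false
	for _, t := range f.InputTypes {
		if strings.EqualFold(t, "password") {
			hasPassword = true
		}
	}
	s.PasswordOverHTTP = hasPassword && (page.Scheme == "http" || target.Scheme == "http")

	for _, name := range f.InputNames {
		lower := strings.ToLower(name)
		for _, hint := range paymentHints {
			if strings.Contains(lower, hint) {
				s.PaymentFields = true
			}
		}
	}
	return s, nil
}

func main() {
	s, err := analyzeForm("https://secure-login.example.test/account", Form{
		Action:     "https://collector.example.net/submit",
		InputTypes: []string{"email", "password"},
	})
	fmt.Printf("%+v %v\n", s, err)
}
```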

Threat Intelligence (2 checks)

  1. PhishTank confirmed phishing (community-verified)
  2. PhishTank reported phishing (awaiting verification, 3 h cache)

Limitations

  • Heuristic-based detection may produce false positives
  • No ML model (intentional, prioritizes explainability and auditability)

Not a safety guarantee. Use alongside other defenses.

Architecture

Four containerized services on a shared Docker bridge network. The Go backend is the only service that makes outbound calls to external APIs — the frontend, Chrome, and cache are strictly internal.

Service Role
safesurf-web SvelteKit UI — :3000 (prod) · :5173 (dev)
safesurf-backend Go REST API & analyzer engine — :8080
safesurf-chrome Headless Chrome — WebSocket :9222
safesurf-valkey Valkey (Redis-compatible) — :6379, LRU cache, volume-persisted

Request lifecycle

  1. URL submitted via the UI or REST API
  2. Backend validates and normalizes the URL (scheme inferred if missing)
  3. Valkey cache checked — a hit returns the full result immediately, no re-analysis
  4. On miss: 18 goroutines launch concurrently via sync.WaitGroup; panics are recovered per-task without failing the request
  5. Results collected → score aggregated → verdict assigned
  6. Complete result cached in Valkey (24 h TTL) and logged to scan history
  7. Response returned — trust score, verdict, per-signal reasons, redirect chain, page screenshot, per-task timings
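Step 4 (one goroutine per task, a sync.WaitGroup, and per-task panic recovery), together with the per-task timings from step 7, can be sketched as follows. Task, TaskResult, and runAll are hypothetical; the actual runner lives in internal/analyzer and may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Reason is one explainable signal; Kind is "good", "bad" or "neutral".
type Reason struct {
	Kind string
	Text string
}

// Task is a hypothetical analyzer shape.
type Task struct {
	Name string
	Run  func(target string) []Reason
}

// TaskResult carries per-task output and timing.
type TaskResult struct {
	Name     string
	Reasons  []Reason
	Duration time.Duration
	Err      error // non-nil if the task panicked
}

// runAll starts one goroutine per task and waits on a sync.WaitGroup.
// A panic is recovered inside the goroutine that raised it and recorded as
// that task's error, so the other tasks and the request still complete.
func runAll(target string, tasks []Task) []TaskResult {
	results := make([]TaskResult, len(tasks)) // one slot per task: no shared writes
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		go func(i int, t Task) {
			defer wg.Done() // deferred first, so it runs last
			start := time.Now()
			res := TaskResult{Name: t.Name}
			defer func() {
				if r := recover(); r != nil {
					res.Err = fmt.Errorf("task %s panicked: %v", t.Name, r)
				}
				res.Duration = time.Since(start)
				results[i] = res
			}()
			res.Reasons = t.Run(target)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	tasks := []Task{
		{Name: "hsts", Run: func(string) []Reason {
			return []Reason{{Kind: "good", Text: "HSTS enabled"}}
		}},
		{Name: "whois", Run: func(string) []Reason {
			panic("unexpected WHOIS format")
		}},
	}
	for _, r := range runAll("https://example.com", tasks) {
		fmt.Printf("%-6s %v reasons=%v err=%v\n", r.Name, r.Duration, r.Reasons, r.Err)
	}
}
```

Writing into a preallocated slice by index avoids a mutex, and recovering inside each goroutine matters because an unrecovered panic in any goroutine crashes the whole process, not just the task that raised it.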

Project layout

server/
  cmd/safesurf/         entry point
  internal/analyzer/    goroutine runner, task definitions, score aggregation
  internal/service/
    checks/             18 individual analyzer implementations
    screenshot/         headless Chrome integration
    cache/              Valkey client
    threatfeeds/        PhishTank client
    typosquat/          brand similarity engine
web/website/            SvelteKit UI
web/chrome-extension/   browser extension
docker/                 dev & prod Compose configs
docs/                   API, setup, architecture, security

Documentation

Citation

If you use this project in academic or research work, please cite it — see CITATION.cff.

License

Copyright (C) 2023–2026 Abhishek K P

SafeSurf is dual-licensed:

  • Community: GNU Affero General Public License v3.0 (AGPL-3.0). Free to use, modify, and self-host. Any modified version run over a network must make its source code available to its users.
  • Commercial: A separate commercial license is available for organizations that cannot comply with the AGPL-3.0 (e.g. closed-source SaaS, enterprise deployments). See COMMERCIAL.md or contact hi@abhizaik.com to enquire.

Contributing

If you found this project helpful, consider giving it a star.
