Phishing Detection

SafeSurf is an open-source phishing detection engine for real-time URL analysis. Detect malicious links, explain every verdict, and generate a security report for each scan.

⚡ Quick Start · ⚙️ Detection Engine · 🏛 Architecture · 📚 Docs · 🤝 Contributing

Paste a URL → get a trust score, verdict, and detailed report in real time.

Live demo: https://safesurf.xorwave.com

Quick Start

git clone https://github.com/abhizaik/phishing-detection.git
cd phishing-detection
make build && make up

Open the Web UI at http://localhost:3000

Detailed setup guide: docs/setup.md

At a Glance

Live scan, instant results
18 analyzers, 33 signals
HTTP API + Web UI + Chrome extension
Explainable scoring (no black-box ML)
Simple Docker setup

How It Compares

Feature	SafeSurf	VirusTotal	Google Safe Browsing	URLScan.io	CheckPhish
Live crawl, instant results	✅	Partial	❌	Partial	Partial
Explains every verdict	✅	Partial	❌	Partial	Partial
Beginner-friendly interface	✅	Partial	Partial	Partial	Partial
Credential form detection	✅	❌	❌	Partial	✅
Follows redirect chains	✅	✅	❌	✅	✅
Detailed technical insights	✅	❌	❌	✅	Partial
Live page preview	✅	❌	❌	✅	✅
Detection using AI/ML	❌	✅	✅	Partial	✅
Known phishing database coverage	Partial	✅	✅	Partial	Partial
Scan multiple URLs at once	❌	✅	✅	✅	❌
Browser protection	✅	✅	✅	✅	❌
Open source	✅	❌	❌	❌	❌

Fast scanners such as Google Safe Browsing return a verdict from a database lookup, with no explanation and no live scan. Deep crawlers such as URLScan.io analyze the live page but take longer to return a result. SafeSurf bridges the gap: it runs live analysis, explains every signal, returns results in real time, and is open source.

Who This Is For

End users checking suspicious links
Developers integrating URL analysis
Security teams building detection pipelines
Researchers

API Example

Analyze a URL via HTTP:

curl "http://localhost:8080/api/v1/analyze?url=https://example.com"

Sample response:

{
  "url": "https://example.com",
  "trust_score": 100,
  "verdict": "Safe",
  "reasons": {
    "good_reasons": [...]
  }
}

Full response schema → docs/api.md#example
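For callers who prefer Go over curl, the request above can be sketched as a small client. This is a minimal sketch, not the project's own client: the `Result` struct covers only the fields visible in the sample response, `trust_score` is decoded as `float64` in case the API returns fractional scores, and the element type of `good_reasons` is not shown in the sample, so it is kept as raw JSON. The names `Result`, `analyzeURL`, `decodeResult`, and `Analyze` are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// Result mirrors only the fields shown in the sample response.
// Assumption: the real schema (docs/api.md) contains more fields.
type Result struct {
	URL        string  `json:"url"`
	TrustScore float64 `json:"trust_score"`
	Verdict    string  `json:"verdict"`
	Reasons    struct {
		// Element type is elided in the sample, so keep it as raw JSON.
		GoodReasons []json.RawMessage `json:"good_reasons"`
	} `json:"reasons"`
}

// analyzeURL builds the endpoint URL and query-escapes the target.
func analyzeURL(base, target string) string {
	q := url.Values{}
	q.Set("url", target)
	return base + "/api/v1/analyze?" + q.Encode()
}

// decodeResult parses a JSON response body into a Result.
func decodeResult(body []byte) (Result, error) {
	var r Result
	err := json.Unmarshal(body, &r)
	return r, err
}

// Analyze sends the GET request and decodes the response.
func Analyze(ctx context.Context, base, target string) (Result, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, analyzeURL(base, target), nil)
	if err != nil {
		return Result{}, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return Result{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Result{}, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Result{}, err
	}
	return decodeResult(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	res, err := Analyze(ctx, "http://localhost:8080", "https://example.com")
	if err != nil {
		fmt.Println("analyze failed:", err)
		return
	}
	fmt.Printf("%s: %v (%s)\n", res.URL, res.TrustScore, res.Verdict)
}
```

Unlike the curl one-liner, the sketch query-escapes the target URL, which matters once the scanned URL carries its own query string.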

Detection Engine

18 concurrent goroutines run across 7 signal categories, producing 33 individual signals. Every check emits a reason string — good, bad, or neutral — so the final score is always fully explainable. No black-box verdicts.

Score formula: finalScore = clamp(50 + (trustScore − riskScore) × 0.5)
Verdict bands: Risky < 30 · Suspicious 30–64 · Safe ≥ 65

50 is the neutral baseline: a URL with no signals scores exactly 50 (Suspicious), the right default for an unknown URL. Trust signals pull the score up and risk signals pull it down, each weighted at 0.5× so neither dominates alone. Both scores are individually clamped to 0–100 before the formula runs, so no single catastrophic signal can drown out all other context and the result always lands between 0 and 100.
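The scoring rule can be written out in a few lines of Go. This is a sketch under stated assumptions rather than the engine's actual code: the function names are hypothetical, the outer clamp range is assumed to be 0–100, and fractional scores between 64 and 65 are assumed to count as Suspicious, since the bands above are only given for whole numbers.

```go
package main

import "fmt"

// clamp limits v to the closed range [lo, hi].
func clamp(v, lo, hi float64) float64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// FinalScore applies finalScore = clamp(50 + (trust - risk) * 0.5).
// Both inputs are clamped to 0-100 first, as the text describes.
func FinalScore(trust, risk float64) float64 {
	trust = clamp(trust, 0, 100)
	risk = clamp(risk, 0, 100)
	return clamp(50+(trust-risk)*0.5, 0, 100) // assumed outer range: 0-100
}

// Verdict maps a final score to a band.
// Assumption: anything below 65 that is not Risky is Suspicious.
func Verdict(score float64) string {
	switch {
	case score < 30:
		return "Risky"
	case score < 65:
		return "Suspicious"
	default:
		return "Safe"
	}
}

func main() {
	fmt.Println(FinalScore(0, 0), Verdict(FinalScore(0, 0)))     // 50 Suspicious
	fmt.Println(FinalScore(80, 10), Verdict(FinalScore(80, 10))) // 85 Safe
	fmt.Println(FinalScore(10, 70), Verdict(FinalScore(10, 70))) // 20 Risky
}
```

Because both inputs are capped at 100 and weighted at 0.5, the expression already stays within 0–100; the outer clamp is a safety net.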

URL Signals (8 checks)

Raw IP address as hostname (common evasion tactic)
Punycode / IDN encoding (lookalike domain spoofing)
URL shortener (hides the true destination)
Excessive URL length (abnormally long URLs used to hide destination or confuse parsers)
Excessive URL path depth (deeply nested paths used to obscure malicious endpoints)
Phishing keywords in URL path (login, verify, secure, update…)
Excessive subdomain count
Non-ASCII Unicode characters in hostname (IDN homograph attack, e.g. аpple.com with Cyrillic а)
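Several of the URL checks above need nothing beyond the standard library. The sketch below is illustrative only: the `URLSignals` struct and `InspectURL` function are hypothetical names, the keyword list contains just the four examples given above, and the subdomain count naively assumes a two-label registrable domain, so it would miscount suffixes such as `co.uk`. Thresholds for "excessive" length or depth are left to the caller because the source does not state them.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// URLSignals holds raw measurements; scoring them is a separate step.
type URLSignals struct {
	RawIP          bool
	Punycode       bool
	NonASCIIHost   bool
	SubdomainCount int
	PathDepth      int
	Length         int
	Keywords       []string
}

// Only the examples named in the list above; the real list is not shown.
var phishingKeywords = []string{"login", "verify", "secure", "update"}

// InspectURL extracts lexical signals from a URL without any network access.
func InspectURL(raw string) (URLSignals, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return URLSignals{}, err
	}
	host := u.Hostname()
	if host == "" {
		return URLSignals{}, fmt.Errorf("no host in %q", raw)
	}

	s := URLSignals{Length: len(raw)}
	s.RawIP = net.ParseIP(host) != nil

	labels := strings.Split(strings.ToLower(host), ".")
	for _, label := range labels {
		if strings.HasPrefix(label, "xn--") { // punycode-encoded IDN label
			s.Punycode = true
		}
	}
	for _, r := range host {
		if r > 127 { // any rune outside ASCII
			s.NonASCIIHost = true
			break
		}
	}
	// Naive assumption: the last two labels form the registrable domain.
	if !s.RawIP && len(labels) > 2 {
		s.SubdomainCount = len(labels) - 2
	}
	for _, seg := range strings.Split(u.Path, "/") {
		if seg != "" {
			s.PathDepth++
		}
	}
	path := strings.ToLower(u.Path)
	for _, kw := range phishingKeywords {
		if strings.Contains(path, kw) {
			s.Keywords = append(s.Keywords, kw)
		}
	}
	return s, nil
}

func main() {
	for _, raw := range []string{
		"http://192.168.0.1/login",
		"https://a.b.c.example.com/x/y/z",
		"https://\u0430pple.com/", // Cyrillic а in the hostname
	} {
		s, err := InspectURL(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", raw, s)
	}
}
```

A production implementation would more likely resolve the registrable domain with a public suffix list before counting subdomains.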

HTTP / Network (4 checks, single HTTP request)

Redirect chain hop count
Cross-domain redirect (final destination differs from source domain)
HSTS support
HTTP status code
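All four HTTP signals can be gathered from one client call, as in the sketch below. It is an assumption-laden illustration rather than the project's implementation: `HTTPSignals`, `crossDomain`, and `Probe` are hypothetical names, the 10-hop cap and 10-second timeout are arbitrary, and "cross-domain" is approximated by comparing hostnames, so a redirect from `example.com` to `www.example.com` would also be flagged.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

const maxHops = 10 // assumed cap on followed redirects

// HTTPSignals collects the four checks from a single client call.
type HTTPSignals struct {
	Hops        int
	FinalURL    string
	CrossDomain bool
	HSTS        bool
	StatusCode  int
}

// crossDomain reports whether two URLs have different hostnames.
// Simplification: compares full hostnames, not registrable domains.
func crossDomain(srcURL, finalURL string) (bool, error) {
	src, err := url.Parse(srcURL)
	if err != nil {
		return false, err
	}
	dst, err := url.Parse(finalURL)
	if err != nil {
		return false, err
	}
	return !strings.EqualFold(src.Hostname(), dst.Hostname()), nil
}

// Probe fetches rawURL, following redirects, and records the signals.
func Probe(rawURL string) (HTTPSignals, error) {
	var sig HTTPSignals
	client := &http.Client{
		Timeout: 10 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= maxHops {
				return http.ErrUseLastResponse // stop following, keep last response
			}
			sig.Hops = len(via) // via holds every request made so far
			return nil
		},
	}
	resp, err := client.Get(rawURL)
	if err != nil {
		return sig, err
	}
	defer resp.Body.Close()

	sig.FinalURL = resp.Request.URL.String()
	sig.StatusCode = resp.StatusCode
	sig.HSTS = resp.Header.Get("Strict-Transport-Security") != ""
	sig.CrossDomain, err = crossDomain(rawURL, sig.FinalURL)
	return sig, err
}

func main() {
	sig, err := Probe("https://example.com")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("%+v\n", sig)
}
```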

DNS (3 checks)

NS record validity
MX record validity
IP resolution

TLS / SSL (2 checks, single TLS handshake)

TLS presence and hostname mismatch
Certificate chain — validity, expiry, issuer, CT log status, known-bad fingerprints

Domain Intelligence (6 checks)

Domain rank (position in top-1M global popularity list)
TLD trust / risk / ICANN status
Domain age via WHOIS (newly registered = high risk)
DNSSEC (cryptographic DNS response integrity)
Shannon entropy score (flags algorithmically generated domains)
Typosquatting & combo-squatting across 500+ known brands
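Two of these checks are easy to illustrate in isolation: Shannon entropy over the characters of a domain label, and a similarity test against known brand names. The sketch below uses plain Levenshtein edit distance for the similarity test, which is a stand-in chosen for illustration; the source does not say which metric the typosquatting engine uses. The function names, the brand list, and the distance threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ShannonEntropy returns the entropy of s in bits per character.
// Random-looking labels tend to score higher than dictionary words.
func ShannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// EditDistance is the Levenshtein distance between a and b (rune-wise).
func EditDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// LooksLikeBrand returns the first brand within maxDist edits of label,
// ignoring exact matches (the brand itself), or "" if none is close.
func LooksLikeBrand(label string, brands []string, maxDist int) string {
	for _, b := range brands {
		if d := EditDistance(label, b); d > 0 && d <= maxDist {
			return b
		}
	}
	return ""
}

func main() {
	brands := []string{"paypal", "google"} // illustrative list only
	fmt.Printf("%.2f\n", ShannonEntropy("paypal"))
	fmt.Printf("%.2f\n", ShannonEntropy("x7kq9zv2mf"))
	fmt.Println(LooksLikeBrand("paypa1", brands, 1)) // paypal
}
```

Entropy alone is a weak signal for short labels, which is one reason to treat it as one input among many rather than a verdict.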

Content Analysis (8 checks)

Login form on unranked or newly registered domain
Payment form (credit card, CVV fields)
Personal information form
Hidden <iframe> (credential theft / clickjacking vector)
Tracking pixels (1×1 hidden images)
Brand name in page content vs. hosting domain
Form submitting to an external domain
Password field over unencrypted HTTP

Threat Intelligence (2 checks)

PhishTank confirmed phishing (community-verified)
PhishTank reported phishing (awaiting verification, 3 h cache)

Limitations

Heuristic-based detection may produce false positives
No ML model (intentional, prioritizes explainability and auditability)

Not a safety guarantee. Use alongside other defenses.

Architecture

Four containerized services on a shared Docker bridge network. The Go backend is the only service that makes outbound calls to external APIs — the frontend, Chrome, and cache are strictly internal.

Service	Role
`safesurf-web`	SvelteKit UI — :3000 (prod) · :5173 (dev)
`safesurf-backend`	Go REST API & analyzer engine — :8080
`safesurf-chrome`	Headless Chrome — WebSocket :9222
`safesurf-valkey`	Valkey (Redis-compatible) — :6379, LRU cache, volume-persisted

Request lifecycle

URL submitted via the UI or REST API
Backend validates and normalizes the URL (scheme inferred if missing)
Valkey cache checked — a hit returns the full result immediately, no re-analysis
On miss: 18 goroutines launch concurrently and are awaited with a sync.WaitGroup; a panic in any task is recovered without failing the request
Results collected → score aggregated → verdict assigned
Complete result cached in Valkey (24 h TTL) and logged to scan history
Response returned — trust score, verdict, per-signal reasons, redirect chain, page screenshot, per-task timings
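The concurrent step of the lifecycle above can be sketched as a small task runner. This is one way to combine `sync.WaitGroup` with per-task panic recovery and timing, not the project's actual runner: the `Task` and `TaskResult` types and the `RunAll` function are hypothetical, and real analyzers would return structured signals rather than a single reason string.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Task is one analyzer: a name plus a function returning a reason string.
type Task struct {
	Name string
	Run  func() (string, error)
}

// TaskResult records the outcome and duration of one task.
type TaskResult struct {
	Name    string
	Reason  string
	Err     error
	Elapsed time.Duration
}

// RunAll runs every task in its own goroutine and waits for all of them.
// A panic inside a task becomes that task's error; other tasks are unaffected.
func RunAll(tasks []Task) []TaskResult {
	results := make([]TaskResult, len(tasks))
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		go func(i int, t Task) {
			defer wg.Done() // runs last, after the result is stored
			start := time.Now()
			res := TaskResult{Name: t.Name}
			defer func() {
				if r := recover(); r != nil {
					res.Err = fmt.Errorf("task %s panicked: %v", t.Name, r)
				}
				res.Elapsed = time.Since(start)
				results[i] = res // each goroutine writes only its own slot
			}()
			res.Reason, res.Err = t.Run()
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	results := RunAll([]Task{
		{Name: "dns", Run: func() (string, error) { return "NS records present", nil }},
		{Name: "tls", Run: func() (string, error) { return "", errors.New("handshake timeout") }},
		{Name: "buggy", Run: func() (string, error) { panic("nil map write") }},
	})
	for _, r := range results {
		fmt.Printf("%-6s reason=%q err=%v elapsed=%s\n", r.Name, r.Reason, r.Err, r.Elapsed)
	}
}
```

Writing each result into a pre-sized slice at the task's own index avoids a mutex and keeps the output order stable regardless of which goroutine finishes first.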

server/
  cmd/safesurf/         entry point
  internal/analyzer/    goroutine runner, task definitions, score aggregation
  internal/service/
    checks/             18 individual analyzer implementations
    screenshot/         headless Chrome integration
    cache/              Valkey client
    threatfeeds/        PhishTank client
    typosquat/          brand similarity engine
web/website/            SvelteKit UI
web/chrome-extension/   browser extension
docker/                 dev & prod Compose configs
docs/                   API, setup, architecture, security

Documentation

docs/README.md — start here
docs/api.md — full API reference
Swagger UI — interactive API docs

Citation

If you use this project in academic or research work, please cite it — see CITATION.cff.

License

SafeSurf is dual-licensed:

Community — GNU Affero General Public License v3.0. Free to use, modify, and self-host. Any modified version run over a network must make its source code available to users.
Commercial — A separate commercial license is available for organizations that cannot comply with the AGPL-3.0 (e.g. closed-source SaaS, enterprise deployments). See COMMERCIAL.md or contact hi@abhizaik.com to enquire.

Contributing

Found a bug? → Open an issue
Have a question or idea? → Start a discussion
Want to contribute code? → CONTRIBUTING.md

If you found this project helpful, consider giving it a star.
