marketing-auditor
POC: marketing claim + dark-pattern audit for commercial web pages.
What is built today: a CLI + local web UI that fetches any product / pricing / landing page, extracts its claims, classifies them by verifiability (verifiable / potentially_verifiable / unverifiable) against cited standards, detects rule-based dark patterns, and aggregates findings across vendors and verticals. Runs locally; data stays on the machine.
What is not built yet: buyer-side autonomous agents, attestation infrastructure, published verifiable-claim schemas, browser extension, hosted service. The longer essay below ("Fair Marketing — Problem Statement & Technical Directions") is the thesis and roadmap; the rest of this file documents it.
The repo name is marketing-auditor. The UI and CLI brand is auditor. This is a working title for a POC, not a product name.
Quick start
```sh
# Install dependencies
bun install

# Single-page audit (no LLM)
bun run cli fetch https://example.com

# Full audit incl. LLM claim extraction (needs OPENAI_API_KEY)
export OPENAI_API_KEY=sk-...
bun run cli fetch --extract https://example.com

# Batch eval over a URL list
bun run cli eval data/seeds/saas-pricing.txt

bun run cli detect   # rule-based dark-pattern backfill over cached pages
bun run cli report   # aggregate all results

# Browse findings in the web UI
bun run --cwd apps/app dev
# → http://localhost:5173
```
Known limitations of this POC
- No tests. Nothing is unit-tested; run behavior has been validated manually and via the eval harness only.
- LLM extraction variance (~10% at the claim level) — same page re-extracted produces slightly different classifications. Not a bug; a consequence of non-deterministic generation.
- Playwright render fallback is broken on some Windows setups (CDP handshake times out). The fetcher degrades gracefully to static HTML; pages that need JS rendering (iHerb in particular) produce thin extractions.
- CLI exit codes exist (1 / 2 / 3 / 4 for various failure modes) but are not yet documented.
- Pagination not implemented in the browse UI. Degrades past ~500 rows.
- SSRF: `/api/audit` blocks loopback/private-IP hosts, but this is a POC; don't expose the dev server to the internet.
- Security posture: assumes local-only use. The `/api/extractions` endpoint exposes all stored data with no auth; fine for localhost, but don't deploy without adding auth.
- Schema drift between bun:sqlite (CLI writes) and better-sqlite3 (Node reads): both libraries read the same SQLite file, but schema changes must be replicated in both layers (`packages/core/src/storage.ts` and `apps/app/src/lib/server/db.ts`).
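One common mitigation for this kind of drift (a sketch, not the repo's current design): keep the DDL and a schema version in one shared constant that both storage layers import, so a schema change is made in exactly one place. Module, table, and function names below are hypothetical.

```typescript
// Hypothetical shared module, e.g. a single schema.ts that both the
// bun:sqlite writer and the better-sqlite3 reader would import.

const SCHEMA_VERSION = 3;

const DDL = `
CREATE TABLE IF NOT EXISTS pages (
  id INTEGER PRIMARY KEY,
  url TEXT NOT NULL,
  fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS claims (
  id INTEGER PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id),
  text TEXT NOT NULL,
  verifiability TEXT NOT NULL CHECK (
    verifiability IN ('verifiable', 'potentially_verifiable', 'unverifiable')
  )
);
`;

// Each layer records the version it was built against and refuses
// to open a database written by a newer schema.
function assertCompatible(dbVersion: number): void {
  if (dbVersion > SCHEMA_VERSION) {
    throw new Error(
      `DB schema v${dbVersion} is newer than this build (v${SCHEMA_VERSION})`,
    );
  }
}
```

Both layers still need their own driver bindings, but at least the DDL and the compatibility check can no longer silently diverge.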
Findings so far
Small empirical corpus. All absolute numbers are from ~N=56 commercial pages + 3 Wikipedia controls, audited with GPT-4.1. Treat as illustrative, not publishable.
Headline: 0.7% of commercial claims are verifiable (4 / ~750) — all four are on a single page (Swanson's Deep Rest supplement, which cites Gersappe V et al. Int J Basic Clin Pharmacol. 2024;13(4):475-485). Every other commercial page in the corpus is 0% verifiable.
- 56% of commercial claims are potentially verifiable — specific enough to test (numeric quantities, named certifications, concrete thresholds) but the page cites no independent source. The raw material for verification exists; the attestation infrastructure doesn't.
- 43% are unverifiable — vague superlatives, undefined terms, emotional appeals.
Per-vertical % unverifiable (higher = more marketing-heavy):
- control (Wikipedia): 0%
- SaaS pricing (15 pages): 39%
- supplements (19 pages): 45%
- consumer electronics (18 pages): 49% — counterintuitively the dirtiest, driven by lifestyle-audio brand copy
Within-vertical spread is larger than between-vertical spread. SaaS pricing pages ranged from 0% unverifiable (Render) to 100% (GitHub, Atlassian). Consumer electronics from 23% (Ember) to 63% (one Keychron keyboard). Vendor-level scoring is the more useful lens than category-level averages.
Dark patterns: 22 rule-detectable patterns across 18/56 pages (~32%). Dominant types: autoplay media (attention_manipulation) and strikethrough pricing (price_obfuscation). False-scarcity ("only N left") surfaced on zero pages in this corpus — modern sites either don't use it or JS-render it out of reach of static fetch.
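The rule-based detection is, at its core, pattern matching over fetched HTML. A minimal sketch of two of the rules named above; function and rule names are illustrative, not the repo's actual implementation, and a real detector would operate on a parsed DOM rather than raw HTML:

```typescript
type DarkPatternHit = { rule: string; category: string; evidence: string };

// Illustrative rules: autoplay media (attention manipulation) and
// the "only N left" false-scarcity pattern.
const RULES: { rule: string; category: string; pattern: RegExp }[] = [
  {
    rule: "autoplay_media",
    category: "attention_manipulation",
    pattern: /<(?:video|audio)[^>]*\bautoplay\b/i,
  },
  {
    rule: "false_scarcity",
    category: "urgency",
    pattern: /only\s+\d+\s+left/i,
  },
];

function detectDarkPatterns(html: string): DarkPatternHit[] {
  const hits: DarkPatternHit[] = [];
  for (const { rule, category, pattern } of RULES) {
    const match = html.match(pattern);
    if (match) hits.push({ rule, category, evidence: match[0] });
  }
  return hits;
}
```

The static-fetch limitation noted above falls directly out of this design: a scarcity banner injected by JavaScript never appears in the HTML string these patterns see.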
Economics: ~$0.015 per page with GPT-4.1. 1000 pages ≈ $15.
Repository layout
```
marketing-auditor/
  packages/core/   TypeScript library: models, fetcher, sanitizer,
                   extractor (OpenAI), dark-patterns, storage (bun:sqlite)
  apps/cli/        Bun CLI: fetch, eval, detect, report, discover
  apps/app/        SvelteKit 2 + Svelte 5 + Tailwind v4 web UI
  data/seeds/      Starter URL lists per vertical
  archive/         Earlier Python prototype (gitignored)
```
Development
```sh
bun run --cwd apps/app check      # svelte-check + biome + prettier + vite build
bun run --cwd apps/app check:fix  # auto-fix Biome and Prettier
```
Fair Marketing — Problem Statement & Technical Directions
1. Problem Statement
Modern marketing has drifted into a state where it is:
- Intrusive — interrupts attention the consumer did not offer
- Misleading — optimizes for persuasion rather than informed choice
- Expensive — a significant share of consumer prices pays for the marketing apparatus itself (adtech, agencies, sponsorships, SEO arms races, attention auctions)
- Asymmetric — sellers know more than buyers and spend money to keep it that way
For many categories (cosmetics, supplements, SaaS, DTC brands), marketing plus distribution margin exceeds production cost. Consumers pay for a war of attention they did not enlist in.
Goal: maintain the legitimate functions of marketing (discovery, claim communication, trust-building) while making the experience feel invisible, making claims verifiable, and drastically reducing cost — with the savings flowing to the consumer.
2. Core Thesis
The persuasion-heavy layer of marketing exists because of an information and leverage asymmetry between sellers and buyers. Close that asymmetry and most of the cost structure, intrusiveness, and manipulation collapse as economic side-effects — not because of ethics, but because the tactics stop working.
The closing mechanism: capable buyer-side agents + machine-verifiable seller claims + portable reputation + attention priced to the receiver.
3. What AI and Adjacent Tech Change
Four shifts look real, not hype:
- Buyer-side agents flip the asymmetry. An agent that negotiates, filters, and compares on the user's behalf makes ads aimed at the user wasted spend. Sellers must instead market to agents — meaning structured, verifiable claims rather than FOMO, vibes, and dark patterns.
- Discovery becomes pull, not push. Sellers publish machine-readable specs and reputational history; buyer agents query them. Spend shifts from interrupting to being findable and verifiable — much cheaper.
- Reputation becomes portable and cryptographically checkable. Warranty claims, defect rates, return rates, real usage data — published or provable via ZK — make brand trust earned and cheap to verify rather than bought via advertising.
- Personalization without surveillance. On-device models personalize without data ever leaving the user. The ~$600B global adtech tracking industry becomes largely obsolete.
4. Technical Architecture
Five layers. The split between centralized, federated, and sovereign is deliberate — see §6.
4.1 Claim Layer — the keystone
A machine-readable schema where every product claim is typed and either verifiable or explicitly marked as unverified marketing. Extends schema.org with:
- Numeric claims with units, measurement methodology, and a signed attestation hash, e.g. `battery_life: {value: 10, unit: hours, test: "IEC 61960 @ 25°C", source: <attestation>}`
- Categorical claims with controlled vocabularies — no free-form "eco-friendly"; either a given certification applies or it does not
- Probabilistic claims with confidence intervals and sample size
- Every claim signed by the seller; verified claims co-signed by an issuer (lab, auditor, regulator, or buyer-agent consensus)
Without this layer, everything else is vibes.
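A sketch of what such typed claims could look like, here as TypeScript types rather than the eventual JSON-LD profile; every field name below is illustrative, not a proposed standard:

```typescript
// Illustrative claim shapes; not the published schema.
type Attestation = { issuer: string; signature: string; hash: string };

type NumericClaim = {
  kind: "numeric";
  property: string;     // e.g. "battery_life"
  value: number;
  unit: string;         // e.g. "hours"
  test?: string;        // measurement methodology, e.g. "IEC 61960 @ 25°C"
  source?: Attestation; // issuer co-signature; absent = self-asserted
};

type CategoricalClaim = {
  kind: "categorical";
  property: string;
  certification: string; // from a controlled vocabulary, never free-form
  source?: Attestation;
};

type ProbabilisticClaim = {
  kind: "probabilistic";
  property: string;
  estimate: number;
  confidenceInterval: [number, number];
  sampleSize: number;
  source?: Attestation;
};

type Claim = NumericClaim | CategoricalClaim | ProbabilisticClaim;

// A claim without a co-signing attestation is explicitly unverified
// marketing; it is never silently treated as verified.
function isVerified(claim: Claim): boolean {
  return claim.source !== undefined;
}
```

The discriminated union is the point: an agent can exhaustively handle every claim kind, and "unverified" is a checkable property rather than a vibe.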
4.2 Attestation / Trust Layer
Multiple mechanisms, layered by cost and trust:
- Self-attested + bonded — seller posts collateral, auto-slashed on falsification. Cheap; works for low-stakes claims.
- Third-party attested — labs, certifiers, auditors sign claims. Expensive; high trust.
- Crowd-attested — buyer agents report outcomes post-purchase, forming distributed ground truth. Medium trust; scales well.
- ZK-attested — prove "returns rate < 3%" without exposing underlying data. Useful when data is competitive.
A reputation graph covers issuers too, so captured certifiers get discounted over time.
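The four mechanisms form a rough cost/trust ladder, and the issuer reputation graph discounts each of them. A sketch of how an agent might combine the two; the base weights are invented for illustration, and a real system would derive them from observed issuer behavior:

```typescript
type AttestationKind =
  | "self_bonded"  // collateral at stake, auto-slashed on falsification
  | "third_party"  // lab / certifier / auditor signature
  | "crowd"        // aggregated post-purchase buyer-agent reports
  | "zk";          // zero-knowledge proof over private data

// Illustrative base weights ordered by mechanism trust.
const BASE_TRUST: Record<AttestationKind, number> = {
  self_bonded: 0.4,
  crowd: 0.6,
  zk: 0.8,
  third_party: 0.9,
};

// issuerReputation in [0, 1] comes from the reputation graph, so a
// captured certifier's third-party signature can score below an
// honest crowd attestation.
function trustScore(kind: AttestationKind, issuerReputation: number): number {
  return BASE_TRUST[kind] * issuerReputation;
}
```

For example, a third-party signature from an issuer at 0.4 reputation scores below a crowd attestation from a fully trusted aggregator, which is exactly the "captured certifiers get discounted" behavior described above.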
4.3 Discovery Layer
Federated, not decentralized (see §6). Options on the table:
- Federated registries with a common protocol (ActivityPub-style for products)
- Content-addressed listings with competing indexers
- Sellers pay per-query-match to indexers rather than per-impression to users — moving economics away from attention auctions
4.4 Buyer Agent Layer
Runs locally or in a user-controlled enclave.
- Holds preferences, constraints, values, budget, history
- Hard privacy boundary — nothing leaves without explicit, scoped release
- Structured interaction with listings, not scraping ad-filled pages
- Defensive architecture against prompt injection — seller data flows through a sandboxed reader context, never into the planning loop. Dual-LLM / CaMeL-style separation.
- Kickback-proof — audit trail of why X was recommended, verifiable by the user
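The reader/planner separation above can be sketched as two contexts with a typed boundary between them: the reader sees raw seller text but can only emit a fixed struct, and the planner only ever consumes that struct. A CaMeL-style sketch with invented names; a production reader would be a sandboxed LLM, with a regex standing in here:

```typescript
// Structured output the reader is allowed to emit. Free-form seller
// text can never cross this boundary into the planner.
type ListingFacts = { priceUsd: number | null; inStock: boolean };

// Reader context: touches untrusted seller HTML, returns only ListingFacts.
function readListing(untrustedHtml: string): ListingFacts {
  const price = untrustedHtml.match(/\$\s?(\d+(?:\.\d{2})?)/);
  return {
    priceUsd: price ? Number(price[1]) : null,
    inStock: !/out of stock/i.test(untrustedHtml),
  };
}

// Planner context: never sees raw seller text, so instructions embedded
// in the page ("ignore previous instructions...") are inert by construction.
function plan(facts: ListingFacts, budgetUsd: number): "buy" | "skip" {
  if (!facts.inStock || facts.priceUsd === null) return "skip";
  return facts.priceUsd <= budgetUsd ? "buy" : "skip";
}

function decide(untrustedHtml: string, budgetUsd: number): "buy" | "skip" {
  return plan(readListing(untrustedHtml), budgetUsd);
}
```

An injected instruction in the page changes nothing, because the only channel from seller to planner is a price and a stock flag.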
4.5 Attention-Pricing / Communication Layer
When sellers want to reach a buyer outside of discovery, they pay the buyer's agent, which pays the user. Priced per category and urgency. Messages are structured and auditable. Dark patterns stop working by construction because the agent filters.
5. Prior Art and Useful Primitives
Scaffolding exists; verification and the buyer-agent runtime do not.
- schema.org, JSON-LD, Product schema — structural scaffolding
- W3C DIDs + Verifiable Credentials — identity and portable attestation primitives, standardized
- C2PA — closest existing provenance model; adaptable
- Solid / personal data stores — user-side infrastructure primitive
- Content-addressed storage (IPFS-style) — attestation artifact storage
- ZK proof systems — privacy-preserving claim verification
- Merkle commitments — tamper-evident review corpora
What is missing: the glue, the buyer-agent runtime, and the economic flip.
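As a sense of scale for the last primitive: a Merkle commitment over a review corpus fits in about twenty lines of TypeScript on top of node:crypto. This is a sketch of the primitive, not a proposed on-the-wire format:

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Commit to a review corpus by publishing only the root. Anyone holding
// the corpus can recompute the root; any silent edit or deletion of a
// review changes it, making the corpus tamper-evident.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate the last node on odd-sized levels.
      const right = level[i + 1] ?? level[i];
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}
```

A full scheme would also publish inclusion proofs (the sibling hashes along one leaf's path), so a single review can be verified against the root without shipping the whole corpus.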
What to ignore from the crypto/web3 orbit
- Tokens, DAOs, "decentralize everything" ideology
- On-chain anything beyond strict settlement
- Any design that requires users to hold wallets
The useful parts are cryptographic primitives. The useless parts are governance theater and token economics.
6. Centralization vs Decentralization
Not one decision — five, across the layers above.
| Layer | Recommended posture | Why |
| --- | --- | --- |
| Schema / protocol | Centralized governance, open spec | Forking the canonical schema is pure loss. Standards bodies earn their keep here. |
| Registry / discovery | Federated | Many operators, one protocol; switching costs stay near zero; no one becomes Google. |
| Attestations / trust data | Plural + portable | Many issuers, cryptographic portability via Verifiable Credentials. |
| Buyer agent runtime | Local / user-controlled | Non-negotiable. A hosted agent is the platform's agent wearing your name. |
| User data / preferences | Local-first | Non-negotiable. |
Key distinction — federated, not decentralized. Decentralized systems (DHT, blockchain) consistently fail for consumer products: bad performance, bad UX, impossible moderation. Federated systems (email, Matrix, ActivityPub) ship.
Operating principle: the goal is not "no one is in charge"; it is "no one can lock users in." Centralized services with genuinely portable data and open protocols usually beat decentralized systems that have de facto lock-in through bad UX.
Rollout posture: start centralized, architect for federation, keep the user side sovereign from day one. Commit to open schema and portable credentials at v0 even if registry and agent are single-vendor, so users and sellers can always walk away.
7. Hard Problems
- Prompt injection of buyer agents. Any seller-supplied text is adversarial. Requires strict separation between content-reader and planner contexts. Tractable in narrow domains; unsolved in general.
- Cold start. Sellers will not list verifiably until buyers demand it; buyers will not demand it until sellers list. Solution path: start in categories where consumers are already burned and regulation already produces seed data (supplements, financial products, SaaS pricing).
- Infrastructure centralization pressure. Whoever runs the registry risks becoming adtech 2.0. Mitigation: open protocol plus multiple implementations plus a foundation governing the schema.
- Agent alignment / kickbacks. Buyer agents that quietly favor paying sellers become the new SEO. Mitigation: open-source reference agents, reproducible recommendations, user-visible decision traces, third-party audits of commercial implementations.
- Distribution and coordination costs persist. Shelf space, platform fees, legitimate category education for genuinely new things will not vanish — they will be repriced.
- Brands compress risk usefully. "I trust Patagonia" is cognitively cheap. Agents can replace this but only when verification infrastructure is good enough. Until then, brands persist.
8. Prototype Roadmap
Ranked by effort and dependency footprint. ✓ = shipped in this repo.
Tier 1 — weekend-scale, no ecosystem dependencies
- ✓ Dark-pattern and real-price auditor. Shipped as CLI (`auditor fetch`, `auditor detect`) and SvelteKit UI. No browser extension yet; runs locally. Real-price normalization is not implemented and remains deferred.
- ✓ Claim extractor across a fixed category. Shipped. Run over 56 pages across 4 verticals; findings above. Intended corpus is 200–1000 per vertical; we're at ~0.05× that.
Tier 2 — month-scale prototype stack
- Verifiable registry MVP in one vertical. Not started. Schema is implicit in the extractor's Zod types; needs to be lifted into a standalone JSON-LD profile and a reference registry.
- Agent-to-agent negotiation simulator. Not started.
Tier 3 — quarter-scale, first real users
- Personal shopping agent with verifiable-only mode. Not started.
- Attention-pricing pilot. Not started.
Recommended starting sequence
- ✓ Dark-pattern and real-price auditor.
- ✓ Claim extractor in one vertical.
- Next — choose from §11 depending on audience pull.
9. Policy Adjacency (out of scope, but worth naming)
The highest-leverage non-technical move: mandate machine-readable, verifiable product claims — extending nutrition-label logic to more categories. This alone reshapes the industry without banning anything, and creates the regulatory seed data the registry needs.
10. Open Questions
- Which vertical is the best wedge? Supplements (fraud is rampant, regulation partial), SaaS pricing (opaque, high-value), consumer electronics (spec-rich, measurable)? Early empirical note (§Findings): at N=56 the dirtiest vertical by % unverifiable was consumer electronics (49%), not supplements (45%). SaaS pricing had the widest per-vendor spread (0% – 100%) — the most useful material for a "vendor-by-vendor" audit narrative.
- What is the minimum viable buyer-agent capability that produces visibly better decisions than current search?
- How are buyer agents funded in a way that is not corruptible? Subscription, user-paid per-query, public-infrastructure-funded?
- What is the governance model for the schema — foundation, standards body, benevolent maintainer?
- How do we handle categories where "verifiable" is genuinely hard (taste, fit, aesthetics)?
11. Next Steps
The POC is mature enough to be useful to three audiences without further major infrastructure work. The next move depends on which audience the project is aimed at.
Journalists, advocacy groups, consumer-protection researchers. Highest-leverage audience for the current tool. The per-page detail view produces cited, categorized claim breakdowns that are publication-ready. Unlocks:
- Grow corpus to N≥200 across canonical categories; publish the dataset.
- Short essay ("we audited 200 commercial pages; here's what we found") with the tool as backing citation.
- Hosted demo so readers can run their own audits.
Product / marketing teams at companies that want to differentiate on claim integrity. Narrow but real utility — benchmark competitor pricing pages, pressure-test own copy. Unlocks:
- CSV / JSON export of audit results.
- Version-over-version diff: "this week's pricing page added 3 unverifiable claims."
Academic researchers (HCI, behavioral econ, marketing, dark-pattern studies). The pipeline is already a credible research instrument. Unlocks:
- Methodology write-up with prompt design, Zod schema, evaluation protocol.
- Published corpus (anonymized if needed).
- Reproducibility: pinned model versions, seed/run determinism where possible.
Explicitly not on the roadmap yet: everyday-consumer browser extension, enterprise features (auth / SLA / report export), real-time monitoring. These are legitimate future directions but don't benefit from being rushed before the above audiences are served.