marketing-auditor
POC: marketing claim + dark-pattern audit for commercial web pages.
What is built today: a CLI + local web UI that fetches any product / pricing / landing page, extracts its claims, classifies them by verifiability (verifiable / potentially_verifiable / unverifiable) against cited standards, detects rule-based dark patterns, and aggregates findings across vendors and verticals. Runs locally; data stays on the machine.
What is not built yet: buyer-side autonomous agents, attestation infrastructure, published verifiable-claim schemas, browser extension, hosted service. The longer essay below ("Fair Marketing — Problem Statement & Technical Directions") is the thesis and roadmap; the rest of this file documents it.
The repo name is marketing-auditor. The UI and CLI brand is auditor. This is a working title for a POC, not a product name.
Quick start
```sh
# Install dependencies
bun install

# Single-page audit (no LLM)
bun run cli fetch https://example.com

# Full audit incl. LLM claim extraction (needs OPENAI_API_KEY)
export OPENAI_API_KEY=sk-...
bun run cli fetch --extract https://example.com

# Batch eval over a URL list
bun run cli eval data/seeds/saas-pricing.txt

bun run cli detect   # rule-based dark-pattern backfill over cached pages
bun run cli report   # aggregate all results

# Browse findings in the web UI
bun run --cwd apps/app dev
# → http://localhost:5173
```
Known limitations of this POC
- No tests. Nothing is unit-tested; run behavior has been validated manually and via the eval harness only.
- LLM extraction variance (~10% at the claim level) — same page re-extracted produces slightly different classifications. Not a bug; a consequence of non-deterministic generation.
- Playwright render fallback is broken on some Windows setups (CDP handshake times out). The fetcher degrades gracefully to static HTML; pages that need JS rendering (iHerb in particular) produce thin extractions.
- CLI exit codes exist (1 / 2 / 3 / 4 for various failure modes) but are not yet documented.
- Pagination not implemented in the browse UI. Degrades past ~500 rows.
- SSRF: `/api/audit` blocks loopback/private-IP hosts, but this is a POC; don't expose the dev server to the internet.
- Security posture: assumes local-only use. The `/api/extractions` endpoint exposes all stored data with no auth; fine for localhost, but don't deploy without adding auth.
- Schema drift between bun:sqlite (CLI writes) and better-sqlite3 (Node reads): both libraries read the same SQLite file, but schema changes must be replicated in both layers (`packages/core/src/storage.ts` and `apps/app/src/lib/server/db.ts`).
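One common mitigation for this kind of drift (a sketch, not the repo's current design): keep the DDL and a schema version in one shared constant that both storage layers import, so a schema change is made in exactly one place. Module, table, and function names below are hypothetical.

```typescript
// Hypothetical shared module, e.g. a single schema.ts that both the
// bun:sqlite writer and the better-sqlite3 reader would import.

const SCHEMA_VERSION = 3;

const DDL = `
CREATE TABLE IF NOT EXISTS pages (
  id INTEGER PRIMARY KEY,
  url TEXT NOT NULL,
  fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS claims (
  id INTEGER PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id),
  text TEXT NOT NULL,
  verifiability TEXT NOT NULL CHECK (
    verifiability IN ('verifiable', 'potentially_verifiable', 'unverifiable')
  )
);
`;

// Each layer records the version it was built against and refuses
// to open a database written by a newer schema.
function assertCompatible(dbVersion: number): void {
  if (dbVersion > SCHEMA_VERSION) {
    throw new Error(
      `DB schema v${dbVersion} is newer than this build (v${SCHEMA_VERSION})`,
    );
  }
}
```

Both layers still need their own driver bindings, but at least the DDL and the compatibility check can no longer silently diverge.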
Findings so far
Small empirical corpus. All absolute numbers are from ~N=56 commercial pages + 3 Wikipedia controls, audited with GPT-4.1. Treat as illustrative, not publishable.
Headline: 0.7% of commercial claims are verifiable (4 / ~750) — all four are on a single page (Swanson's Deep Rest supplement, which cites Gersappe V et al. Int J Basic Clin Pharmacol. 2024;13(4):475-485). Every other commercial page in the corpus is 0% verifiable.
- 56% of commercial claims are potentially verifiable — specific enough to test (numeric quantities, named certifications, concrete thresholds) but the page cites no independent source. The raw material for verification exists; the attestation infrastructure doesn't.
- 43% are unverifiable — vague superlatives, undefined terms, emotional appeals.
Per-vertical % unverifiable (higher = more marketing-heavy):
- control (Wikipedia): 0%
- SaaS pricing (15 pages): 39%
- supplements (19 pages): 45%
- consumer electronics (18 pages): 49% — counterintuitively the dirtiest, driven by lifestyle-audio brand copy
Within-vertical spread is larger than between-vertical spread. SaaS pricing pages ranged from 0% unverifiable (Render) to 100% (GitHub, Atlassian). Consumer electronics from 23% (Ember) to 63% (one Keychron keyboard). Vendor-level scoring is the more useful lens than category-level averages.
Dark patterns: 22 rule-detectable patterns across 18/56 pages (~32%). Dominant types: autoplay media (attention_manipulation) and strikethrough pricing (price_obfuscation). False-scarcity ("only N left") surfaced on zero pages in this corpus — modern sites either don't use it or JS-render it out of reach of static fetch.
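The rule-based detection is, at its core, pattern matching over fetched HTML. A minimal sketch of two of the rules named above; function and rule names are illustrative, not the repo's actual implementation, and a real detector would operate on a parsed DOM rather than raw HTML:

```typescript
type DarkPatternHit = { rule: string; category: string; evidence: string };

// Illustrative rules: autoplay media (attention manipulation) and
// the "only N left" false-scarcity pattern.
const RULES: { rule: string; category: string; pattern: RegExp }[] = [
  {
    rule: "autoplay_media",
    category: "attention_manipulation",
    pattern: /<(?:video|audio)[^>]*\bautoplay\b/i,
  },
  {
    rule: "false_scarcity",
    category: "urgency",
    pattern: /only\s+\d+\s+left/i,
  },
];

function detectDarkPatterns(html: string): DarkPatternHit[] {
  const hits: DarkPatternHit[] = [];
  for (const { rule, category, pattern } of RULES) {
    const match = html.match(pattern);
    if (match) hits.push({ rule, category, evidence: match[0] });
  }
  return hits;
}
```

The static-fetch limitation noted above falls directly out of this design: a scarcity banner injected by JavaScript never appears in the HTML string these patterns see.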
Economics: ~$0.015 per page with GPT-4.1. 1000 pages ≈ $15.
Repository layout
```
marketing-auditor/
  packages/core/   TypeScript library: models, fetcher, sanitizer,
                   extractor (OpenAI), dark-patterns, storage (bun:sqlite)
  apps/cli/        Bun CLI: fetch, eval, detect, report, discover
  apps/app/        SvelteKit 2 + Svelte 5 + Tailwind v4 web UI
  data/seeds/      Starter URL lists per vertical
  archive/         Earlier Python prototype (gitignored)
```
Development
```sh
bun run --cwd apps/app check      # svelte-check + biome + prettier + vite build
bun run --cwd apps/app check:fix  # auto-fix Biome and Prettier
```
Fair Marketing — Problem Statement & Technical Directions
1. Problem Statement
Modern marketing has drifted into a state where it is:
- Intrusive — interrupts attention the consumer did not offer
- Misleading — optimizes for persuasion rather than informed choice
- Expensive — a significant share of consumer prices pays for the marketing apparatus itself (adtech, agencies, sponsorships, SEO arms races, attention auctions)
- Asymmetric — sellers know more than buyers and spend money to keep it that way
For many categories (cosmetics, supplements, SaaS, DTC brands), marketing plus distribution margin exceeds production cost. Consumers pay for a war of attention they did not enlist in.
Goal: maintain the legitimate functions of marketing (discovery, claim communication, trust-building) while making the experience feel invisible, making claims verifiable, and drastically reducing cost — with the savings flowing to the consumer.
2. Core Thesis
The persuasion-heavy layer of marketing exists because of an information and leverage asymmetry between sellers and buyers. Close that asymmetry and most of the cost structure, intrusiveness, and manipulation collapse as economic side-effects — not because of ethics, but because the tactics stop working.
The closing mechanism: capable buyer-side agents + machine-verifiable seller claims + portable reputation + attention priced to the receiver.
3. What AI and Adjacent Tech Change
Four shifts look real, not hype:
- Buyer-side agents flip the asymmetry. An agent that negotiates, filters, and compares on the user's behalf makes ads aimed at the user wasted spend. Sellers must instead market to agents — meaning structured, verifiable claims rather than FOMO, vibes, and dark patterns.
- Discovery becomes pull, not push. Sellers publish machine-readable specs and reputational history; buyer agents query them. Spend shifts from interrupting to being findable and verifiable — much cheaper.
- Reputation becomes portable and cryptographically checkable. Warranty claims, defect rates, return rates, real usage data — published or provable via ZK — make brand trust earned and cheap to verify rather than bought via advertising.
- Personalization without surveillance. On-device models personalize without data ever leaving the user. The ~$600B global adtech tracking industry becomes largely obsolete.
4. Technical Architecture
Five layers. The split between centralized, federated, and sovereign is deliberate — see §6.
4.1 Claim Layer — the keystone
A machine-readable schema where every product claim is typed and either verifiable or explicitly marked as unverified marketing. Extends schema.org with:
- Numeric claims with units, measurement methodology, and a signed attestation hash, e.g. `battery_life: {value: 10, unit: hours, test: "IEC 61960 @ 25°C", source: <attestation>}`
- Categorical claims with controlled vocabularies — no free-form "eco-friendly"; either a given certification applies or it does not
- Probabilistic claims with confidence intervals and sample size
- Every claim signed by the seller; verified claims co-signed by an issuer (lab, auditor, regulator, or buyer-agent consensus)
Without this layer, everything else is vibes.
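A sketch of what such typed claims could look like, here as TypeScript types rather than the eventual JSON-LD profile; every field name below is illustrative, not a proposed standard:

```typescript
// Illustrative claim shapes; not the published schema.
type Attestation = { issuer: string; signature: string; hash: string };

type NumericClaim = {
  kind: "numeric";
  property: string;     // e.g. "battery_life"
  value: number;
  unit: string;         // e.g. "hours"
  test?: string;        // measurement methodology, e.g. "IEC 61960 @ 25°C"
  source?: Attestation; // issuer co-signature; absent = self-asserted
};

type CategoricalClaim = {
  kind: "categorical";
  property: string;
  certification: string; // from a controlled vocabulary, never free-form
  source?: Attestation;
};

type ProbabilisticClaim = {
  kind: "probabilistic";
  property: string;
  estimate: number;
  confidenceInterval: [number, number];
  sampleSize: number;
  source?: Attestation;
};

type Claim = NumericClaim | CategoricalClaim | ProbabilisticClaim;

// A claim without a co-signing attestation is explicitly unverified
// marketing; it is never silently treated as verified.
function isVerified(claim: Claim): boolean {
  return claim.source !== undefined;
}
```

The discriminated union is the point: an agent can exhaustively handle every claim kind, and "unverified" is a checkable property rather than a vibe.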
4.2 Attestation / Trust Layer
Multiple mechanisms, layered by cost and trust:
- Self-attested + bonded — seller posts collateral, auto-slashed on falsification. Cheap; works for low-stakes claims.
- Third-party attested — labs, certifiers, auditors sign claims. Expensive; high trust.
- Crowd-attested — buyer agents report outcomes post-purchase, forming distributed ground truth. Medium trust; scales well.
- ZK-attested — prove "returns rate < 3%" without exposing underlying data. Useful when data is competitive.
A reputation graph covers issuers too, so captured certifiers get discounted over time.
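The four mechanisms form a rough cost/trust ladder, and the issuer reputation graph discounts each of them. A sketch of how an agent might combine the two; the base weights are invented for illustration, and a real system would derive them from observed issuer behavior:

```typescript
type AttestationKind =
  | "self_bonded"  // collateral at stake, auto-slashed on falsification
  | "third_party"  // lab / certifier / auditor signature
  | "crowd"        // aggregated post-purchase buyer-agent reports
  | "zk";          // zero-knowledge proof over private data

// Illustrative base weights ordered by mechanism trust.
const BASE_TRUST: Record<AttestationKind, number> = {
  self_bonded: 0.4,
  crowd: 0.6,
  zk: 0.8,
  third_party: 0.9,
};

// issuerReputation in [0, 1] comes from the reputation graph, so a
// captured certifier's third-party signature can score below an
// honest crowd attestation.
function trustScore(kind: AttestationKind, issuerReputation: number): number {
  return BASE_TRUST[kind] * issuerReputation;
}
```

For example, a third-party signature from an issuer at 0.4 reputation scores below a crowd attestation from a fully trusted aggregator, which is exactly the "captured certifiers get discounted" behavior described above.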
4.3 Discovery Layer
Federated, not decentralized (see §6). Options on the table:
- Federated registries with a common protocol (ActivityPub-style for products)
- Content-addressed listings with competing indexers
- Sellers pay per-query-match to indexers rather than per-impression to users — moving economics away from attention auctions
4.4 Buyer Agent Layer
Runs locally or in a user-controlled enclave.
- Holds preferences, constraints, values, budget, history
- Hard privacy boundary — nothing leaves without explicit, scoped release
- Structured interaction with listings, not scraping ad-filled pages
- Defensive architecture against prompt injection — seller data flows through a sandboxed reader context, never into the planning loop. Dual-LLM / CaMeL-style separation.
- Kickback-proof — audit trail of why X was recommended, verifiable by the user
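The reader/planner separation above can be sketched as two contexts with a typed boundary between them: the reader sees raw seller text but can only emit a fixed struct, and the planner only ever consumes that struct. A CaMeL-style sketch with invented names; a production reader would be a sandboxed LLM, with a regex standing in here:

```typescript
// Structured output the reader is allowed to emit. Free-form seller
// text can never cross this boundary into the planner.
type ListingFacts = { priceUsd: number | null; inStock: boolean };

// Reader context: touches untrusted seller HTML, returns only ListingFacts.
function readListing(untrustedHtml: string): ListingFacts {
  const price = untrustedHtml.match(/\$\s?(\d+(?:\.\d{2})?)/);
  return {
    priceUsd: price ? Number(price[1]) : null,
    inStock: !/out of stock/i.test(untrustedHtml),
  };
}

// Planner context: never sees raw seller text, so instructions embedded
// in the page ("ignore previous instructions...") are inert by construction.
function plan(facts: ListingFacts, budgetUsd: number): "buy" | "skip" {
  if (!facts.inStock || facts.priceUsd === null) return "skip";
  return facts.priceUsd <= budgetUsd ? "buy" : "skip";
}

function decide(untrustedHtml: string, budgetUsd: number): "buy" | "skip" {
  return plan(readListing(untrustedHtml), budgetUsd);
}
```

An injected instruction in the page changes nothing, because the only channel from seller to planner is a price and a stock flag.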
4.5 Attention-Pricing / Communication Layer
When sellers want to reach a buyer outside of discovery, they pay the buyer's agent, which pays the user. Priced per category and urgency. Messages are structured and auditable. Dark patterns stop working by construction because the agent filters.
5. Prior Art and Useful Primitives
Scaffolding exists; verification and the buyer-agent runtime do not.
- schema.org, JSON-LD, Product schema — structural scaffolding
- W3C DIDs + Verifiable Credentials — identity and portable attestation primitives, standardized
- C2PA — closest existing provenance model; adaptable
- Solid / personal data stores — user-side infrastructure primitive
- Content-addressed storage (IPFS-style) — attestation artifact storage
- ZK proof systems — privacy-preserving claim verification
- Merkle commitments — tamper-evident review corpora
What is missing: the glue, the buyer-agent runtime, and the economic flip.
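As a sense of scale for the last primitive: a Merkle commitment over a review corpus fits in about twenty lines of TypeScript on top of node:crypto. This is a sketch of the primitive, not a proposed on-the-wire format:

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Commit to a review corpus by publishing only the root. Anyone holding
// the corpus can recompute the root; any silent edit or deletion of a
// review changes it, making the corpus tamper-evident.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate the last node on odd-sized levels.
      const right = level[i + 1] ?? level[i];
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}
```

A full scheme would also publish inclusion proofs (the sibling hashes along one leaf's path), so a single review can be verified against the root without shipping the whole corpus.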
What to ignore from the crypto/web3 orbit
- Tokens, DAOs, "decentralize everything" ideology
- On-chain anything beyond strict settlement
- Any design that requires users to hold wallets
The useful parts are cryptographic primitives. The useless parts are governance theater and token economics.
6. Centralization vs Decentralization
Not one decision — five, across the layers above.
| Layer | Recommended posture | Why |
| --- | --- | --- |
| Schema / protocol | Centralized governance, open spec | Forking the canonical schema is pure loss. Standards bodies earn their keep here. |
| Registry / discovery | Federated | Many operators, one protocol; switching costs stay near zero; no one becomes Google. |
| Attestations / trust data | Plural + portable | Many issuers, cryptographic portability via Verifiable Credentials. |
| Buyer agent runtime | Local / user-controlled | Non-negotiable. A hosted agent is the platform's agent wearing your name. |
| User data / preferences | Local-first | Non-negotiable. |
Key distinction — federated, not decentralized. Decentralized systems (DHT, blockchain) consistently fail for consumer products: bad performance, bad UX, impossible moderation. Federated systems (email, Matrix, ActivityPub) ship.
Operating principle: the goal is not "no one is in charge"; it is "no one can lock users in." Centralized services with genuinely portable data and open protocols usually beat decentralized systems that have de facto lock-in through bad UX.
Rollout posture: start centralized, architect for federation, keep the user side sovereign from day one. Commit to open schema and portable credentials at v0 even if registry and agent are single-vendor, so users and sellers can always walk away.
7. Hard Problems
- Prompt injection of buyer agents. Any seller-supplied text is adversarial. Requires strict separation between content-reader and planner contexts. Tractable in narrow domains; unsolved in general.
- Cold start. Sellers will not list verifiably until buyers demand it; buyers will not demand it until sellers list. Solution path: start in categories where consumers are already burned and regulation already produces seed data (supplements, financial products, SaaS pricing).
- Infrastructure centralization pressure. Whoever runs the registry risks becoming adtech 2.0. Mitigation: open protocol plus multiple implementations plus a foundation governing the schema.
- Agent alignment / kickbacks. Buyer agents that quietly favor paying sellers become the new SEO. Mitigation: open-source reference agents, reproducible recommendations, user-visible decision traces, third-party audits of commercial implementations.
- Distribution and coordination costs persist. Shelf space, platform fees, legitimate category education for genuinely new things will not vanish — they will be repriced.
- Brands compress risk usefully. "I trust Patagonia" is cognitively cheap. Agents can replace this but only when verification infrastructure is good enough. Until then, brands persist.
8. Prototype Roadmap
Ranked by effort and dependency footprint. ✓ = shipped in this repo.
Tier 1 — weekend-scale, no ecosystem dependencies
- ✓ Dark-pattern and real-price auditor. Shipped as CLI (`auditor fetch`, `auditor detect`) and SvelteKit UI. No browser extension yet; runs locally. Real-price normalization is not implemented and remains deferred.
- ✓ Claim extractor across a fixed category. Shipped. Run over 56 pages across 4 verticals; findings above. Intended corpus is 200–1000 per vertical; we're at ~0.05× that.
Tier 2 — month-scale prototype stack
- Verifiable registry MVP in one vertical. Not started. Schema is implicit in the extractor's Zod types; needs to be lifted into a standalone JSON-LD profile and a reference registry.
- Agent-to-agent negotiation simulator. Not started.
Tier 3 — quarter-scale, first real users
- Personal shopping agent with verifiable-only mode. Not started.
- Attention-pricing pilot. Not started.
Recommended starting sequence
- ✓ Dark-pattern and real-price auditor.
- ✓ Claim extractor in one vertical.
- Next — choose from §11 depending on audience pull.
9. Policy Adjacency (out of scope, but worth naming)
The highest-leverage non-technical move: mandate machine-readable, verifiable product claims — extending nutrition-label logic to more categories. This alone reshapes the industry without banning anything, and creates the regulatory seed data the registry needs.
10. Open Questions
- Which vertical is the best wedge? Supplements (fraud is rampant, regulation partial), SaaS pricing (opaque, high-value), consumer electronics (spec-rich, measurable)? Early empirical note (§Findings): at N=56 the dirtiest vertical by % unverifiable was consumer electronics (49%), not supplements (45%). SaaS pricing had the widest per-vendor spread (0% – 100%) — the most useful material for a "vendor-by-vendor" audit narrative.
- What is the minimum viable buyer-agent capability that produces visibly better decisions than current search?
- How are buyer agents funded in a way that is not corruptible? Subscription, user-paid per-query, public-infrastructure-funded?
- What is the governance model for the schema — foundation, standards body, benevolent maintainer?
- How do we handle categories where "verifiable" is genuinely hard (taste, fit, aesthetics)?
11. Next Steps
The POC is mature enough to be useful to three audiences without further major infrastructure work. The next move depends on which audience the project is aimed at.
Journalists, advocacy groups, consumer-protection researchers. Highest-leverage audience for the current tool. The per-page detail view produces cited, categorized claim breakdowns that are publication-ready. Unlocks:
- Grow corpus to N≥200 across canonical categories; publish the dataset.
- Short essay ("we audited 200 commercial pages; here's what we found") with the tool as backing citation.
- Hosted demo so readers can run their own audits.
Product / marketing teams at companies that want to differentiate on claim integrity. Narrow but real utility — benchmark competitor pricing pages, pressure-test own copy. Unlocks:
- CSV / JSON export of audit results.
- Version-over-version diff: "this week's pricing page added 3 unverifiable claims."
Academic researchers (HCI, behavioral econ, marketing, dark-pattern studies). The pipeline is already a credible research instrument. Unlocks:
- Methodology write-up with prompt design, Zod schema, evaluation protocol.
- Published corpus (anonymized if needed).
- Reproducibility: pinned model versions, seed/run determinism where possible.
Explicitly not on the roadmap yet: everyday-consumer browser extension, enterprise features (auth / SLA / report export), real-time monitoring. These are legitimate future directions but don't benefit from being rushed before the above audiences are served.