An autonomous research paper writing toolkit for Claude Code. From a topic prompt, it produces a publication-ready, journal-quality LaTeX paper through multi-agent orchestration — running for 1-4 hours (standard) or 3-8 hours (deep mode).
New here? See QUICKSTART.md for the 3-minute setup.
Visual overview? Open docs/pipeline-diagram.html in a browser.
Security model? See SECURITY.md. Additional docs: Architecture, Developer Guide, Venue Reference, Pipeline Reference, Scripts Reference.
Research Agent is a Claude Code workspace template and orchestration system that writes complete, journal-quality research papers autonomously. You provide a topic; the pipeline does the rest.
The core workflow is:
create-paper my-paper "LLM reasoning" --venue arxiv
cd my-paper
write-paper
That command sequence launches the multi-stage pipeline described below.
Everything produced is traced. Every paragraph in the final paper links back to its origin: which sources informed it, which agent wrote it, what feedback revised it, and what was cut and why.
| Mode | Research agents | Reference target | Estimated time | Estimated cost |
|---|---|---|---|---|
| Standard | 5 | 30-50 refs | 1-4 hours | ~$50 |
| Deep | 12 + 3 targeted-pass | 60-80 refs | 3-8 hours | ~$150 |
Required:
- Claude Code (`claude`) or Codex (`codex`)
- `pdflatex` and `latexmk`

Optional:

- `codex-bridge` (`npm i -g codex-bridge`) — adversarial AI review from OpenAI Codex throughout the pipeline
- `OPENROUTER_API_KEY` — required to build and query the knowledge graph (LightRAG via Gemini Flash and Qwen3 8B)
- `CORE_API_KEY` — free API key from CORE (200M+ institutional repository papers). Register at core.ac.uk for a free key.
- `NCBI_API_KEY` — free API key from NCBI. Increases the PubMed Central rate limit from 3/s to 10/s. Only useful for biomedical/clinical papers.

create-paper

Symlink the launcher scripts to somewhere on your PATH:
git clone https://github.com/ndcorder/research-agent
cd research-agent
ln -s $(pwd)/create-paper ~/.local/bin/create-paper
ln -s $(pwd)/write-paper ~/.local/bin/write-paper
ln -s $(pwd)/sync-papers ~/.local/bin/sync-papers
Paper projects symlink their runtime scaffold and scripts/ back to this repository's template. Claude projects use .claude/; Codex projects use .codex/ plus a root AGENTS.md symlink. When you update the template, every paper project sees the change instantly.
If you have existing projects created before the symlink migration, run sync-papers once to convert them:
sync-papers /path/to/your/papers
Safe to run multiple times. Projects already using symlinks are skipped.
create-paper <directory> [topic] [--venue <venue>] [--runtime <claude|codex>] [--deep]
Examples:
# Create a project and immediately offer to launch the pipeline
create-paper my-survey "A survey on LLM reasoning" --venue arxiv
# Create the project structure without starting (add topic later)
create-paper my-paper --venue neurips
# Codex-native project
create-paper my-paper "Protein structure prediction" --venue nature --runtime codex
# Deep mode: 12 agents, 60-80 refs, targeted second pass
create-paper my-paper "Protein structure prediction" --venue nature --deep
create-paper does the following in one command:
- Creates the project directory with `main.tex` formatted for the target venue
- Sets up the runtime scaffold (`.claude/` or `.codex/`)
- Writes `.paper.json` and `.venue.json`
- Vendors the scientific skills (`vendor/claude-scientific-skills`)
- Vendors Praxis (`vendor/praxis`) and installs Python dependencies
- Detects `codex-bridge` if it is installed

From inside the paper project:
# Method 1: the write-paper launcher script (recommended)
write-paper # uses topic from .paper.json
write-paper "new topic" # overrides topic
# Method 2: runtime-neutral command launcher
scripts/run-paper-command preview-pipeline
scripts/run-paper-command health
write-paper reads .paper.json and dispatches to the configured runtime. Claude projects use the Claude slash-command flow. Codex projects use codex exec with the project-local .codex/ instructions. See docs/CODEX.md for the Codex-specific workflow.
The pipeline writes a human-readable progress file you can watch from another terminal:
cat .paper-progress.txt # current stage, section word counts
ls research/ # research files as they appear
ls reviews/ # review files during QA
| Flag | Format | Citation style | Page limit |
|---|---|---|---|
| `generic` | Standard article (default) | natbib (`\citep`, `\citet`) | none |
| `ieee` | IEEEtran two-column | numeric | 8 pages |
| `acm` | acmart sigconf, double-blind | natbib | 12 pages |
| `neurips` | NeurIPS single-column | natbib | 9 pages |
| `nature` | Nature family (Results before Methods) | numeric superscript | none |
| `arxiv` | arXiv preprint, extended format | natbib | none |
| `apa` | APA 7th edition | apacite | none |
Each venue JSON also includes a writing_guide field with venue-specific tone, structure, citation density, figure count, and reviewer expectation guidance. Writing agents read the guide to match the conventions of the target venue.
Run /write-paper with the active runtime, or via the write-paper launcher. The pipeline reads .paper.json for topic, venue, depth, and runtime, then executes the shared stages sequentially.
After completing each stage or section, the pipeline writes .paper-state.json tracking exactly which stages and sections are done. If the session is interrupted, rerunning /write-paper reads this file and skips completed work. Partial stage recovery tracks sub-steps within stages (individual research agents, individual sections) so a crash mid-stage resumes from the last completed sub-step, not the start of the stage. .paper-progress.txt is updated at each checkpoint with a human-readable summary.
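The skip-completed-work logic above can be sketched as a small pure function. This is a minimal illustration only: the stage names and the `completed_stages` field are assumptions, not the actual `.paper-state.json` schema.

```python
import json

# Hypothetical stage list; the real pipeline's stage IDs may differ.
PIPELINE_STAGES = ["research", "cross-check", "source-audit", "planning",
                   "writing", "figures", "qa", "finalization"]

def stages_to_run(state_json: str) -> list[str]:
    """Return the stages still to execute, skipping ones recorded as done."""
    done = set(json.loads(state_json).get("completed_stages", []))
    return [s for s in PIPELINE_STAGES if s not in done]

state = '{"completed_stages": ["research", "cross-check"]}'
print(stages_to_run(state))  # resumes from "source-audit" onward
```

The same idea extends to sub-steps: store per-stage lists of completed agents or sections, and filter each stage's work items the same way.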
Goal: 30-50 verified references (standard) or 60-80 (deep mode) covering the field comprehensively.
Before spawning agents, the pipeline detects the paper's domain from the topic keywords (Biomedical, Chemistry, CS/AI, Physics, Materials Science, Ecology, Economics, Clinical, or General). This determines which scientific skill databases are prioritized.
Standard mode: 5 agents in parallel
| Agent | Task | Output file |
|---|---|---|
| Field Survey | 10-15 most influential papers, major research threads, recent breakthroughs | research/survey.md |
| Methodology Deep Dive | Major methodological approaches, state-of-the-art, standard benchmarks | research/methods.md |
| Empirical Evidence | Standard benchmarks, datasets, SotA results, reproducibility concerns | research/empirical.md |
| Theoretical Foundations | Formal definitions, theorems, connections to broader frameworks | research/theory.md |
| Gap Analysis | Research gaps, promising directions, proposed thesis and contribution | research/gaps.md |
The Gap Analysis agent (Agent 5) runs AFTER the first four complete, because it reads all their output.
Deep mode: 7 additional agents in parallel (with agents 1-4)
| Agent | Task | Output file |
|---|---|---|
| Recent Frontiers | Papers published 2024-2026 only, emerging trends | research/recent_frontiers.md |
| Negative Results | What didn't work, failed approaches, replication failures | research/negative_results.md |
| Cross-Disciplinary | Insights from adjacent fields | research/cross_disciplinary.md |
| Datasets & Reproducibility | All standard datasets, open-source implementations | research/datasets_reproducibility.md |
| Industry & Applied | Deployed systems, patents, gap between academic and production | research/industry_applied.md |
| Competing Hypotheses | Active scientific debates, schools of thought | research/competing_hypotheses.md |
| Intellectual Lineage | Seminal papers, how ideas evolved, paradigm shifts | research/intellectual_lineage.md |
Tool fallback chain: Every research agent is instructed to try tools in this order: domain-specific database skills (PubMed, arXiv, etc.) → Perplexity search → WebSearch → Firecrawl search → WebFetch on known URLs → research-lookup skill. Agents must try at least 3 different tools before giving up on a query.
Codex independent contribution: After all Claude research agents complete, Codex is called to independently suggest papers Claude may have missed, drawing on its own training data. Its suggestions are verified for existence before being passed to the bibliography builder.
Bibliography builder (haiku model): Reads all research files, extracts every cited paper, verifies each against CrossRef or search, generates BibTeX entries, and writes references.bib. If fewer than 25 references result (standard) or 50 (deep), additional targeted research agents are spawned for underrepresented areas.
Research log: After every tool call (success or failure), agents append an entry to research/log.md recording timestamp, agent name, tool used, exact query, result, and any URLs or DOIs found. This log is the complete provenance trail of all literature searches.
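A helper along these lines could format one such log entry. The fields match the prose description (timestamp, agent, tool, query, result, URLs/DOIs), but the exact markdown layout here is an illustrative assumption, not the pipeline's canonical format.

```python
from datetime import datetime, timezone

def log_entry(agent: str, tool: str, query: str, result: str, urls=()):
    """Format one research/log.md entry as a markdown list item."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"- {ts} | {agent} | {tool}",
             f"  query: {query}",
             f"  result: {result}"]
    if urls:  # URLs or DOIs found, if any
        lines.append("  urls: " + ", ".join(urls))
    return "\n".join(lines)
```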
Source extracts: For every cited paper, the agent creates research/sources/<bibtexkey>.md recording the access level (FULL-TEXT, ABSTRACT-ONLY, or METADATA-ONLY), the actual content that was read (verbatim or near-verbatim — not a summary), key findings, and provenance.
A standalone stage that reads every research file and calls Codex to cross-check each one's major claims: are they accurately represented? Are there contradicting studies or missing nuances? Are key papers overlooked?
If Codex identifies missing perspectives or inaccurate claims, a targeted follow-up research agent investigates those specific gaps. All Codex feedback and Claude's evaluations are logged to research/codex_cross_check.md.
This stage is non-skippable. The pipeline verifies the file exists before proceeding.
Goal: Ensure every cited source has a verifiable content snapshot before writing begins.
Phase 1 — Audit: Every BibTeX key is matched against research/sources/. Each paper is classified as FULL-TEXT, ABSTRACT-ONLY, or METADATA-ONLY based on its source extract file.
Phase 2 — Automated OA resolution: For each ABSTRACT-ONLY and METADATA-ONLY source, the pipeline attempts to find a free full-text copy, checking open-access sources such as the Semantic Scholar openAccessPdf field. Each successfully resolved paper has its source extract upgraded to FULL-TEXT with a new content snapshot.
Phase 3 — Human acquisition (human-in-the-loop): After automated resolution, if papers still lack full text, the pipeline pauses and presents a prioritized acquisition list sorted by how many times each paper is cited in the research files. In standard mode it presents the top 5; in deep mode it presents all abstract-only and metadata-only papers. The user can drop PDFs into attachments/ and reply "continue", or reply "skip" to proceed with abstract-level sources (which get flagged in the claims-evidence matrix).
Phase 4 — Coverage report: research/source_coverage.md is written with final counts by access level.
Immediately after Stage 1d, the pipeline attempts to build a LightRAG knowledge graph:
python scripts/knowledge.py build
This reads all research/sources/*.md files, extracts entities (papers, theories, methods, findings, authors) and relationships (cites, contradicts, supports, extends), and builds a queryable graph with semantic embeddings. The graph is stored in research/knowledge/ (gitignored).
If scripts/knowledge.py is not found or OPENROUTER_API_KEY is not set, this step is silently skipped. The pipeline works without the knowledge graph; agents fall back to reading research files directly.
With all research complete, the pipeline:
Defines the thesis and contribution in research/thesis.md — the specific problem, the novel contribution, and the key claims.
Determines paper structure from the topic type and venue. Nature-family papers get Results before Methods. IEEE and ACM papers get tighter sections to respect page limits. Survey papers use thematic analysis sections rather than IMRAD.
Creates the detailed outline in main.tex — all sections and subsections with % OUTLINE: comments containing the key arguments, planned citation keys, figure plans, and word targets per section.
Builds the Claims-Evidence Matrix in research/claims_matrix.md. Every major claim the paper will make is listed with: the type of evidence supporting it (experiment, citation, formal proof, or data), which section will present it, and whether the supporting sources are FULL-TEXT or ABSTRACT-ONLY. Claims backed only by abstract-level sources are flagged with a warning.
Codex stress-tests the claims-evidence matrix — checking whether evidence is actually sufficient for each claim, whether any claims are too strong for the evidence type, and whether obvious claims are missing.
Logs planning provenance — entries in research/provenance.jsonl for the thesis selection, section structure decisions, and each claim's evidence strategy.
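The abstract-only warning in the Claims-Evidence Matrix reduces to a simple check per claim. A sketch under assumed field names (the real matrix is a markdown file, not this dict shape):

```python
def flag_weak_claims(matrix):
    """Return IDs of claims whose supporting sources include no
    FULL-TEXT extract; these get the warning flag described above."""
    return [row["id"] for row in matrix
            if "FULL-TEXT" not in row["sources_access"]]

matrix = [
    {"id": "C1", "evidence": "citation", "sources_access": ["FULL-TEXT"]},
    {"id": "C2", "evidence": "citation", "sources_access": ["ABSTRACT-ONLY"]},
]
print(flag_weak_claims(matrix))  # -> ['C2']
```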
A standalone stage that calls codex_plan to challenge the paper plan before any writing begins:
Results go to research/codex_thesis_review.md. If Codex identifies structural problems, they are fixed in the outline before proceeding.
Only runs when depth is "deep". Now that the thesis and claims are finalized, three agents run a surgical second research pass:
| Agent | Task | Output file |
|---|---|---|
| Supporting Evidence | 3-5 papers providing direct evidence for each key claim | research/targeted_support.md |
| Counterarguments | Papers that contradict or challenge the claims, strongest reviewer objections | research/targeted_counter.md |
| Methodological Precedents | Papers using similar methods, known limitations, best practices | research/targeted_methods.md |
The bibliography builder runs again after to incorporate new references.
A standalone stage that searches for existing work making the same contribution before writing begins. This catches the worst case: investing hours in writing a paper that already exists.
The pipeline searches arXiv, Semantic Scholar, domain-specific databases, and (if available) calls Codex to check its own training data. Results are classified:
Results are written to research/novelty_check.md.
Each section gets its own dedicated Opus-tier agent. Writing is sequential (each section can reference prior ones); research and review run in parallel.
Deep mode per-section literature searches: Before each writing agent, a haiku-tier agent quickly searches for literature specific to that section's topics and writes a targeted reference list to research/section_lit_[section].md. The writing agent then reads this alongside the main research files.
Writing order and targets:
| Order | Section | Key guidance |
|---|---|---|
| 1 | Introduction | Broad context → specific problem → contribution → findings preview → paper organization. 8-12 citations. |
| 2 | Related Work | Organize thematically in 3-5 subsections. 3-5 papers per theme. Explicit positioning of this work. 15-20 citations. |
| 3 | Methods/Approach | Exhaustive: full detail for reproduction, math in equation/align environments, pseudocode if applicable, design rationale. |
| 4 | Results/Experiments | Setup → quantitative results (tables) → ablations → qualitative analysis. At least 2 booktabs tables. |
| 5 | Discussion | Interpret findings, compare with prior work, limitations (written honestly), broader implications, future work. |
| 6 | Conclusion | Concise. Restate problem, summarize approach, highlight key results with numbers. No new information. |
| 7 | Abstract | Written last, reads the full paper first. Specific, quantitative, self-contained. |
Codex-authored Limitations: After the Discussion section is written, Codex drafts the Limitations subsection from an adversarial perspective (methodological assumptions, data limitations, scope limitations, threats to validity). Claude evaluates each point, pushes back where warranted, and the Discussion agent integrates the agreed-upon content.
After each section: Claude assesses whether the section is substantively complete. If it is thin, missing citations, or leaving obvious gaps, an expansion agent (Opus tier) adds depth. Then Codex does a quick spot-check of the section for logic, evidence proportionality, and gaps. In deep mode, Codex also identifies substantive content gaps for a further expansion round.
Knowledge graph queries: If the knowledge graph exists, each writing agent queries it for section-specific evidence, checks for contradictions, and ensures comprehensive coverage using the CLI commands.
Provenance logging: After writing each paragraph, the agent appends a provenance entry to research/provenance.jsonl recording the paragraph target (section/pN), which source extracts informed it, which claims it supports, and a reasoning field explaining the writing choices.
Step 4a — Data-driven figures with Praxis (if available):
If vendor/praxis/scripts/ exists and attachments/ contains data files, a data analysis agent auto-detects the characterisation technique and runs Praxis analysis:
import sys
sys.path.insert(0, "vendor/praxis/scripts")
# apply_style and set_palette are provided by the vendored Praxis scripts
apply_style("<venue_style>")  # venue-matched figure style
set_palette("okabe_ito")      # colourblind-safe palette; technique-specific metrics follow
Figures are exported as PDF to figures/, scripts are saved to figures/scripts/ for reproducibility, and quantitative results are inserted into the Results and Methods sections of main.tex.
If Praxis is not installed but data files exist, the pipeline falls back to generic matplotlib using the matplotlib skill.
Step 4b — Structural figures and tables:
A visualization agent ensures the paper has:
Step 4c — Codex figure and claims audit:
Codex audits whether captions accurately describe what is shown, whether surrounding text claims match what the data actually shows, and whether there are misleading axis scales, cherry-picked comparisons, or missing error bars.
This stage loops until all quality criteria pass, up to 5 iterations (standard) or 8 (deep mode). Stale review files are deleted before each iteration so reviewers evaluate the latest version.
Step 5a — Parallel review (3 agents):
| Reviewer | Focus |
|---|---|
| Technical Reviewer | Claims supported by evidence, methodology sound and reproducible, results properly analyzed, argument coherent from Introduction to Conclusion |
| Writing Quality Reviewer | No bullet points in body text, every paragraph complete, transitions smooth, terminology consistent, tense correct, section word counts |
| Citation and Completeness Reviewer | All citation keys exist in .bib, uncited claims flagged, no placeholder text, all cross-references resolve, LaTeX compilation |
Step 5a-ii — Codex adversarial review (parallel with agents):
While the 3 review agents run, Codex is called directly as a 4th adversarial reviewer:
You are an adversarial peer reviewer. Find the weakest points: claims that exceed the evidence, logical gaps in the argument chain, methodological shortcuts, missing baselines or unfair comparisons, conclusions that don't follow from results.
Result goes to reviews/codex_adversarial.md. The pipeline verifies all 4 review files exist before proceeding.
Step 5b — Synthesize: All review files are read and a prioritized fix list is built (CRITICAL first, then MAJOR, word count shortfalls, missing citations).
Step 5c — Revision agent (Opus tier): Fixes every critical and major issue, expands thin sections with substantive content, adds verified citations, and compiles. Content that should be removed is cut, with the removed text archived in provenance/cuts/.
Step 5d — Quality gate: All criteria in the table below must pass. If any fail, loop back to Step 5a. In deep mode, after the final passing iteration, Codex does one additional deep review looking for subtle issues that earlier reviews missed.
Quality gate criteria:
| Criterion | Requirement |
|---|---|
| Sections substantively complete | No obvious gaps, thin arguments, or missing depth |
| Claims-Evidence Matrix | Every claim has status "Supported" |
| References in .bib | 25+ verified entries |
| Placeholder text | Zero (no TODO/TBD/FIXME/lipsum) |
| LaTeX compilation | No errors |
| Tables | 2+ with booktabs |
| Cross-references | All \ref{} resolve |
| Citation keys | All \citep{}/\citet{} exist in .bib |
| Body text | Full paragraphs only, no bullet points |
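Two of the gate criteria (citation keys and placeholder text) are mechanically checkable. A minimal sketch, assuming standard natbib citations and `@type{key,` BibTeX entries; it deliberately ignores edge cases like citations with optional arguments:

```python
import re

def missing_citation_keys(tex: str, bib: str) -> list[str]:
    """Return \\citep/\\citet keys used in the .tex source that have
    no matching entry in the .bib file."""
    used = set()
    for m in re.finditer(r"\\cite[pt]\{([^}]*)\}", tex):
        used.update(k.strip() for k in m.group(1).split(","))
    defined = set(re.findall(r"@\w+\{([^,]+),", bib))
    return sorted(used - defined)

tex = r"As shown \citep{smith2024, jones2023} and \citet{lee2025}."
bib = "@article{smith2024,\n title={X}}\n@misc{lee2025,\n title={Y}}"
print(missing_citation_keys(tex, bib))  # -> ['jones2023']
```

A placeholder check is analogous: scan the body text for `TODO`, `TBD`, `FIXME`, or `lipsum` and fail the gate on any hit.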
After the QA loop exits (all criteria met), two final audits run in parallel:
Consistency Checker: Finds and fixes notation inconsistencies, terminology drift, abbreviations defined twice or used before definition, tense inconsistencies, and reference format inconsistencies (Figure vs Fig., Section vs Sec.).
Claims Auditor: Flags overclaims — "novel"/"first" without evidence, "significantly" without a statistical test, "prove"/"demonstrate" with only experiments, unsupported factual claims, generalizations from limited experiments. Critical and major overclaims are softened in main.tex.
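A crude first pass over the overclaim categories can be done with pattern matching before any model judgment is applied. The patterns below are an illustrative subset, not the auditor's actual rules:

```python
import re

# Illustrative subset of the overclaim triggers described above.
OVERCLAIM_PATTERNS = {
    "novelty": r"\b(novel|first)\b",        # needs supporting evidence
    "statistics": r"\bsignificantly\b",      # needs a statistical test
    "proof": r"\bprove[sd]?\b",              # suspect if only experiments exist
}

def flag_overclaims(text: str):
    """Return (category, matched word) pairs for a human or model to triage."""
    hits = []
    for label, pat in OVERCLAIM_PATTERNS.items():
        for m in re.finditer(pat, text, flags=re.IGNORECASE):
            hits.append((label, m.group(0)))
    return hits
```

Each hit still needs contextual judgment (e.g. "significantly" may be backed by a test two sentences later), which is why the flagged spans go to a reviewing agent rather than being edited automatically.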
After both complete, a Reproducibility Checker (haiku tier) verifies that Methods includes all hyperparameters, training details, compute resources, dataset descriptions, evaluation metric definitions, and random seed/variance information.
A reference validation agent (haiku tier) verifies every BibTeX entry is a real publication:
- Checks each entry in references.bib against external records (CrossRef or search)
- Removes unverifiable entries from references.bib and their citations from main.tex

In parallel, Codex independently verifies a random sample of 10-15 references. If Codex flags a reference that Claude's validator marked as verified, both findings are investigated further.
This stage is non-negotiable. Fabricated references are the primary risk in AI-assisted writing.
Codex's codex_risk_radar tool assesses the complete manuscript across five dimensions:
| Dimension | Description |
|---|---|
| Scientific risk | Claims that could be proven wrong or are unfalsifiable |
| Ethical risk | Data handling, consent, bias, or dual-use concerns |
| Reputational risk | Anything that could embarrass the authors under scrutiny |
| Reproducibility risk | Whether an independent team could reproduce the results |
| Novelty risk | Whether the contribution is incremental enough for desk rejection |
Each dimension is rated LOW / MEDIUM / HIGH. HIGH-risk items must be addressed before finalization. MEDIUM-risk items are flagged for the user.
Final polish (Opus tier): Reviews the complete manuscript end-to-end: abstract accuracy, title specificity, introduction's paper-organization paragraph matching actual sections, conclusion referencing actual results with numbers, redundancies removed, formatting consistent.
Lay summary (haiku tier): Generates a 200+ word plain-language summary (high school reading level) and a 2-3 sentence elevator pitch, written to research/summaries.md. If the venue requires a lay summary (Nature, medical journals), it is added to main.tex.
De-AI polish (Opus tier): Scans for and removes AI writing patterns across seven categories: filler phrases, AI vocabulary, formulaic transitions, redundant phrasing, empty emphasis, em dashes, and structural tells.
Provenance report (haiku tier): Reads the complete research/provenance.jsonl and generates research/provenance_report.md — a human-readable report of every action taken, organized by section and stage, with paragraph-level histories.
Archive: The /archive command runs automatically to bundle all artifacts into a browsable archive/ directory with an indexed README.md.
Codex collaboration stats: codex_stats is called to report how the two AI systems collaborated throughout the pipeline.
After the pipeline completes, /auto runs additional autonomous improvement iterations on the finished paper. It does not re-run the pipeline — it improves what exists.
/auto # run 1 iteration
/auto 3 # run 3 iterations
/auto --continue # resume from last completed iteration
Each iteration runs a 4-phase cycle. Like /write-paper, /auto uses the split-phase pattern: each phase reads its instructions from a dedicated file in pipeline/ (auto-phase-1-assessment.md through auto-phase-4-verification.md) to avoid context degradation in long sessions.
| Agent | Focus |
|---|---|
| Depth and Evidence | Which arguments are thin, which claims need stronger evidence, logical leaps, redundant paragraphs that should be cut |
| Structure and Flow | Argument flow between sections, disproportionate sections, reordering opportunities, front-loading of key insights |
| Competitive Positioning | Differentiation from closest prior work, fairness of comparisons, missing recent papers (2024-2026), field familiarity |
| Writing and Polish | Em dashes, AI writing patterns, sentence length variety, vague statements, over- or under-claiming |
Codex also runs in parallel (if available) to identify the 5 highest-impact improvements with fresh eyes.
All assessment files are read, similar findings are deduplicated, and the top 5 actions for this iteration are selected by impact. Actions are a deliberate mix of strengthening, cutting, structural improvements, and polish. If 3 or fewer meaningful actions are found across all reviewers, early stop is triggered immediately.
The action plan is written to reviews/auto_iter[N]_plan.md with each action typed, targeted to a specific location, and annotated with whether research is needed.
Research phase (max 3 queries): If any selected action requires new evidence, a targeted research agent runs up to 3 searches.
Revision phase (Opus tier): Executes every action in the plan. For cuts, the removed text is saved to provenance/cuts/[section]-[pN]-auto[N].tex before deletion. After all changes, LaTeX is compiled and errors are fixed.
A lightweight verification agent confirms:
- All citation keys exist in the .bib file
- All \ref{} cross-references resolve

If changes_made < 3 after an iteration, the loop stops. The paper has stabilized; additional iterations would introduce noise rather than improvement. This is logged explicitly as a success condition.
During /auto iterations:

- The thesis does not change (research/thesis.md is fixed)
- Every action is logged to research/provenance.jsonl with iteration: N
- All cut content is archived to provenance/cuts/

Every word in the final paper is traceable. The provenance system consists of two components: the machine-readable ledger and the human-readable report.
research/provenance.jsonl is an append-only log of every action taken during both the initial pipeline and all /auto iterations. One JSON object per line.
Entry schema:
{
"ts": "2026-03-24T14:32:11Z",
"stage": "3",
"agent": "section-writing-methods",
"action": "write",
"target": "methods/p3",
"reasoning": "Establishes the core architectural choice — why attention over convolution — citing smith2024 for the theoretical motivation and jones2023 for empirical evidence of the trade-off in this regime.",
"sources": ["smith2024", "jones2023"],
"claims": ["C2"],
"feedback_ref": null,
"diff_summary": null,
"iteration": 0
}
Required fields: ts, stage, agent, action, target, reasoning
Conditional fields:
- sources — BibTeX keys informing this action (required for write, add, expand)
- claims — claim IDs from the claims matrix this action supports
- feedback_ref — pointer to the review feedback that triggered this action (required for revise and cut during QA)
- diff_summary — one-line description of what changed (required for revise, cut, expand)
- archived_to — path where cut content is saved (required for cut)
- iteration — 0 for the initial pipeline, 1+ for /auto iterations

Actions: write, revise, cut, add, expand, reorder, research, plan
Paragraph targeting: [section]/p[N] (e.g., introduction/p1, methods/p5). For subsections: methods/training-procedure/p2. For splits: methods/p3a and methods/p3b.
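The required/conditional field rules above can be expressed as a small validator. A sketch: the mapping mirrors the rules as written (the "during QA" qualifier for feedback_ref is simplified to "always for revise and cut"):

```python
REQUIRED = {"ts", "stage", "agent", "action", "target", "reasoning"}

# Conditional fields per action, per the rules above (simplified).
CONDITIONAL = {
    "write": {"sources"},
    "add": {"sources"},
    "expand": {"sources", "diff_summary"},
    "revise": {"feedback_ref", "diff_summary"},
    "cut": {"feedback_ref", "diff_summary", "archived_to"},
}

def validate_entry(entry: dict) -> list[str]:
    """Return the names of fields missing from one provenance.jsonl entry."""
    needed = REQUIRED | CONDITIONAL.get(entry.get("action", ""), set())
    return sorted(needed - entry.keys())
```

Running this over every line of the ledger is one way `/provenance gaps` could detect malformed or incomplete entries.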
Whenever content is removed from main.tex, it is first saved to provenance/cuts/[section]-[pN]-[context].tex. The provenance entry for the cut records the archived_to path. Content is never deleted without archiving.
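The archive-before-delete step might look like this. A minimal sketch of the naming convention described above; the real pipeline's implementation may differ:

```python
from pathlib import Path

def archive_cut(section: str, paragraph: str, context: str, text: str,
                root: Path = Path("provenance/cuts")) -> Path:
    """Save removed text to provenance/cuts/[section]-[pN]-[context].tex
    and return the path to record in the entry's archived_to field."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{section}-{paragraph}-{context}.tex"
    path.write_text(text)
    return path
```

The returned path goes into the cut's provenance entry, so the ledger and the archive stay in sync.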
Query the provenance ledger interactively with the /provenance command:
| Query form | What it returns |
|---|---|
| /provenance | Full summary: total actions by type, coverage (which paragraphs are traced), source utilization |
| /provenance methods | Provenance for every paragraph in the Methods section: who wrote it, sources, reasoning, revision history |
| /provenance trace C3 | Full chain for claim C3: which paragraphs support it, what sources provide evidence, writing reasoning, any revisions |
| /provenance history methods/p3 | Complete history of one paragraph: original writing, every revision with feedback reference |
| /provenance sources smith2024 | Every place smith2024 was used: which paragraphs, what content, whether for claims, background, or methodology |
| /provenance gaps | Paragraphs with no provenance entry, claims with no linked provenance, sources never referenced in provenance |
| /provenance timeline | Chronological view of all actions; add a stage name to filter |
At the end of Stage 6 and after each /auto run, research/provenance_report.md is generated as a human-readable summary of the entire provenance ledger, organized by section with paragraph-level histories, a cuts archive section, and a list of any untraced content.
An optional LightRAG-based knowledge graph built from source extracts. It enables semantic search across all sources, contradiction detection, and claim-level evidence queries.
export OPENROUTER_API_KEY=your-key
python scripts/knowledge.py build
Reads all research/sources/*.md files, extracts entities and relationships using Gemini Flash (via OpenRouter), and builds a queryable graph with Qwen3 8B semantic embeddings. Stored in research/knowledge/ (gitignored — rebuilds from sources).
The pipeline builds the graph automatically after Stage 1d if the prerequisites are met.
python scripts/knowledge.py query "how do transformer architectures handle long sequences?"
python scripts/knowledge.py contradictions
python scripts/knowledge.py evidence-for "attention is more parameter-efficient than convolution"
python scripts/knowledge.py evidence-against "scaling laws hold at all model sizes"
python scripts/knowledge.py entities
python scripts/knowledge.py relationships "attention mechanism"
| Command | Output |
|---|---|
| query "question" | Freeform semantic search; synthesized answer with source citations |
| contradictions | Conflicting claims across sources; saved to research/knowledge_contradictions.md |
| evidence-for "claim" | Sources and findings supporting the claim |
| evidence-against "claim" | Sources and findings challenging or contradicting the claim |
| entities | All extracted concepts, theories, methods, papers, authors grouped by type |
| relationships "entity" | How a concept connects to other entities in the graph |
The /knowledge slash command wraps these operations interactively. If the graph does not exist, it offers to build it.
| Command | Description |
|---|---|
| /write-paper \<topic\> | Full autonomous pipeline: research → outline → writing → figures → QA → finalization. 1-4 hours standard, 3-8 hours deep. |
| /auto [N] | Run N improvement iterations on a completed paper (default 1). Use --continue to resume. |
| /preview-pipeline | Dry run of /write-paper: shows each stage, what will RUN vs SKIP, model selections, time estimate. Executes nothing. |
| /status | Progress dashboard: word counts per section vs targets, reference count, figure count, pipeline stage, LaTeX compilation status. Also writes .paper-progress.txt. |
| Command | Description |
|---|---|
| /search-literature \<query\> | Find relevant papers for a topic using the domain-appropriate skill databases and tool fallback chain. |
| /add-citation \<DOI or title\> | Add a properly formatted BibTeX entry to references.bib after verifying the paper exists. |
| /ingest-papers | Import PDFs from attachments/, extract metadata and content snapshots, generate BibTeX entries, write source extracts. |
| /cite-network | Analyze citation patterns: distribution by section and year, temporal coverage, venue diversity, author diversity, orphan detection, gap identification with suggested additions. |
| /ask \<question\> | Query research artifacts to answer questions. Searches research/sources/, research/, reviews/, main.tex, references.bib, and research/log.md, providing the provenance trail for each answer. |
| /knowledge [operation] | Interact with the LightRAG knowledge graph: query, contradictions, evidence-for, evidence-against, entities, relationships. Builds the graph if needed. |
| /export-sources | Export source extracts and references to the shared knowledge base (~/.research-agent/shared-sources/) for reuse across papers. |
| /import-sources [topic] | Import relevant sources from the shared knowledge base into the current paper. Uses the .paper.json topic if omitted. |
| /audit-sources | Retroactive source coverage audit: classifies all references by access level, attempts OA resolution for abstract-only sources, generates an acquisition list. Standalone version of Stage 1d. |
| Command | Description |
|---|---|
| `/init <topic>` | Quick single-pass paper generation without the full multi-stage pipeline. Useful for drafts or shorter documents. |
| `/outline <section>` | Generate a structured outline for a specific section with key arguments, planned citations, and subsection organization. |
| `/revise-section <section>` | Rewrite a section based on provided feedback, maintaining consistency with the rest of the paper. |
| Command | Description |
|---|---|
| `/review` | Comprehensive manuscript quality review covering technical soundness, writing quality, and citation completeness. |
| `/check-consistency` | Find and fix notation inconsistencies, terminology drift, abbreviations used before definition or defined twice, and reference format inconsistencies. |
| `/audit-claims` | Flag overclaims — "novel"/"first" without evidence, "significantly" without statistical tests, "prove" based only on experiments, unsupported factual claims. |
| `/check-citations` | Verify every citation via the CrossRef API or search, fix metadata mismatches, remove fabricated entries, and attempt OA resolution for newly verified papers. |
| `/novelty-check [contribution]` | Verify the paper's contribution hasn't been published. Uses multiple databases plus Codex cross-model verification. Returns NOVEL, PARTIALLY NOVEL, or NOT NOVEL. |
| `/de-ai-polish [section]` | Remove AI writing patterns across 7 categories: filler phrases, AI vocabulary, formulaic transitions, redundant phrasing, empty emphasis, em dashes, and structural tells. |
| `/reproducibility-checklist` | Check Methods completeness against a structured checklist (general scientific, ML-specific if applicable, ethical considerations). Reports YES/NO/N/A per item with section references. |
| `/codex-review [section]` | On-demand adversarial review from OpenAI Codex via `codex_plan`, `codex_review`, and `codex_ask`. Requires codex-bridge. |
| `/codex-telemetry [export]` | Analyze Codex interaction patterns: agreement rates, tool usage breakdown, disagreement hotspots, timeline. Use `export` to write the report to a file. |
| `/health` | Diagnose pipeline prerequisites and optional integrations (LaTeX, API keys, knowledge graph, Codex, Praxis). Reports status, detail, and impact for each check. |
| `/compile` | Compile LaTeX to PDF via `latexmk -pdf -interaction=nonstopmode main.tex` and report errors. |
| Command | Description |
|---|---|
| `/analyze-data <file>` | Statistical analysis on datasets in `attachments/`; generates publication figures via matplotlib or Praxis. |
| `/praxis-analyze [file or technique]` | Technique-specific analysis via Praxis (auto-detects from data): XRD, DSC, TGA, FTIR, Raman, XPS, EIS, mechanical testing, VSM, UV-Vis, BET, and more. Venue-matched journal figure styles. |
| Command | Description |
|---|---|
| `/provenance [mode]` | Query the provenance ledger. Modes: `summary` (default), a section name, `trace <claim-id>`, `history <target>`, `sources <bibtex-key>`, `gaps`, `timeline [stage]`. |
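The ledger behind `/provenance` is an append-only JSONL file (`research/provenance.jsonl`). A minimal sketch of how a `trace <claim-id>` query could be answered; the record fields used here (`claim_id`, `stage`) are illustrative assumptions, not the pipeline's actual schema:

```python
import json
from pathlib import Path

def trace_claim(ledger_path: str, claim_id: str) -> list[dict]:
    """Return every ledger record mentioning a claim, oldest first.

    The ledger is append-only, so file order is already chronological.
    """
    records = []
    for line in Path(ledger_path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines between appends
        record = json.loads(line)
        if record.get("claim_id") == claim_id:
            records.append(record)
    return records
```

Because each line is an independent JSON object, a query never needs to parse the whole file as one document, which is what makes the append-only design cheap.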
| Command | Description |
|---|---|
| `/lay-summary` | Generate a 200+ word plain-language summary, a 2-3 sentence elevator pitch, and (where required by venue) a lay summary for inclusion in the manuscript. |
| `/archive` | Bundle all research artifacts into a browsable `archive/` directory with a README index. Auto-runs at the end of `/write-paper`. |
| `/prepare-submission` | Generate submission package: anonymized version (for blind-review venues), camera-ready version, cover letter, response to reviewers (if reviews exist), and a submission checklist. |
| `/respond-to-reviewers` | Generate a structured point-by-point response to peer reviewer comments with tracked changes in the manuscript. |
| `/make-slides` | Generate a presentation slide deck from the paper (structured markdown with speaker notes, calibrated to venue talk length). |
| `/make-blog-post` | Generate a 1500-3000 word blog post explaining the paper for a general technical audience. |
| `/make-deliverables` | Generate all deliverables in parallel: lay summary, slide deck, and blog post (3 simultaneous agents). |
| `/prisma-flowchart` | Generate a PRISMA 2020 flowchart from the research log and add it to the manuscript. |
| `/clean` | `latexmk -c` to remove LaTeX build artifacts. Add the `all` argument to also remove `research/`, `reviews/`, `archive/`, and pipeline state files (never removes `main.tex`, `references.bib`, `figures/`, `attachments/`, `.paper.json`, `.venue.json`). |
Install with `npm i -g codex-bridge`. When present, `create-paper` auto-configures it via `codex-bridge init`. All integration is graceful — if not installed, every step that uses it is silently skipped.
Codex (OpenAI) contributes at 11 points in the pipeline:
| Pipeline point | What Codex does |
|---|---|
| Stage 1 (after research agents) | Independent literature contribution — papers Claude may have missed |
| Stage 1c | Cross-checks every research file for inaccurate representations or missing nuances |
| Stage 2 (claims-evidence matrix) | Challenges whether evidence actually supports each claim |
| Stage 2b | Stress-tests the contribution statement and argument structure |
| Stage 3 (Limitations) | Drafts the Limitations subsection from an adversarial perspective |
| Stage 3 (each section) | Quick spot-check after each writing agent; in deep mode, also identifies content gaps |
| Stage 4c | Audits whether figures and surrounding text claims match |
| Stage 5 | Adversarial peer review as a 4th parallel reviewer |
| Post-QA | Independent reference verification of a random sample |
| Post-QA | Risk radar assessment across 5 dimensions |
| Stage 6 | Collaboration statistics report |
Deliberation protocol: Codex feedback is never blindly accepted. For every Codex interaction, Claude evaluates each point as AGREE, PARTIALLY AGREE, or DISAGREE, with explicit reasoning. On DISAGREE, Claude sends a rebuttal with specific counterarguments. Codex gets one response. If still unresolved, both perspectives are logged in reviews/codex_deliberation_log.md for the user to decide. Neither side silently wins.
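The deliberation protocol above is essentially a one-round negotiation loop. A hedged sketch of its control flow; the `Verdict` type, function names, and log format are invented for illustration and are not the pipeline's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    point: str
    stance: str        # "AGREE" | "PARTIALLY AGREE" | "DISAGREE"
    reasoning: str
    resolved: bool = True

def deliberate(verdicts, send_rebuttal, log):
    """One-round deliberation over Codex feedback points.

    send_rebuttal(point, reasoning) -> True if Codex concedes.
    Unresolved disagreements are logged for the user, never dropped.
    """
    accepted = []
    for v in verdicts:
        if v.stance in ("AGREE", "PARTIALLY AGREE"):
            accepted.append(v.point)
        else:
            # DISAGREE: exactly one rebuttal, exactly one response
            if send_rebuttal(v.point, v.reasoning):
                accepted.append(v.point)
            else:
                v.resolved = False
                log.append(f"UNRESOLVED: {v.point} -- {v.reasoning}")
    return accepted
```

The key property is the last branch: a standoff produces a log entry rather than a silent winner, matching the "neither side silently wins" rule.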
Auto-cloned as a git submodule at vendor/praxis/ by create-paper. Provides:
Figure styles are matched to the target venue via `.venue.json`. The pipeline auto-detects data files in `attachments/` and uses Praxis at Stage 4. Analysis scripts are saved to `figures/scripts/` for reproducibility. Install dependencies with `pip install -r vendor/praxis/requirements.txt`.
Cloned as a git submodule at `vendor/claude-scientific-skills/` and symlinked to `.claude/skills/`. Skills are markdown files that agents read to guide their behavior. Used throughout the pipeline by naming them in agent prompts (e.g., "invoke the scientific-writing skill").
Skill categories include:

- Writing and review: `scientific-writing`, `peer-review`, `scientific-critical-thinking`, `citation-management`
- Literature databases: `pubmed-database`, `arxiv-database`, `openalex-database`, `chembl-database`, `uniprot-database`, `clinicaltrials-database`, and 30+ more
- Analysis and ML: `statistical-analysis`, `exploratory-data-analysis`, `scikit-learn`, `transformers`, `pytorch-lightning`
- Visualization: `matplotlib`, `seaborn`, `plotly`, `scientific-visualization`
- Domain toolkits: `rdkit`, `biopython`, `scanpy`, `pymatgen`, `qiskit`, and 100+ more

At the start of Stage 1, the pipeline analyzes the topic to detect its domain. Domain detection matches indicator keywords:
| Domain | Indicators | Priority databases |
|---|---|---|
| Biomedical / Life Sciences | gene, protein, cell, disease, clinical, drug, genomic, cancer | PubMed, bioRxiv, UniProt, KEGG, Reactome, PDB |
| Chemistry / Drug Discovery | molecule, compound, synthesis, binding, SMILES, ADMET | PubChem, ChEMBL, DrugBank, ZINC, BindingDB |
| Computer Science / AI/ML | neural, network, model, training, transformer, LLM, NLP | arXiv, HuggingFace |
| Physics / Quantum | quantum, particle, field, relativity, optics | arXiv, NIST |
| Materials Science | crystal, material, alloy, polymer, semiconductor | arXiv, Materials Project |
| Ecology / Geospatial | ecology, climate, satellite, geographic, biodiversity | GBIF, geospatial databases |
| Economics / Finance | market, economic, stock, GDP | FRED, Alpha Vantage, EDGAR |
| Clinical / Medical | patient, treatment, trial, diagnosis, hospital | ClinicalTrials.gov, FDA databases |
| General (default) | (all others) | arXiv, OpenAlex |
All domains also receive universal skills: `scientific-writing`, `citation-management`, `peer-review`, `statistical-analysis`, `scientific-visualization`, `matplotlib`, `seaborn`, `plotly`.
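The keyword matching described above can be sketched as a simple scoring pass over the topic string. The dictionary below covers only a subset of the table's domains, and the key names are illustrative, not the pipeline's internal identifiers:

```python
DOMAIN_KEYWORDS = {
    "biomedical": ["gene", "protein", "cell", "disease",
                   "clinical", "drug", "genomic", "cancer"],
    "chemistry":  ["molecule", "compound", "synthesis",
                   "binding", "smiles", "admet"],
    "cs_ai_ml":   ["neural", "network", "model", "training",
                   "transformer", "llm", "nlp"],
    "physics":    ["quantum", "particle", "field", "relativity", "optics"],
    "materials":  ["crystal", "material", "alloy", "polymer", "semiconductor"],
}

def detect_domain(topic: str) -> str:
    """Pick the domain whose indicator keywords match the topic most often."""
    text = topic.lower()
    scores = {
        domain: sum(1 for kw in keywords if kw in text)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No indicator matched at all: fall through to the default domain
    return best if scores[best] > 0 else "general"
```

A topic that matches nothing falls through to "general", mirroring the default row in the table.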
These rules are enforced across all agents in the pipeline. Violations are caught in QA review.
- Citations: `\citep{key}` for parenthetical and `\citet{key}` for narrative citations. Keys use the `firstauthorlastnameYear` format.
- Figures: save to `figures/`, reference with `\includegraphics{figures/filename}`. Prefer vector formats (PDF, EPS); use PNG only for photographs.
- Tables: use the `booktabs` package (`\toprule`, `\midrule`, `\bottomrule`). Minimum 2 tables.
- Cross-references: use `\label{}` and `\ref{}` consistently. All `\ref{}` calls must resolve.
- Placeholders: remove all `\lipsum`, TODO, TBD, and FIXME markers before finalizing.
- Claims: track every claim in `research/claims_matrix.md`. Every claim must reach "Supported" status to pass QA.
- Provenance: log to `research/provenance.jsonl`.
- Venue-aware length: if `.venue.json` has a page limit, scale section word targets proportionally.

Three model tiers with 1M context windows are used throughout the pipeline. The `[1m]` suffix is required: the shorthands "opus" and "sonnet" resolve to standard-context models, not the 1M variants.
| Tier | Model ID | Used for |
|---|---|---|
| Opus 1M | `claude-opus-4-6[1m]` | Writing agents, revision agents, expansion agents, gap analysis, de-AI polish, final polish — anything requiring deep reasoning, synthesis, or prose quality |
| Sonnet 1M | `claude-sonnet-4-6[1m]` | Research agents, review agents, data analysis, figures, assessment agents in `/auto`, verification agents — tasks requiring tool use, search, and structured evaluation |
| Haiku | `haiku` | Bibliography building, reference validation, lay summary, reproducibility checklist, per-section literature searches in deep mode, provenance report generation — mechanical lookup and formatting tasks |
The 1M context windows allow agents to read entire manuscripts, all research files, full bibliographies, and all review feedback without hitting context limits.
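Tier selection reduces to a task-to-tier lookup plus a tier-to-model mapping. A sketch with invented task names, showing why the `[1m]` suffix belongs in the mapping itself rather than being appended ad hoc:

```python
TIER_MODELS = {
    "opus":   "claude-opus-4-6[1m]",
    "sonnet": "claude-sonnet-4-6[1m]",
    "haiku":  "haiku",
}

# Task names here are illustrative, not the pipeline's real task IDs.
TASK_TIERS = {
    "writing":      "opus",
    "revision":     "opus",
    "research":     "sonnet",
    "review":       "sonnet",
    "bibliography": "haiku",
    "lay_summary":  "haiku",
}

def model_for(task: str) -> str:
    """Resolve a pipeline task to its full model ID.

    Keeping the [1m] suffix inside TIER_MODELS means no call site can
    accidentally spawn a standard-context "opus" or "sonnet" variant.
    """
    tier = TASK_TIERS.get(task, "sonnet")  # sonnet as a safe default
    return TIER_MODELS[tier]
```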
research-agent/ (this repository)
├── create-paper Bash script: stamps out new paper projects
├── write-paper Bash script: launcher for the autonomous pipeline
├── sync-papers Bash script: migrate/update existing projects to use symlinks
├── template/
│ ├── claude/
│ │ ├── CLAUDE.md Project instructions, writing rules, command reference
│ │ ├── settings.local.json Tool permissions for autonomous operation
│ │ ├── commands/ All 35 slash commands (symlinked into each paper project)
│ │ └── pipeline/ Stage-specific instructions (read on-demand per stage/phase)
│ │ ├── stage-1-research.md through stage-6-finalization.md
│ │ ├── auto-phase-1-assessment.md through auto-phase-4-verification.md
│ │ └── shared-protocols.md
│ ├── scripts/ Utility scripts (knowledge.py, etc.)
│ ├── venues/ Venue configuration JSON files
│ │ ├── generic.json
│ │ ├── ieee.json
│ │ ├── acm.json
│ │ ├── neurips.json
│ │ ├── nature.json
│ │ ├── arxiv.json
│ │ └── apa.json
│ ├── main.tex LaTeX template (overwritten by venue-specific generation)
│ ├── references.bib Empty bibliography template
│ └── gitignore Standard gitignore for paper projects
├── tests/ Test suite (run_all.sh, test_structure.sh, test_prompts.sh, test_schema.py)
├── .github/workflows/ci.yml CI via GitHub Actions
└── vendor/ External dependencies (submodules)
Generated paper project structure:
my-paper/
├── main.tex LaTeX document, venue-formatted
├── references.bib BibTeX bibliography
├── .paper.json Paper topic, venue, authors, depth, config
├── .venue.json Venue formatting rules (copied from template)
├── .paper-state.json Pipeline checkpoint state (created at runtime)
├── .paper-progress.txt Human-readable progress monitor (tail from another terminal)
├── figures/ Generated figures (PDF, PNG)
│ └── scripts/ Figure generation scripts (for reproducibility)
├── attachments/ User-provided PDFs, datasets, supplementary materials
├── research/ Literature research outputs (created by pipeline)
│ ├── sources/ Raw source extracts per cited paper (<bibtexkey>.md)
│ ├── knowledge/ LightRAG knowledge graph (gitignored, rebuilds from sources)
│ ├── log.md Complete research provenance log
│ ├── provenance.jsonl Machine-readable provenance ledger (append-only)
│ ├── provenance_report.md Human-readable provenance summary (generated at end)
│ ├── survey.md Field survey (Stage 1, Agent 1)
│ ├── methods.md Methodology deep dive (Stage 1, Agent 2)
│ ├── empirical.md Empirical evidence (Stage 1, Agent 3)
│ ├── theory.md Theoretical foundations (Stage 1, Agent 4)
│ ├── gaps.md Gap analysis and thesis proposal (Stage 1, Agent 5)
│ ├── thesis.md Thesis and contribution statement (Stage 2)
│ ├── claims_matrix.md Claims-evidence matrix (Stage 2)
│ ├── novelty_check.md Novelty verification report (Stage 2d)
│ ├── source_coverage.md Source access level audit (Stage 1d)
│ ├── codex_cross_check.md Codex research cross-check (Stage 1c)
│ ├── codex_thesis_review.md Codex thesis stress-test (Stage 2b)
│ ├── reference_validation.md Reference verification report (Post-QA)
│ ├── reproducibility_checklist.md Reproducibility checklist (Post-QA)
│ └── summaries.md Lay summary and elevator pitch (Stage 6)
├── reviews/ Review feedback (created during QA)
│ ├── technical.md Technical reviewer output
│ ├── writing.md Writing quality reviewer output
│ ├── completeness.md Citation and completeness reviewer output
│ ├── codex_adversarial.md Codex adversarial review
│ ├── codex_risk_radar.md Codex risk radar assessment
│ ├── consistency.md Post-QA consistency checker output
│ ├── claims_audit.md Post-QA claims auditor output
│ └── codex_deliberation_log.md All Claude-Codex deliberations
├── provenance/
│ └── cuts/ Archived text from all content that was cut
├── archive/ Browsable research archive (created by /archive)
│ └── README.md Indexed guide to all archive contents
├── submission/ Submission package (created by /prepare-submission)
├── vendor/
│ ├── claude-scientific-skills/ 177 scientific skills (git submodule)
│ └── praxis/ Scientific data analysis toolkit (git submodule)
├── .claude/ Claude runtime scaffold (Claude projects)
│ ├── CLAUDE.md
│ ├── settings.local.json
│ ├── commands/
│ ├── pipeline/
│ └── skills/ -> vendor/
└── .codex/ Codex runtime scaffold (Codex projects)
├── AGENTS.md
├── commands/
├── pipeline/
└── skills/ -> vendor/
### `.paper.json`

Created by `create-paper`, read by the pipeline and most commands.
{
"topic": "A survey on large language model reasoning",
"venue": "arxiv",
"depth": "standard",
"runtime": "claude",
"model": "claude-opus-4-6",
"max_revisions": 3,
"email": "[email protected]",
"oa_resolution": {
"unpaywall": true,
"openalex": true,
"semantic_scholar": true,
"core": true,
"pubmed_central": "auto",
"web_search": true,
"repository_search": true
},
"authors": [
{
"name": "First Author",
"affiliation": "Department, University",
"email": "[email protected]",
"orcid": "0000-0000-0000-0000"
}
],
"keywords": ["large language models", "reasoning", "chain-of-thought"],
"funding": "Grant XYZ from Funding Body",
"conflicts": "None declared",
"data_availability": "Available at https://github.com/...",
"code_availability": "https://github.com/..."
}
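A consumer of this file might load it as follows. This is a sketch: the default values are assumptions based on the documented field descriptions, and treating `topic` as the only required field is likewise an assumption, not a documented constraint:

```python
import json
from pathlib import Path

# Assumed defaults for optional fields (see field table).
DEFAULTS = {"venue": "generic", "depth": "standard", "max_revisions": 3}

def load_paper_config(path: str = ".paper.json") -> dict:
    """Load .paper.json, filling defaults for optional fields."""
    config = {**DEFAULTS, **json.loads(Path(path).read_text())}
    if "topic" not in config:
        raise ValueError(f"{path} must define a topic")
    return config
```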
| Field | Description |
|---|---|
| `topic` | Paper topic, used to seed the research pipeline and all agent prompts |
| `venue` | Target venue: `generic`, `ieee`, `acm`, `neurips`, `nature`, `arxiv`, `apa` |
| `depth` | `"standard"` (5 agents, 30-50 refs, 1-4 hrs) or `"deep"` (12 agents, 60-80 refs, 3-8 hrs) |
| `runtime` | Active harness: `claude` or `codex` |
| `model` | Base model (1M context variants are added automatically per tier) |
| `max_revisions` | Maximum QA iterations (overridden by depth: 5 for standard, 8 for deep) |
| `email` | Email for Unpaywall API auth and OpenAlex rate-limit boost. Also read from the `UNPAYWALL_EMAIL` env var. |
| `oa_resolution` | Per-API toggles for the OA resolution chain (see below). All default to `true`. |
| `authors` | Author list for cover letter and author block |
| `keywords` | Keywords for submission metadata |
| `funding` | Funding acknowledgment text |
| `conflicts` | Conflicts of interest declaration |
| `data_availability` | Data availability statement for submission |
| `code_availability` | Code availability statement and URL |
### `oa_resolution` sub-object

Controls which APIs are tried during source acquisition (Stage 1d), `/audit-sources`, and `/check-citations`. The pipeline tries each enabled API in order and stops on the first successful PDF download.
| Key | Default | Description |
|---|---|---|
| `unpaywall` | `true` | Unpaywall — ~30M OA articles. Requires the `email` field or `UNPAYWALL_EMAIL` env var. |
| `openalex` | `true` | OpenAlex — 250M+ works, no key needed. Also extracts abstracts. |
| `semantic_scholar` | `true` | Semantic Scholar — checks the `openAccessPdf` field. |
| `core` | `true` | CORE — 200M+ institutional repository papers. Requires the `CORE_API_KEY` env var (free). |
| `pubmed_central` | `"auto"` | PubMed Central — biomedical full text. `"auto"` enables only for biomedical/clinical domains. Set `true` to force on, `false` to disable. Optional: the `NCBI_API_KEY` env var increases the rate limit. |
| `web_search` | `true` | Firecrawl search for PDFs (`filetype:pdf`). |
| `repository_search` | `true` | Search ResearchGate, Academia.edu, SSRN. |
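The stop-on-first-hit chain is straightforward to model. A sketch with invented resolver callables, showing how the per-API toggles and the `"auto"` value for `pubmed_central` could interact; the real pipeline's interfaces will differ:

```python
def resolve_open_access(doi, toggles, resolvers, domain="general"):
    """Try each enabled OA resolver in order; stop at the first PDF.

    `resolvers` is an ordered list of (name, fn) pairs, where fn(doi)
    returns a PDF URL or None. An "auto" toggle enables the resolver
    only for biomedical/clinical domains.
    """
    for name, fn in resolvers:
        enabled = toggles.get(name, True)  # toggles default to enabled
        if enabled == "auto":
            enabled = domain in ("biomedical", "clinical")
        if not enabled:
            continue
        url = fn(doi)
        if url:
            return name, url  # first hit wins; remaining APIs are skipped
    return None, None
```

Because the chain short-circuits, putting the cheapest or most reliable resolvers first minimizes API calls per reference.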
| Variable | Required | Description |
|---|---|---|
| `UNPAYWALL_EMAIL` | For Unpaywall | Alternative to `email` in `.paper.json`. Any valid email address. |
| `CORE_API_KEY` | For CORE | Free API key from core.ac.uk/services/api. CORE is skipped if not set. |
| `NCBI_API_KEY` | No | Free key from NCBI. Increases the PubMed rate limit from 3/s to 10/s. |
| `OPENROUTER_API_KEY` | For knowledge graph | Required for the LightRAG knowledge graph. The graph is skipped if not set. |
### `.venue.json`

Copied from the template `venues/` directory. Read by the pipeline, writing agents, and Praxis for figure styles.
{
"name": "Generic Journal",
"id": "generic",
"documentclass": "\\documentclass[12pt]{article}",
"packages": ["\\usepackage{natbib}", "..."],
"bibliography_style": "plainnat",
"citation_style": "natbib",
"citation_commands": ["\\citep{}", "\\citet{}"],
"page_limit": null,
"abstract_word_limit": 300,
"blind_review": false,
"sections": ["Introduction", "Related Work", "Methods", "Results", "Discussion", "Conclusion"],
"notes": "Standard journal format with natbib citations. No page limit."
}
The pipeline reads `page_limit` to scale section word targets proportionally. It reads `blind_review` to know whether `/prepare-submission` should anonymize the author block. It reads `sections` to determine the initial section order in `main.tex`.
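Proportional scaling of word targets can be sketched as below. The 12-page baseline is an illustrative assumption, not a documented constant:

```python
def scale_word_targets(targets: dict, page_limit, baseline_pages: int = 12) -> dict:
    """Scale per-section word targets to fit a venue page limit.

    `targets` maps section name -> default word count. A null
    page_limit (as in generic.json) leaves targets unchanged.
    """
    if page_limit is None:
        return dict(targets)  # no limit: keep defaults
    factor = page_limit / baseline_pages
    return {section: round(words * factor) for section, words in targets.items()}
```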
### `.paper-state.json`

Written after every stage and read at startup to resume. Do not edit manually.
{
"topic": "...",
"venue": "generic",
"started_at": "2026-03-24T10:00:00Z",
"current_stage": "writing",
"stages": {
"research": { "done": true, "completed_at": "...", "notes": "45 refs found" },
"codex_cross_check": { "done": true, "completed_at": "..." },
"source_acquisition": { "done": true, "full_text": 28, "abstract_only": 12, "metadata_only": 5 },
"knowledge_graph": { "done": true, "entities": 347, "relationships": 891 },
"outline": { "done": true, "completed_at": "..." },
"codex_thesis": { "done": true, "completed_at": "..." },
"novelty_check": { "done": true, "status": "NOVEL" },
"writing": {
"done": false,
"sections": {
"introduction": { "done": true, "words": 1250 },
"related_work": { "done": true, "words": 2100 },
"methods": { "done": false, "words": 0 }
}
},
"figures": { "done": false },
"qa": { "done": false },
"qa_iteration": 2,
"codex_risk_radar": { "done": false },
"finalization": { "done": false },
"auto_iterations": {
"completed": 0,
"requested": 0,
"history": []
}
}
}
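Resume-on-restart reduces to scanning the stage list for the first entry without `"done": true`. A sketch using a simplified stage order; the real state file tracks more stages (such as `codex_cross_check`) and per-section writing progress:

```python
import json
from pathlib import Path

# Simplified; the actual pipeline has additional intermediate stages.
STAGE_ORDER = ["research", "source_acquisition", "knowledge_graph",
               "outline", "writing", "figures", "qa", "finalization"]

def next_stage(state_path: str = ".paper-state.json") -> str:
    """Return the first incomplete stage, for resume-after-interrupt.

    A fresh project (no state file yet) starts at the first stage.
    """
    path = Path(state_path)
    if not path.exists():
        return STAGE_ORDER[0]
    stages = json.loads(path.read_text()).get("stages", {})
    for stage in STAGE_ORDER:
        if not stages.get(stage, {}).get("done", False):
            return stage
    return "complete"
```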
### `.paper-progress.txt`

Human-readable progress file updated at each stage. Intended for monitoring from a second terminal:
```shell
watch cat .paper-progress.txt
```
Updated by /status and at each pipeline checkpoint.
The template files in `template/claude/commands/` define what each slash command does. Editing them changes the behavior of the pipeline in all new paper projects.
Key files for contributors:
- `template/claude/CLAUDE.md` — project rules and command reference (symlinked into each paper project)
- `template/claude/commands/write-paper.md` — the full pipeline definition (~1500 lines)
- `template/claude/commands/auto.md` — the `/auto` improvement loop
- `template/claude/commands/provenance.md` — the provenance query command
- `create-paper` — the project scaffolding script
- `write-paper` — the pipeline launcher script
- `template/venues/` — venue configuration files

Run the test suite with `tests/run_all.sh`. Individual tests: `tests/test_structure.sh` (project structure validation), `tests/test_prompts.sh` (prompt consistency checks), `tests/test_schema.py` (JSON schema validation). CI runs automatically via GitHub Actions (`.github/workflows/ci.yml`).
MIT