research/SCHEMA.md

Research Pipeline Schema

Updated: 2026-04-20

The research pipeline scans the open web, news, YouTube and academic sources every morning, evaluates findings against the book's current arguments, and emits prioritised kanban cards. The author triages the cards in /kanban; approved items travel as marked review blocks into the relevant chapter pages.

Data flow

wiki/concepts/*.md ──┐
wiki/chapters/*.md ──┼──► derive_topics.mjs ──► wiki/research/topics.json
                     │                                    │
                     │                                    ▼
                     │                        scan.mjs (Brave / Apify / arXiv / Semantic Scholar)
                     │                                    │
                     │                                    ▼
                     │                        wiki/research/raw/YYYY-MM-DD/{source}/*.json
                     │                                    │
                     │                                    ▼
                     │                          evaluate.mjs (OpenAI scoring + callouts)
                     │                                    │
                     │                                    ▼
                     │                       wiki/research/candidates/YYYY-MM-DD.json
                     │                                    │
                     │                                    ▼
                     │                            build_cards.mjs
                     │                                    │
                     │                                    ▼
                     └─────────────────────► wiki/queue/scouted-YYYY-MM-DD-*.md
                                                          │
                                                          ▼
                                                    /kanban (Next.js)
                                                          │
                                          (Send to chapter as draft for review)
                                                          │
                                                          ▼
                                  wiki/chapters/chapter-NN.md
                                  (## New Content for Review block)
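
The path conventions in the diagram can be collected in one small helper. This is a minimal sketch, not the pipeline's actual code: the function name `pipelinePaths`, its parameters, and the use of "arxiv" as a `{source}` directory name are assumptions made for illustration.

```javascript
// Sketch only: the function and parameter names are hypothetical.
// It mirrors the path patterns shown in the data flow diagram.
function pipelinePaths(date, source) {
  // Expect an ISO calendar date such as "2026-04-20".
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`expected YYYY-MM-DD, got: ${date}`);
  }
  return {
    topics: "wiki/research/topics.json",
    rawDir: `wiki/research/raw/${date}/${source}`,
    candidates: `wiki/research/candidates/${date}.json`,
    // The part after the date is not specified in the diagram, so only the prefix is built.
    cardPrefix: `wiki/queue/scouted-${date}-`,
  };
}

const paths = pipelinePaths("2026-04-20", "arxiv");
console.log(paths.rawDir);
console.log(paths.candidates);
```

Keeping the date format check in one place means a malformed date fails early rather than producing a stray directory.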

Card frontmatter schema

Every kanban card sits in wiki/queue/ as a markdown file with this frontmatter. Existing manually-authored queue entries remain compatible; new fields default sensibly when absent.

---
id: "2026-04-20-arxiv-2404-01234"        # stable unique id
title: "Short human title"
type: queue_entry                        # always queue_entry for kanban cards
origin: scouted | ingested | manual      # how the card came to exist
column: scouted | reviewing | approved | drafting | merged | rejected
priority: high | medium | low
relevance_score: 0.0 - 1.0               # only present on scouted cards
chapter: 5                               # primary mapped chapter (1-20)
related_chapters: [7, 8]
related_concepts: [purpose-emerges-organically, identity-through-work]
source_type: web | news | youtube | arxiv | scholar | substack | medium
source_url: "https://..."
source_title: "..."
source_author: "..."
source_date: 2026-04-18
scouted_at: 2026-04-20T07:05:00+00:00
callout_why: "Why this matters (one paragraph)"
callout_where: "Where in the book it fits (one paragraph)"
callout_why_book: "Why it fits this book specifically (one paragraph)"
notes: []                                # free-text notes added via UI
merged_at: null                          # populated when sent to chapter
merged_into: null                        # chapter slug merged into
---

## Source excerpt

(verbatim relevant excerpt from the source — the evaluator pulls the most
load-bearing 200-500 words)

## Evaluator notes

(any additional notes from the evaluator: counterarguments to consider,
overlap with existing wiki pages, suggested integration approach)
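
A small validator can catch malformed frontmatter before a card reaches the board. The sketch below assumes the frontmatter has already been parsed into a plain object; the helper name `validateCard` is hypothetical, and treating `id` and `type` as required while tolerating every other absent field is an assumption based on the compatibility note above.

```javascript
// Allowed values copied from the frontmatter schema above.
const ENUMS = {
  origin: ["scouted", "ingested", "manual"],
  column: ["scouted", "reviewing", "approved", "drafting", "merged", "rejected"],
  priority: ["high", "medium", "low"],
  source_type: ["web", "news", "youtube", "arxiv", "scholar", "substack", "medium"],
};

// Hypothetical helper: returns a list of problems, empty when the card looks valid.
function validateCard(fm) {
  const problems = [];
  // Assumption: id and type are the only strictly required fields.
  if (typeof fm.id !== "string" || fm.id === "") {
    problems.push("id must be a non-empty string");
  }
  if (fm.type !== "queue_entry") {
    problems.push("type must be queue_entry");
  }
  for (const [field, allowed] of Object.entries(ENUMS)) {
    // Absent fields are tolerated so older manual entries stay compatible.
    if (fm[field] !== undefined && !allowed.includes(fm[field])) {
      problems.push(`${field} must be one of: ${allowed.join(", ")}`);
    }
  }
  if (fm.relevance_score !== undefined) {
    const s = fm.relevance_score;
    // The negated range test also rejects NaN.
    if (typeof s !== "number" || !(s >= 0 && s <= 1)) {
      problems.push("relevance_score must be between 0.0 and 1.0");
    }
  }
  if (fm.chapter !== undefined) {
    if (!Number.isInteger(fm.chapter) || fm.chapter < 1 || fm.chapter > 20) {
      problems.push("chapter must be an integer from 1 to 20");
    }
  }
  return problems;
}

console.log(validateCard({ id: "x", type: "queue_entry", column: "done" }));
```

Returning a list of problems rather than throwing on the first one lets a caller report everything wrong with a card in a single pass.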

Column semantics

| Column | Meaning |
| --- | --- |
| Scouted | Just arrived from the scanner. Author has not looked at it yet. |
| Reviewing | Author has glanced at it; thinking about whether it earns a place. |
| Approved | Worth using. Awaiting the author's decision on how to use it. |
| Drafting | Sent to a chapter as a ## New Content for Review block. Awaiting authorial revision. |
| Merged | Author has reworked the block into chapter prose and removed the review marker. |
| Rejected | Not for this book. Stays in the queue as a record (so the deduper does not re-surface it). |

Priority

The evaluator picks priority based on:

  • high — fills a known gap in the book or contradicts a current claim with strong evidence
  • medium — strengthens an existing argument with concrete evidence
  • low — interesting but adjacent; noted for completeness
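
One way a board could use these levels is to order cards by priority and then by relevance_score. The comparator below is a sketch under stated assumptions: the name `compareCards` is hypothetical, and the fallbacks (an absent priority sorts as low, an absent relevance_score counts as 0) are illustrative choices that the schema does not define.

```javascript
// Lower rank sorts first.
const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

// Hypothetical comparator: higher priority first, then higher relevance_score.
function compareCards(a, b) {
  // Assumption: a card without a recognised priority sorts alongside "low".
  const rankA = PRIORITY_RANK[a.priority] ?? PRIORITY_RANK.low;
  const rankB = PRIORITY_RANK[b.priority] ?? PRIORITY_RANK.low;
  if (rankA !== rankB) return rankA - rankB;
  // relevance_score exists only on scouted cards, so default to 0 (assumption).
  return (b.relevance_score ?? 0) - (a.relevance_score ?? 0);
}

const cards = [
  { id: "a", priority: "low", relevance_score: 0.9 },
  { id: "b", priority: "high", relevance_score: 0.6 },
  { id: "c", priority: "high", relevance_score: 0.8 },
  { id: "d", priority: "medium" },
];

// Copy before sorting so the original array stays untouched.
const sorted = [...cards].sort(compareCards);
console.log(sorted.map((card) => card.id).join(", ")); // prints "c, b, d, a"
```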

Origin

  • scouted — produced by the daily scanner
  • ingested — produced by an ingestion pass over wiki/raw/
  • manual — author wrote the queue entry directly

Dedupe

wiki/research/seen.json holds a flat object that maps each canonical URL to its first-seen timestamp. The scanner skips anything already in seen.json. Rejected cards stay in the queue; the cards are scanned at startup, which indirectly keeps their URLs in the seen.json record.
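
The lookup can be sketched as follows. The canonicalisation rules here (dropping the fragment, `utm_` parameters and a trailing slash) are assumptions, since this document does not say how a URL is made canonical; the function names `canonicalUrl` and `markIfNew` are likewise hypothetical.

```javascript
// Hypothetical canonicalisation: the real rules are not specified in this schema.
function canonicalUrl(raw) {
  const u = new URL(raw); // the URL parser already lowercases the hostname
  u.hash = "";
  // Copy the keys first so deleting does not disturb the iteration.
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith("utm_")) u.searchParams.delete(key);
  }
  // Drop trailing slashes, but keep the root path as "/".
  u.pathname = u.pathname.replace(/\/+$/, "") || "/";
  return u.toString();
}

// Returns true when the URL is new and records it in `seen`
// (a flat object of canonical URL to first-seen timestamp).
function markIfNew(seen, rawUrl, now) {
  const key = canonicalUrl(rawUrl);
  if (Object.prototype.hasOwnProperty.call(seen, key)) return false;
  seen[key] = now;
  return true;
}

const seen = {};
console.log(markIfNew(seen, "https://Example.com/post/?utm_source=x#top", "2026-04-20T07:05:00+00:00")); // true
console.log(markIfNew(seen, "https://example.com/post", "2026-04-21T07:05:00+00:00")); // false
```

Because the first-seen timestamp is written only when the key is absent, a later sighting of the same URL never overwrites it.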

Marked review block

When the kanban "Send to chapter as draft for review" action fires, this block gets appended to the relevant wiki/chapters/chapter-NN.md:

<!-- review:NEW source:https://example.com card:2026-04-20-arxiv-2404-01234 date:2026-04-20 -->
## New Content for Review — 2026-04-20

**Source:** [Title](https://example.com) — Author, 2026-04-18

**Why it matters:** ...

**Where it fits:** ...

**Source excerpt:**

> ...

> Note: this block has not yet been reworked into the author's voice, e-prime,
> or British English. Edit then remove this marker comment when complete.
<!-- /review:NEW card:2026-04-20-arxiv-2404-01234 -->

The card: identifier in both markers lets the kanban API track which card produced which block, so removing the block can flip the card to merged.
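
The tracking step might look like the sketch below, which reads the opening markers from a chapter's text and reports which drafting cards no longer have a block. The function names and the regular expression are assumptions for illustration, not the kanban API's actual implementation; the marker layout is taken from the example above.

```javascript
// Matches opening markers only; closing markers start with "/review" and are skipped.
const OPEN_MARKER = /<!-- review:NEW (.*?) -->/g;

// Hypothetical helper: collects the card ids that still have a review block in the chapter.
function pendingReviewCardIds(chapterText) {
  const ids = new Set();
  for (const match of chapterText.matchAll(OPEN_MARKER)) {
    // Marker fields are space-separated key:value pairs; pick out card:.
    const card = /(?:^|\s)card:(\S+)/.exec(match[1]);
    if (card) ids.add(card[1]);
  }
  return ids;
}

// Hypothetical helper: drafting cards whose marker has gone are candidates for "merged".
function cardsReadyToMerge(draftingCardIds, chapterText) {
  const pending = pendingReviewCardIds(chapterText);
  return draftingCardIds.filter((id) => !pending.has(id));
}

const chapter = [
  "Some chapter prose.",
  "<!-- review:NEW source:https://example.com card:2026-04-20-arxiv-2404-01234 date:2026-04-20 -->",
  "Block body still awaiting revision.",
  "<!-- /review:NEW card:2026-04-20-arxiv-2404-01234 -->",
].join("\n");

console.log(cardsReadyToMerge(["2026-04-20-arxiv-2404-01234", "some-other-card"], chapter));
```

In this example only "some-other-card" is reported, because the first card's opening marker is still present in the chapter text.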