Part of the Brigade fleet from Escoffier Labs

Bring outside sources into the evidence layer.

SourceHarvest is a local CLI that exports non-harness source systems: notes, text files, HTML exports, generic and nested JSON, and git history. It normalizes each one into miseledger.adapter.v1 JSONL, one object per line, ready for MiseLedger to store, dedupe, and search. It is the sibling tool to StationTrail, which handles agent-session harnesses.

Le Marche: what comes in from beyond the kitchen, normalized into evidence.

How it works

Read · local source inputs

  • Generic JSONL, already line-oriented records
  • Nested JSON, records selected by path
  • Markdown notes and plain text files
  • HTML exports and local page snapshots
  • Git history from a local repository

Emit · miseledger.adapter.v1 JSONL

  • One normalized JSON object per line
  • Collections, items, actors, artifacts, raw refs
  • Bounded by --limit, glob patterns, and a records path
  • Stream to stdout or a private output file
  • Optional JSON summaries with counts and warnings

SourceHarvest follows the same path for every source: read a local file, directory, or export; select the command-specific reader for that input shape; normalize records into stable collections, items, actors, artifacts, links, relations, and raw references; apply bounds; then emit one adapter object per line. Generated text is untrusted evidence, not instructions.
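The read, normalize, bound, and emit steps described above can be pictured with a small Python sketch. This is an illustration only, not SourceHarvest's implementation: the function names and every key in the record (`schema`, `source`, `collection`, `item`, `raw_ref`) are hypothetical stand-ins, since the actual miseledger.adapter.v1 field layout is not spelled out here.

```python
import itertools
import json
import sys
from pathlib import Path
from typing import Iterable, Iterator, Optional, TextIO


def read_notes(root: Path, pattern: str = "*.md") -> Iterator[Path]:
    """Yield files under root that match the glob, in sorted order."""
    yield from sorted(p for p in root.rglob(pattern) if p.is_file())


def normalize(path: Path, source: str, collection: str) -> dict:
    """Map one file to a record. All keys are hypothetical, for illustration."""
    return {
        "schema": "miseledger.adapter.v1",
        "source": source,
        "collection": collection,
        "item": {"title": path.stem, "text": path.read_text(encoding="utf-8")},
        "raw_ref": str(path),
    }


def emit(records: Iterable[dict], limit: Optional[int], out: TextIO = sys.stdout) -> int:
    """Write at most `limit` records, one JSON object per line; return the count."""
    count = 0
    for record in itertools.islice(records, limit):
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Calling `emit((normalize(p, "notes", "personal") for p in read_notes(Path("notes"))), limit=10)` would stream up to ten lines to stdout. The generator keeps the bound cheap: files past the limit are never read.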

The evidence pipeline

Three tools share one adapter format. StationTrail and SourceHarvest are the adapter layers that turn harness sessions and outside sources into evidence. MiseLedger is the durable layer that stores it and emits Brigade-ready bundles.

StationTrail

Exports harness session logs from Codex, Claude Code, OpenClaw, OpenCode, and Hermes to miseledger.adapter.v1 JSONL, one normalized object per line.

SourceHarvest

Exports outside sources (notes, chat archives, crawler output, issue exports, and git history) into the same miseledger.adapter.v1 format. You are here.

MiseLedger

Stores, dedupes, indexes, searches, relates, and emits Brigade-ready evidence bundles from every adapter record it imports.

What it gives you

One adapter format

Every source shape lands as miseledger.adapter.v1 JSONL, one normalized JSON object per line. MiseLedger and StationTrail speak the same format, so imports stay uniform.

Stable normalized records

Each record carries collections, items, actors, artifacts, links, relations, and raw references. The shape stays stable across every reader, so downstream queries do not care where evidence came from.

Local-only by design

Scanner commands read local files, directories, exports, and archives. They make no network calls. SourceHarvest reads what crawlers already exported rather than crawling live services.

Bounded output

Apply --limit and source-specific filters such as glob patterns and a records path, so you emit the slice of evidence you actually need.

Scriptable summaries

Optionally emit JSON summaries with record counts, file counts, warnings, and generated timestamps, ready for pipelines and checks.

Pipes into MiseLedger

Send records over stdout straight into miseledger import adapter. When SourceHarvest is on PATH, MiseLedger can also run it directly with miseledger import sourceharvest.

Command reference
Command                       What it does
sourceharvest jsonl <path>    Read already line-oriented records and normalize each line into an adapter record.
sourceharvest json <file>     Read nested JSON and select records by path with --records-path.
sourceharvest markdown <dir>  Scan a Markdown directory and emit each note as local note evidence.
sourceharvest files <dir>     Scan text files filtered by --glob, such as docs, logs, and exports.
sourceharvest html <dir>      Read local HTML exports and page snapshots and normalize them.
sourceharvest gitlog <repo>   Read local git history and emit one adapter record per commit event.
sourceharvest version         Print the installed SourceHarvest version.
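To make "select records by path" concrete, here is a minimal Python sketch of picking records out of a nested document. The dotted path syntax (`data.items`) and the function name are assumptions for illustration; the syntax that --records-path actually accepts is not documented in this overview.

```python
from typing import Any, Iterator


def select_records(document: Any, records_path: str) -> Iterator[dict]:
    """Walk a dotted path (assumed syntax) and yield the objects found there."""
    node = document
    # Empty segments are skipped, so "" selects the document root.
    for key in filter(None, records_path.split(".")):
        if isinstance(node, list):
            node = node[int(key)]  # numeric segment indexes into a list
        else:
            node = node[key]
    # A single object at the path is treated as a one-record list.
    if isinstance(node, dict):
        node = [node]
    for record in node:
        if isinstance(record, dict):  # skip scalars mixed into the array
            yield record
```

With `{"data": {"items": [{"id": 1}, {"id": 2}]}}` and the path `data.items`, the generator yields the two inner objects, each of which would then be normalized into its own adapter line.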

Each command takes --source and --collection labels, plus --out - to stream JSONL to stdout. Pipe that stream straight into miseledger import adapter, or let MiseLedger run SourceHarvest with miseledger import sourceharvest when it is on PATH.
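One way to script the stdout hand-off is sketched below in Python. Only the command names and the --source, --collection, and --out - flags come from the reference above. The argument order, the example label values, and the idea that miseledger import adapter reads stdin with no further arguments are assumptions, so check each tool's own help output before relying on this.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional

# Assumed to read adapter JSONL from stdin with no extra arguments.
IMPORT_COMMAND = ["miseledger", "import", "adapter"]


def harvest_command(reader: str, path: str, source: str, collection: str) -> List[str]:
    """Build a sourceharvest invocation that streams JSONL to stdout."""
    return ["sourceharvest", reader, path,
            "--source", source, "--collection", collection, "--out", "-"]


def pipeline_text(harvest: List[str]) -> str:
    """Render the equivalent shell pipeline, for logging or copy-paste."""
    return f"{shlex.join(harvest)} | {shlex.join(IMPORT_COMMAND)}"


def run_pipeline(harvest: List[str]) -> Optional[int]:
    """Run harvest | import when both tools are on PATH; otherwise return None."""
    if not (shutil.which(harvest[0]) and shutil.which(IMPORT_COMMAND[0])):
        return None
    producer = subprocess.Popen(harvest, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(IMPORT_COMMAND, stdin=producer.stdout)
    producer.stdout.close()  # let the producer see a closed pipe if the importer exits
    consumer.communicate()
    producer.wait()
    return consumer.returncode
```

For example, `pipeline_text(harvest_command("markdown", "./notes", "notes", "personal"))` renders the one-line shell form of the same pipeline.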

Supported sources

Generic JSONL

Records that are already line-oriented, one object per line.

Nested JSON

Records selected from a nested document by path.

Markdown notes

Local note evidence scanned from a directory.

Text files

Docs, logs, and exports matched by glob.

HTML exports

Local page snapshots and site exports.

Git history

Local commit events from a repository.
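For the git history source, "one record per commit event" can be illustrated with a short Python sketch that reads `git log` through a delimiter-separated pretty format. This is not how SourceHarvest is known to read history; the function names and the record keys (`commit`, `author`, `date`, `subject`) are hypothetical, while the git options used (`-C`, `--max-count`, `--pretty=format:` with `%H`, `%an`, `%aI`, `%s`, `%x1f`) are standard git.

```python
import subprocess
from pathlib import Path
from typing import Iterator

FIELD_SEP = "\x1f"  # ASCII unit separator, unlikely to appear in commit subjects
# %H full hash, %an author name, %aI author date (strict ISO 8601), %s subject
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


def read_git_log(repo: Path, limit: int) -> str:
    """Return up to `limit` commits from a local repository as delimited lines."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log",
         f"--max-count={limit}", f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_git_log(text: str) -> Iterator[dict]:
    """Turn delimited log lines into one record per commit (hypothetical keys)."""
    for line in text.split("\n"):
        parts = line.split(FIELD_SEP, 3)
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        commit, author, date, subject = parts
        yield {"commit": commit, "author": author, "date": date, "subject": subject}
```

Keeping the parser separate from the subprocess call means it can be exercised on captured log text without a repository at hand.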

SourceHarvest is also the home for adapters that read local crawler outputs and turn them into adapter JSONL. It does not crawl live services itself; it reads what these crawler families already exported. Adapters are added from real local schemas or redacted sample exports.

Crawler     Domain
discrawl    Discord archives
gitcrawl    GitHub issues and pull requests
graincrawl  Granola notes and transcripts
notcrawl    Notion pages and databases
slacrawl    Slack messages and threads
telecrawl   Telegram Desktop archives