/gaia-feed

user-facing

Category:: Sprint Management
Lifecycle phase:: 4 -- Implementation
Arguments:: <url-or-path | - for stdin> [--slug SLUG] [--tags TAG1,TAG2] [--ttl DAYS] [--kind url|file|llms_txt|stdin]

What it does

/gaia-feed ingests an external document into the brain knowledge layer in a single gesture. Hand it a URL, a local file path, or pipe content via stdin, and it writes a provenance-stamped markdown file under .gaia/knowledge/ingested/ and registers an ingested entry in brain-index.yaml.

The pipeline runs five stages: classify source, fetch content, strip HTML (for URLs), write the ingested file with provenance frontmatter, and register the brain-index entry. Both writes are atomic -- a temporary file is written first and renamed into place only on success.
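The temp-file-and-rename pattern behind these atomic writes can be sketched in Python. This is a minimal illustration assuming a POSIX filesystem, not GAIA's implementation; `atomic_write` and its optional `validate` hook are hypothetical names. The hook stands in for the brain-index schema check: if it raises, the temporary file is discarded and the existing file is left untouched.

```python
import os
import tempfile
from typing import Callable, Optional


def atomic_write(path: str, text: str,
                 validate: Optional[Callable[[str], None]] = None) -> None:
    """Replace `path` with `text` so readers see the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # A sibling temp file keeps the final rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes are on disk before the rename
        if validate is not None:
            validate(tmp_path)       # raising here aborts; `path` is left as it was
        os.replace(tmp_path, path)   # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)      # never leave a half-written sibling behind
        raise
```

Creating the temporary file in the same directory as the target matters: os.replace is only an atomic rename when both paths are on the same filesystem.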

For URL sources, the ingested file contains the WebFetch-rendered markdown of the page -- clean text with script, style, and head content stripped out -- not the byte-raw HTML.
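In GAIA the HTML-to-markdown rendering is done by WebFetch in the orchestration layer. As a rough, standard-library approximation of the stripping rules this page describes (drop tags, decode entities, discard script, style, and head content), a sketch might look like this. `strip_html` is a hypothetical helper, and it emits plain text with one block per line rather than structured markdown.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "head"}            # content dropped entirely
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}  # force a line break around these


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp;, &lt;, &#39; and other references
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML document, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```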

When to use it

You want to pull external reference material (API docs, library guides, specifications) into your queryable knowledge layer.
You want provenance tracking (source URL, fetch timestamp, content hash, expiry) on ingested documents.
You want the ingested document to appear alongside project artifacts when querying the Brain via /gaia-brain-query.
You want to update a previously ingested document to a new version -- re-run the command with the same --slug to overwrite it cleanly.

Prerequisites

There is one prerequisite: a seed brain-index.yaml must be present. If the knowledge store has not been initialized, run /gaia-brain-reindex first. The .gaia/knowledge/ingested/ directory is created automatically if it does not exist.

How to invoke

/gaia-feed https://example.com/api-docs                 # ingest a URL
/gaia-feed ./specs/openapi.yaml                         # ingest a local file
/gaia-feed -                                            # read from stdin (paste content)
/gaia-feed --slug my-api-docs https://example.com/docs  # explicit slug
/gaia-feed --ttl 60 https://example.com/docs            # 60-day time-to-live
/gaia-feed --tags api,reference ./specs/api.md          # explicit tags
/gaia-feed --kind llms_txt https://example.com          # force llms_txt source kind

Flags and options

Flag	Default	Description
`--slug SLUG`	Auto-inferred from URL hostname or filename	URL-safe identifier for the ingested file. Determines the filename (`<slug>.md`) and the brain-index entry key. If you need two different versions of the same source to coexist, give each a distinct slug.
`--tags TAG1,TAG2`	Auto-inferred from source kind and domain signals	Comma-separated list of tags stored in the provenance frontmatter. Tags help filter results when querying the brain.
`--ttl DAYS`	30	Time-to-live in days. The ingested file's `expires_at` is set to `fetched_at + ttl_days`. After the TTL elapses without a successful refresh, the entry is marked `stale`.
`--kind url|file|llms_txt|stdin`	Auto-detected from the source argument	Overrides the auto-detected source kind. Normally set automatically by the orchestration layer (e.g., `llms_txt` when the llms-full.txt probe succeeds). You rarely need to set this manually.
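The `--slug` and `--tags` defaults can be illustrated with a small sketch. The exact inference rules GAIA uses are not documented on this page, so `infer_slug` and `parse_tags` are hypothetical: they assume the slug is the lowercased hostname (for URLs) or filename stem (for files) with every other character collapsed to a hyphen, and that tags are trimmed, lowercased, and de-duplicated.

```python
import os
import re
from urllib.parse import urlparse


def infer_slug(source: str) -> str:
    """Hypothetical default slug: URL hostname, or the local file's name without extension."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.hostname:
        base = parsed.hostname
    else:
        base = os.path.splitext(os.path.basename(source))[0]
    # Lowercase, then collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")
    return slug or "untitled"


def parse_tags(raw: str) -> list[str]:
    """Split a --tags value such as 'api, Reference,api' into unique lowercase tags."""
    tags: list[str] = []
    for tag in raw.split(","):
        tag = tag.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

Because this sketch derives a URL slug from the hostname alone, two different pages on example.com would both map to example-com, and the second ingestion would overwrite the first. That is exactly the case where passing an explicit --slug keeps both.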

Source kinds

The pipeline classifies every ingestion into one of four source kinds, which determines the fetch method and the confidence score assigned to the brain-index entry:

Source kind	Trigger	Fetch method	Confidence
`url`	An `http://` or `https://` URL	WebFetch (orchestration layer)	0.7
`llms_txt`	A URL where the `llms-full.txt` probe succeeds	WebFetch for the `llms-full.txt` endpoint	0.9
`file`	Path to an existing local file	Direct read	0.8
`stdin`	`-` as the source argument	Read from stdin	0.8
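The classification rules and confidence tiers in the table above map naturally onto a lookup. The sketch below is hypothetical (`classify_source` is not a GAIA function) and assumes the llms-full.txt probe result is supplied by the caller, since in GAIA the probe runs in the orchestration layer.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Confidence tiers from the source-kinds table.
CONFIDENCE = {"url": 0.7, "llms_txt": 0.9, "file": 0.8, "stdin": 0.8}


def classify_source(source: str, llms_probe_ok: bool = False,
                    kind_override: Optional[str] = None) -> tuple[str, float]:
    """Return (source_kind, confidence) for a /gaia-feed source argument."""
    if kind_override is not None:          # mirrors an explicit --kind
        kind = kind_override
    elif source == "-":
        kind = "stdin"
    elif urlparse(source).scheme in ("http", "https"):
        kind = "llms_txt" if llms_probe_ok else "url"
    elif os.path.isfile(source):
        kind = "file"
    else:
        raise ValueError(f"not a URL, an existing file, or '-': {source!r}")
    if kind not in CONFIDENCE:
        raise ValueError(f"unknown source kind: {kind!r}")
    return kind, CONFIDENCE[kind]
```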

llms-full.txt probe

When the source is a URL, the pipeline first probes for a conventional llms-full.txt endpoint at the base of the URL. If the probe returns non-empty content, the pipeline ingests that content directly (with source kind llms_txt and the higher 0.9 confidence tier) instead of fetching and stripping the original page. This provides cleaner, LLM-optimized content when the site publishes it.
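The probe-then-fall-back decision can be sketched as follows, assuming "the base of the URL" means the site root (so https://example.com/docs/api would probe https://example.com/llms-full.txt). Both functions are hypothetical, and `fetch` is any caller-supplied function that returns the response text or None on failure, standing in for WebFetch.

```python
from typing import Callable, Optional
from urllib.parse import urlparse, urlunparse


def llms_full_url(url: str) -> str:
    """Assumed probe location: the site root of the given URL."""
    parts = urlparse(url)
    return urlunparse((parts.scheme, parts.netloc, "/llms-full.txt", "", "", ""))


def choose_content(url: str, fetch: Callable[[str], Optional[str]]) -> tuple[str, str]:
    """Prefer non-empty llms-full.txt content; otherwise fall back to the page itself."""
    probe = fetch(llms_full_url(url))
    if probe is not None and probe.strip():
        return "llms_txt", probe          # higher 0.9 confidence tier
    page = fetch(url)
    if page is None:
        raise RuntimeError(f"fetch failed: {url}")
    return "url", page                    # page still needs HTML stripping
```

Treating whitespace-only probe responses as a miss matches the "non-empty content" condition described above.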

What it does step by step

Classify the source. Determines whether the input is a URL, a local file, or stdin. For URLs, probes for a conventional llms-full.txt endpoint and uses it when available (cleaner, LLM-optimized content).
Fetch the content. Reads the file directly, reads stdin, or (for URLs) delegates to WebFetch in the orchestration layer. A 30-second fetch timeout and 10 MB size cap are enforced.
Strip HTML. For URL sources, removes HTML tags, decodes common entities, and strips script/style/head content to produce clean markdown. File and stdin sources pass through unchanged.
Write the ingested file. Writes the content under .gaia/knowledge/ingested/<slug>.md with exactly 11 provenance frontmatter fields. The write is atomic via a sibling temporary file and rename. If a file with the same slug already exists, it is replaced.
Register the brain-index entry. Appends (or replaces) an ingested entry in brain-index.yaml with a populated trust block carrying the content hash, source URL, timestamps, and a confidence score tiered by source kind. The index is validated against its schema before the rename; on failure the prior index is preserved.
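The size cap from the fetch step can be enforced by reading the response in chunks and aborting once the running total passes the limit. This hypothetical sketch uses Python's urllib rather than WebFetch and omits the SSRF pre-check described under Security controls, so it illustrates the cap and timeout only; it is not a safe fetcher on its own. `read_capped`, `fetch_capped`, and `ContentTooLarge` are illustrative names.

```python
import urllib.request

MAX_BYTES = 10 * 1024 * 1024  # 10 MB size cap
TIMEOUT_S = 30                # 30-second fetch timeout


class ContentTooLarge(Exception):
    """Raised when a source exceeds the size cap."""


def read_capped(stream, limit: int = MAX_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream to EOF, raising ContentTooLarge as soon as it passes `limit` bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > limit:
            raise ContentTooLarge(f"content exceeds {limit} bytes")


def fetch_capped(url: str) -> bytes:
    # No SSRF pre-check here; a real fetcher must validate the URL and its addresses first.
    with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
        return read_capped(resp)
```

The urlopen timeout bounds each blocking socket operation rather than the whole download, so a server that trickles bytes slowly could exceed 30 seconds in total. Tracking elapsed wall-clock time inside the read loop would close that gap.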

Same-slug overwrite behavior

Re-running /gaia-feed with the same --slug (or with a source that auto-infers the same slug) replaces the existing entry cleanly. Both the ingested file and the brain-index entry are overwritten atomically -- the old content is not duplicated or versioned.

This is the supported way to update an ingested source to a new version. The provenance frontmatter is refreshed with the new fetch timestamp, content hash, and expiry.

If you want a different version to coexist alongside the existing one (rather than replace it), use a different --slug for each version.
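The replace-not-duplicate behavior amounts to an upsert keyed by slug. A minimal sketch, assuming index entries are dictionaries carrying a `slug` key (the real brain-index.yaml layout may differ); `upsert_entry` is a hypothetical helper.

```python
def upsert_entry(entries: list[dict], new_entry: dict) -> list[dict]:
    """Replace the entry whose slug matches new_entry's slug, or append it if none does.

    Returns a new list; the input list is not modified. Any extra entries that share
    the slug are dropped, so the result never holds duplicates for that slug.
    """
    slug = new_entry["slug"]
    result: list[dict] = []
    replaced = False
    for entry in entries:
        if entry.get("slug") != slug:
            result.append(entry)
        elif not replaced:
            result.append(new_entry)   # keep the original position in the index
            replaced = True
    if not replaced:
        result.append(new_entry)
    return result
```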

Provenance frontmatter

Every ingested file carries exactly 11 frontmatter fields:

Field	Type	Description
`title`	string	Document title, auto-inferred from the first heading or filename.
`slug`	string	URL-safe identifier (auto-inferred or explicit via `--slug`).
`ingest_source_kind`	enum	One of `url`, `file`, `llms_txt`, `stdin`.
`source_url`	string or null	Origin path or URL; null for stdin.
`fetched_at`	ISO 8601	UTC timestamp of the fetch.
`expires_at`	ISO 8601	`fetched_at` + `ttl_days`. After this time the entry is considered stale if it has not been successfully refreshed.
`content_hash`	string	sha256 of the post-strip markdown body.
`ttl_days`	integer	Time-to-live in days (default 30).
`token_estimate`	integer	Rough token count derived from word count.
`tags`	list	Auto-inferred tags (source kind, domain signals), or explicit via `--tags`.
`status`	enum	One of `current`, `stale`, `failed`. New ingestions start as `current`.
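A sketch of how the 11 fields could be populated for a new ingestion. `build_frontmatter` and `is_past_expiry` are hypothetical. The title rule (first markdown heading, else a fallback), the plain hex digest for `content_hash`, and the 4-tokens-per-3-words ratio behind `token_estimate` are assumptions, since this page only says the estimate is derived from word count.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_frontmatter(body: str, slug: str, kind: str, source_url: Optional[str],
                      tags: list[str], ttl_days: int = 30,
                      fallback_title: str = "untitled",
                      now: Optional[datetime] = None) -> dict:
    """Return the 11 provenance fields for a freshly ingested document."""
    fetched = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    heading = re.search(r"^#+\s+(.+)$", body, flags=re.MULTILINE)
    word_count = len(body.split())
    return {
        "title": heading.group(1).strip() if heading else fallback_title,
        "slug": slug,
        "ingest_source_kind": kind,                 # url | file | llms_txt | stdin
        "source_url": source_url,                   # None (YAML null) for stdin
        "fetched_at": fetched.isoformat(),
        "expires_at": (fetched + timedelta(days=ttl_days)).isoformat(),
        "content_hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),  # post-strip body
        "ttl_days": ttl_days,
        "token_estimate": round(word_count * 4 / 3),  # hypothetical ratio
        "tags": tags,
        "status": "current",                        # new ingestions start as current
    }


def is_past_expiry(frontmatter: dict, now: datetime) -> bool:
    """True once a timezone-aware `now` reaches expires_at."""
    return now >= datetime.fromisoformat(frontmatter["expires_at"])
```

An entry that is past expiry and has not been refreshed is the one that would be marked `stale`.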

Security controls

The ingestion pipeline enforces three layers of protection:

SSRF pre-check. Before any network read, the safe-fetch guard resolves the host and rejects URLs pointing to private (RFC 1918), link-local, loopback, carrier-grade NAT (RFC 6598), or cloud-metadata addresses. Only http and https schemes are permitted.
Size cap and fetch timeout. Fetched content is capped at 10 MB; a 30-second fetch timeout prevents resource exhaustion.
Slug write-boundary containment. The slug is sanitized (path separators and traversal sequences are stripped), and a realpath containment check verifies that the resolved write path is a child of .gaia/knowledge/ingested/ before any file is created.
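The address and path checks can be sketched with Python's ipaddress and os.path modules. This is an illustrative, hypothetical guard, not GAIA's safe-fetch code: `is_blocked_address`, `check_url`, and `safe_write_path` are assumed names, the metadata set lists only the common 169.254.169.254 endpoint, and because the sketch checks resolved addresses without pinning the later connection to them, it does not by itself defend against DNS rebinding.

```python
import ipaddress
import os
import re
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse

CGNAT = ipaddress.ip_network("100.64.0.0/10")         # RFC 6598 carrier-grade NAT space
METADATA = {ipaddress.ip_address("169.254.169.254")}  # common cloud metadata address (not exhaustive)


def is_blocked_address(ip_text: str) -> bool:
    """True if the address is private, loopback, link-local, CGNAT, or a known metadata address."""
    ip = ipaddress.ip_address(ip_text)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return (ip in METADATA or ip in CGNAT or ip.is_private
            or ip.is_loopback or ip.is_link_local)


def _resolve(host: str) -> Iterable[str]:
    # sockaddr[0] is the textual address for both IPv4 and IPv6 results.
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def check_url(url: str, resolve: Optional[Callable[[str], Iterable[str]]] = None) -> None:
    """Raise ValueError unless the URL is http(s) and every resolved address is allowed."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"blocked scheme or missing host: {url!r}")
    for addr in (resolve or _resolve)(parts.hostname):
        if is_blocked_address(addr):
            raise ValueError(f"{parts.hostname!r} resolves to blocked address {addr}")


def safe_write_path(slug: str, root: str) -> str:
    """Sanitize a slug and confirm <root>/<slug>.md resolves to a direct child of root."""
    clean = re.sub(r"[^A-Za-z0-9._-]+", "-", slug)  # path separators become '-'
    clean = clean.replace("..", "").strip(".-")     # drop traversal sequences
    if not clean:
        raise ValueError(f"slug is empty after sanitizing: {slug!r}")
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, clean + ".md"))
    if os.path.dirname(target) != root_real:
        raise ValueError(f"slug escapes the ingested directory: {slug!r}")
    return target
```

Resolving with realpath before the comparison means a symlink planted under the ingested directory that points elsewhere is rejected too, not just literal ../ sequences.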

Outputs

Output	Location	Description
Ingested file	`.gaia/knowledge/ingested/<slug>.md`	The ingested document with provenance frontmatter.
Brain index entry	`.gaia/knowledge/brain-index.yaml`	An `ingested` entry with a trust block carrying content hash, source URL, timestamps, and confidence.

What to run next

/gaia-brain-query -- query the brain to see the ingested document alongside project artifacts.
/gaia-knowledge-refresh -- re-fetch all ingested sources and update any that have changed upstream.
/gaia-brain-reindex -- the reindex sweep preserves ingested entries; run it any time to refresh project-artifact entries without losing ingested content.

Troubleshooting

The slug already exists

This is expected behavior. If a file with the same slug already exists, the pipeline overwrites it atomically. The brain-index entry is replaced with fresh provenance. See Same-slug overwrite behavior.

Brain-index validation failed

The pipeline validates the index before committing. On failure, the prior index is preserved. Check the error message for schema violations and ensure the index is well-formed.

URL fetch failed

URL fetching is delegated to WebFetch in the orchestration layer. Ensure the URL is reachable and returns content. Paywalled, SPA-rendered, and authenticated sources are out of scope.

How do I update an ingested source?

Re-run /gaia-feed with the same --slug. The existing entry is overwritten cleanly. See Same-slug overwrite behavior.

How do I remove an ingested source?

Use /gaia-unfeed <slug>. It deletes the ingested file and de-registers the index entry atomically.

Related commands

Command	Relationship
`/gaia-knowledge-refresh`	Re-fetches all ingested sources and updates any that changed upstream.
`/gaia-unfeed`	Removes an ingested document. The inverse of `/gaia-feed`.
`/gaia-brain-reindex`	Rebuilds the index from source. Preserves ingested entries.
`/gaia-brain-query`	Query the brain -- ingested documents appear alongside project artifacts.
`/gaia-brain-health`	Shows unlinked entries -- ingested documents typically appear as unlinked (no governance edges).

For a full explanation of the knowledge layer, see the GAIA Brain concept page.