# Getting Started
The Cape Documentation Agent is a documentation intelligence layer for cape.io. It ingests documentation from multiple sources — Markdown files, OpenAPI specs, and Confluence — into a single PostgreSQL vector store, and exposes that knowledge through several consumption surfaces.
## Architecture
```
Sources                  Knowledge Base                  Surfaces
──────────────────       ────────────────────────        ───────────────────────────
Markdown files     ──►   Documents + Chunks        ──►   Chat (embedded in Cape)
OpenAPI specs      ──►   Vector embeddings         ──►   REST API
Confluence pages   ──►   (PostgreSQL + pgvector)   ──►   MCP server (Claude/Cursor)
CI/CD pipelines    ──►                             ──►   FAQ page
```
Every surface reads from the same store. The modes differ only in how much context is retrieved and how the output is structured.
## Namespaces
All content is partitioned by namespace. Queries are always scoped — namespaces are never mixed unless explicitly requested.
| Namespace | Audience | Auth required |
|---|---|---|
| `user_docs` | Cape external customers | No |
| `tech_docs` | Internal development team | API key |
| `api_endpoints` | Developers integrating with Cape | API key |
| `confluence` | Internal teams via Confluence | API key |
## Retrieval modes
The `retrieval` parameter controls how context is gathered before generating a response.
**Standard** — embeds the query, runs a top-K vector similarity search, passes results to the LLM in one call. Fast, one round-trip.
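The core of the standard path is a plain top-K similarity search. A minimal sketch, assuming chunks are dicts with an `embedding` field; in the real system this search runs inside PostgreSQL via pgvector, not in application code:

```python
import math

def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """Return the k chunks most similar to the query embedding (cosine)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]),
                  reverse=True)[:k]
```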
**Smart** — agentic multi-step process (2–4 LLM calls, max 3 iterations):
- Initial top-K search
- LLM evaluates whether context is sufficient; identifies gaps
- Inspects document outlines (heading paths, no content) to find targeted sections
- Fetches those sections; re-evaluates
- Generates final answer
Use smart retrieval for complex questions that span multiple document sections.
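The steps above can be sketched as a loop. The callables `search`, `evaluate`, `outline`, and `fetch` are hypothetical stand-ins for the real top-K search, LLM sufficiency check, outline inspection, and section fetch:

```python
def smart_retrieve(query, search, outline, fetch, evaluate, max_iters=3):
    """Sketch of the agentic retrieval loop described above."""
    context = search(query)                  # initial top-K search
    for _ in range(max_iters):
        verdict = evaluate(query, context)   # LLM judges sufficiency, names gaps
        if verdict["sufficient"]:
            break
        # Inspect document outlines (heading paths only) to target sections
        sections = outline(verdict["gaps"])
        context += fetch(sections)           # fetch those sections, re-evaluate
    return context                           # passed to the final generation call
```

The `max_iters` cap corresponds to the "max 3 iterations" bound, keeping worst-case latency predictable.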
## Chunking strategy
Documents are split hierarchically:
- Splits at H1/H2/H3 boundaries — each section becomes a parent chunk
- Parent chunks are further split into child chunks (max ~500 tokens, 100-token overlap)
- Code blocks are never split mid-block
- Each child chunk stores its heading breadcrumb (`headingPath`) so context stays meaningful
Both parent and child chunks are stored. Embeddings are generated on child chunks for precision; retrieval returns parent content for context.
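A rough sketch of the hierarchical split, under two simplifying assumptions: headings are plain Markdown `#`/`##`/`###` lines, and "tokens" are whitespace-separated words rather than real tokenizer output (the keep-code-blocks-whole rule is omitted for brevity):

```python
import re

def split_sections(markdown: str) -> list[dict]:
    """Split at H1/H2/H3 boundaries; each section becomes a parent chunk
    tagged with its heading breadcrumb (headingPath)."""
    parents, path = [], []
    current = {"headingPath": "", "text": ""}
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,3})\s+(.*)", line)
        if m:
            if current["text"].strip():
                parents.append(current)
            level = len(m.group(1))
            path = path[:level - 1] + [m.group(2)]
            current = {"headingPath": " > ".join(path), "text": ""}
        else:
            current["text"] += line + "\n"
    if current["text"].strip():
        parents.append(current)
    return parents

def split_children(parent: dict, max_tokens: int = 500, overlap: int = 100):
    """Split a parent chunk into overlapping child chunks that inherit
    the parent's headingPath."""
    tokens = parent["text"].split()
    step = max_tokens - overlap
    children = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        children.append({"headingPath": parent["headingPath"],
                         "text": " ".join(window)})
        if start + max_tokens >= len(tokens):
            break
    return children
```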
## Language support
Language tags follow BCP 47 (`en`, `nl`, `de`, `fr`). If the tag is omitted from a request, the language is auto-detected from the user's input and passed to the LLM in the system prompt, so responses are returned in the same language.
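A toy illustration of that fallback, assuming a naive stopword-overlap detector; the actual detection mechanism is not specified here, and the prompt wording is invented:

```python
# Illustrative stopword lists keyed by BCP 47 tag (the four supported tags).
STOPWORDS = {
    "en": {"the", "and", "is", "how"},
    "nl": {"de", "het", "een", "hoe"},
    "de": {"der", "die", "und", "wie"},
    "fr": {"le", "la", "et", "comment"},
}

def detect_language(text: str, default: str = "en") -> str:
    """Pick the language whose stopwords overlap the input most;
    fall back to the default when nothing matches."""
    words = set(text.lower().split())
    best = max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))
    return best if words & STOPWORDS[best] else default

def system_prompt(lang: str) -> str:
    # Hypothetical wording of the instruction injected into the system prompt.
    return f"Respond in the language with BCP 47 tag '{lang}'."
```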