Substrate Net reads your code and AI-agent conversations and builds a grounded, queryable picture of what you know across every project — organized as a DIKW pyramid that ends in Wisdom: the competencies you've mastered, the insights your work reveals, and the gaps worth closing next.
Everything runs locally on SQLite. A tree-sitter + LLM hybrid keeps it cheap and honest: the parser resolves exact structure (imports, definitions, call graph); the LLMs read that structure to produce meaning. Every claim carries a grounding tag, so project truth and inference never blur.
Read bottom-up. Each tier is built from the one below it.
| Tier | What it is | Built from |
|---|---|---|
| Wisdom | leveled competencies (novice → expert), cross-project insights, gaps to close | synthesis over everything below |
| Knowledge | concepts, business + tech domains, the cross-project skill graph | clustered and modeled facts |
| Information | decisions, rules, requirements, incidents — triaged and grounded | facts extracted from conversations + code |
| Data | your code, AI-agent transcripts, in-repo docs and diagrams | parsed in place, locally |
The payoff is a queryable view along two axes — technical (what the architecture objectively says) and industry (the business domain your projects serve) — surfaced through a local dashboard and an MCP server your agents can call.
A self-contained, offline view that organizes the knowledge base by PARA
(actionability) crossed with DIKW (distillation). subnet global dashboard
opens on a compact synthesized wisdom hero (headline + insights + gaps) over
four PARA sections, plus a Map:
- Projects — active work, classified from real session recency.
- Areas — ongoing competencies, graded by proficiency level (novice → expert).
- Resources — your knowledge library: subjects → topics → concepts/skills/domains, a calm hierarchical explorer (no graph hairball).
- Archive — inactive projects, kept for reference.
- Map — drill
industry → business domain → tech domain → project, then into each project's own nested knowledge explorer (domains → concepts/entities → facts).
The PARA structure and Wisdom layer are produced by two reasoning agents
(knowledgeOrganizer then wisdomSynthesizer), each with a data-driven offline
fallback, and are grounded model — kept visibly separate from the
structural/stated/corroborated truth of your projects.
![]() |
![]() |
| Profile — PARA sections under a wisdom hero | Map — cross-project knowledge hierarchy |
# 1. Install (Node 20+)
git clone https://github.com/tienan92it/SubstrateNet && cd SubstrateNet
npm install && npm run build:all && npm link # exposes `subnet` on $PATH
# 2. Pick a backend (optional). Flash-first when OPENROUTER_API_KEY is set;
# otherwise fully local via Ollama — every agent falls back automatically.
ollama pull qwen2.5:14b
ollama pull nomic-embed-text
# 3. First run: discover workspaces → cost estimate → full pipeline → dashboard
subnet setup
# 4. From then on, one command keeps everything fresh
subnet updateRun subnet with no arguments for an interactive menu (update, add
projects, health, dashboards) — it drives the same pipeline the commands do.
Two commands cover almost everything:
subnet setup # first-run wizard: scan agents, pick projects, plan cost, run
subnet update [path] # day-to-day incremental refresh (code + transcripts + dashboards)subnet update selects how much work to do per run:
| Flag | Profile | Does |
|---|---|---|
--fast |
lean | transcript-only (skips file analyze + enrich) |
| (none) | standard | tier-1 file analyze + fused enrich (2 LLM calls) |
--deep |
deep | all pending files + legacy 8-agent enrich |
--full |
deep + reprocess | re-runs every window (after a model change) |
--global |
— | also rebuild the cross-project dashboard + wisdom |
Health, cost, and cross-project views:
subnet setup --plan-only # per-phase cost table (calls, tokens, $, time)
subnet doctor [--fix] # health + pipeline audit counters; optional repair
subnet global dashboard --open # DIKW profile (wisdom) + knowledge map
subnet global wisdom # leveled competencies, insights, gaps (text)
subnet global profile | skills # industries + top skillsKeep it fresh automatically with a watch daemon, a Cursor hook, or a nightly cron/launchd job — see the automation guide. Full command reference: CLI docs.
Each DIKW tier maps to a layer (or family) of SQLite tables; edges cross layers explicitly. Syntax is deterministic. Meaning is agent-driven.
| Layer | Content | How it's produced |
|---|---|---|
| L0 Code structure | symbols, calls, imports, fields, SQL tables | deterministic (tree-sitter / regex DDL) |
| L0.5 Code analysis | per-file summaries, architectural layer, tags | structure → agent (FileAnalyzer · ArchitectureAnalyzer) |
| L1 Conversations + docs + diagrams | sessions, turns, tool calls; in-repo docs (README / BRD / ADRs) and diagrams (mermaid / drawio / excalidraw / plantuml) | deterministic (file parsers + adapters) |
| L1.5 Triage | relevance / domain / quality / linkage / activity per window; doc-kind for sources | agents (Triage · SourceClassifier) |
| L2 Facts | decisions, business rules, intents, problems/solutions; BRD actors/processes/metrics; incidents → root cause → resolution | agents (Decision · BusinessLogic · Requirements · Intent · ProblemSolution · Incident) + syntax pass |
| L2.5 Domain enrichment | dependencies, skills, entities, relationships, industry, components, gaps; cross-source dedup + corroboration | manifests + SQL · reconciler · standard: fused enrich · deep: 8-agent stack |
| L2.6 Knowledge zones | business + tech domains, grouping facts by bounded context / capability | agents (BusinessDomainModeler · TechDomainModeler) |
| L3 Concepts | clustered facts with names + structured summaries, scope-tagged | agents (Clusterer · Summarizer) |
| L4 Cross-project | shared concepts, workspace umbrellas, emergent project links | mechanical (exact + SimHash) + agent (Linker) |
| L5 Skill graph + hierarchy | technical + industry skills; the workspace → industry → business → tech → project tree | mechanical aggregation over L2.5/L2.6 |
| L6 Wisdom | leveled competency areas (Dreyfus novice → expert), cross-project insights / principles, named gaps + recommendations | agent (WisdomSynthesizer) + deterministic fallback; grounded model |
The ingest pipeline cleans and packages evidence before any LLM call: session
filter → turn normalize → window briefs → pre-triage dedupe → triage/extract on
briefs → anchor gate → early fact dedupe → batch clusterer. This is what keeps a
first run cheap. See docs/workflow-refactor.md for
the full RFC.
Agents run tiered: bulk work (triage, extractors, fused enrich, clusterer)
defaults to a cheap flash model via OpenRouter; heavy reasoning (linking, domain
modeling, wisdom synthesis) routes to a Cursor SDK backend (frontier, set
CURSOR_API_KEY). Every agent falls back to local Ollama automatically when a
backend is unavailable — so a zero-key, fully offline run always works.
Two tags travel with every node. scope is technical · industry · meta.
grounding records how the claim is known:
| Grounding | Meaning | Source |
|---|---|---|
structural |
objective, parsed from artifacts | code symbols, SQL schema, manifests |
stated |
asserted in a conversation | extracted facts |
corroborated |
stated and matched to a code entity | the Reconciler |
external |
cited from outside the project | research backend (opt-in) |
model |
the agent's own inference | enrichment + wisdom agents |
Project-truth queries default to structural / stated / corroborated.
external and model are opt-in and always filterable — so inferred or
"fill-the-gap" knowledge never gets mistaken for what your project actually does.
A single MCP server exposes 25 tools over stdio — code (L0/L0.5), knowledge (L1.5–L3), domain (L2.5), the global skill view (L4–L5), and a research surface:
Primary tools: subnet_context (facts + code for a topic), subnet_recall
(semantic + FTS over conversations), subnet_domain_model, subnet_gaps,
subnet_skills, subnet_profile, subnet_learn, plus subnet_ask (grounded
Q&A), subnet_requirements, subnet_incidents (RCA), and subnet_workspace.
Full catalogue in the MCP docs.
Per-agent model selection lives in ~/.substrate-net/config.json (auto-created
on first setup). Per-project overrides go in
<project>/.substrate-net/config.json and deep-merge over global.
{
"agentBackends": {
"local": { "kind": "ollama", "endpoint": "http://localhost:11434" },
"openrouter": { "kind": "openai-compatible", "endpoint": "https://openrouter.ai/api/v1",
"apiKeyEnv": "OPENROUTER_API_KEY" },
"frontier": { "kind": "cursor-agent", "apiKeyEnv": "CURSOR_API_KEY" }
},
"agents": {
// bulk → cheap flash; reasoning → frontier; embeddings → local
"triage": { "model": "openrouter:google/gemini-2.5-flash", "fallback": "local:qwen2.5:14b" },
"wisdomSynthesizer": { "model": "frontier:composer-2.5", "fallback": ["openrouter:google/gemini-3.5-flash", "local:qwen2.5:14b"] },
"dedupe": { "model": "local:nomic-embed-text" }
// ... see frontier.config.example.json for the full split
},
"ingest": { "maxFactsPerWindow": 8, "preTriageDedupe": true, "clusterBatch": true },
"analyze": { "tier": "standard" }
}API keys.
apiKeyEnvis the name of an environment variable, not the key itself. Prefer it over the inlineapiKeyfield to keep secrets out of the file. Seefrontier.config.example.jsonfor a full local-plus-frontier split.
Bumping a model invalidates that agent's cache on the next run; old runs stay in
agent_runs for audit.
<project>/.substrate-net/
├── code.db # L0 — codegraph-compatible schema
├── knowledge.db # L1–L3 + window_briefs + agent_runs cache
└── config.json # per-project agent overrides
~/.substrate-net/
├── global.db # L4 links + L5 skills/hierarchy + L6 wisdom + registry
└── config.json # global defaults
All files are local SQLite. Conversation transcripts are read in place from each agent's home directory — never copied.
npm install
npm run build:all # dashboard bundle + tsc + schemas
npm test # unit + golden tests- Add a language: register the extension in
src/code/languages.tsand handlers insrc/code/extractor.ts, with a test in__tests__/unit/code-extractor.test.ts. ~50 lines is typical. - Add an agent: implement
Agent<I, O>insrc/agents/<name>.ts, register it insrc/agents/index.ts, add a golden fixture under__tests__/agents/<name>/, and route it from the relevantsrc/pipeline/file. The runtime handles caching, schema validation, and persistence.
Migrating from 0.1.x: the per-stage commands (ingest, enrich, analyze,
link, …) still run but print a deprecation warning and are folded into
subnet update / subnet global *. They are scheduled for removal in 0.3.0.
MIT — see LICENSE.
Inspired by colbymchenry/codegraph (the L0 schema is intentionally compatible), the Model Context Protocol for the agent ↔ tool interface, and tree-sitter for cross-language code parsing.

