LLM Knowledge Bases
You rarely ever write or edit the wiki manually; it's the domain of the LLM.
Describes a personal research workflow where raw source documents are compiled by an LLM into a markdown wiki, maintained through index files, health checks, generated outputs, and lightweight tools rather than a heavyweight RAG stack.
Classification
- Role
- practitioner-note
- Domain
- research
- Source type
- tweet
- Harness types
- grounding-context-loading, execution-harness, validation-harness, repair-harness, monitoring-harness, learning-harness, interface-harness
- Validation position
- before-generation, immediately-after-generation, continuous
- Validation mode
- empirical, interpretive
- Prescription stance
- mixed
- Relation to argument
- capability-is-extended, first-mile-input-formation, validation-is-constitutive, repairability-matters, observability-matters, domain-structure-matters
- Tags
- knowledge-base, markdown-wiki, obsidian, agentic-research, llm-maintained-artifacts, personal-knowledge-management
Extended capability commentary
- Input legibility
- The compilation of raw/ source documents into the wiki is explicitly about making heterogeneous documents legible to future LLM turns.
- Task structure
- Markdown files, indexes, backlinks, summaries, and Obsidian views give the work a manipulable structure.
- Reward richness
- The workflow has useful signals from links, consistency, and answer quality, but not an explicit reward signal.
- Feedback latency
- Feedback arrives through Q&A, rendered outputs, and health checks, but not usually as immediate pass/fail tests.
- Repairability
- Health checks, missing-data imputation, and filing outputs back into the wiki make the knowledge base incrementally repairable.
- Observability
- The wiki is human-readable markdown and images viewed in Obsidian, so the agent's knowledge substrate stays inspectable.
- Reversibility
- Markdown artifacts are versionable, though the post does not foreground git or rollback.
- Offline evaluability
- Some checks can be run offline over the wiki, but factual gaps still require web search or source refresh.
- Institutional ratification
- This is a personal research workflow rather than an organizational ratification system.
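The repairability and observability points above are easy to make concrete. As a minimal sketch (the post shows no code; the one-file-per-page layout, Obsidian-style wikilink syntax, and a `## Summary` section convention are all assumptions), a health check over such a wiki might flag broken backlinks and pages missing a summary:

```python
import re
from pathlib import Path

def health_check(wiki_dir: Path) -> list[str]:
    """Report broken [[wikilinks]] and pages missing a Summary section.

    Assumes one markdown file per wiki page and Obsidian-style links,
    i.e. [[Target]] or [[Target|alias]].
    """
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    problems: list[str] = []
    for page in wiki_dir.glob("*.md"):
        text = page.read_text(encoding="utf-8")
        # Capture the link target, stopping at '|' (alias) or '#' (heading anchor).
        for target in re.findall(r"\[\[([^\]|#]+)", text):
            if target.strip() not in pages:
                problems.append(f"{page.name}: broken link to [[{target.strip()}]]")
        if "## Summary" not in text:
            problems.append(f"{page.name}: missing Summary section")
    return problems
```

The output is itself a legible artifact: an LLM turn can take the problem list and repair the wiki incrementally, which is the repair loop the entry describes.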
Why it matters
This is the Extended Frontier applied to knowledge work: the model's capability comes from a maintained corpus, indexes, summaries, visual outputs, and health checks that make research cumulative instead of ephemeral.
Annotation
Karpathy describes a knowledge-work harness, not just a note-taking habit. Raw sources go into one directory; an LLM incrementally compiles them into a markdown wiki with summaries, backlinks, concept pages, index files, and derived visualizations. Obsidian becomes the human-facing IDE, while the LLM owns most direct edits to the wiki.
The important move is that research outputs are not terminal chat answers. They become files: markdown notes, Marp slides, matplotlib images, search indexes, and follow-up articles that can be filed back into the corpus. Each query can make the next query easier because the knowledge base itself accumulates structure.
For the library, this is a clean example of capability as artifact maintenance. Karpathy expected to need "fancy RAG," but at roughly 100 articles and 400K words, LLM-maintained summaries and index files were enough. The boundary condition matters: the system works because the scale is still small enough for source-aware traversal and because the artifacts are legible.
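The "summaries and index files instead of fancy RAG" claim is easy to picture as a regeneration step the LLM (or a cron job) reruns after edits. A hedged sketch, assuming one markdown page per file and an `INDEX.md` convention the post does not actually specify:

```python
from pathlib import Path

def rebuild_index(wiki_dir: Path) -> str:
    """Regenerate INDEX.md: one line per page, using the page's first
    non-heading line as its summary. Layout conventions are assumptions."""
    lines = ["# Index", ""]
    for page in sorted(wiki_dir.glob("*.md")):
        if page.name == "INDEX.md":
            continue
        summary = next(
            (l.strip() for l in page.read_text(encoding="utf-8").splitlines()
             if l.strip() and not l.startswith("#")),
            "(no summary yet)",
        )
        lines.append(f"- [[{page.stem}]]: {summary}")
    return "\n".join(lines) + "\n"
```

At a few hundred articles, loading an index file like this into context is a cheap substitute for a retrieval pipeline, which is the boundary condition the entry points at.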
Extended Frontier Read
The raw model is not the unit of analysis. The useful system is model plus:
- a raw source archive,
- a compiled markdown wiki,
- index and summary files,
- Obsidian as inspection surface,
- generated outputs that feed back into the wiki,
- health checks over consistency and missing data,
- small custom tools such as a wiki search engine.
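The last item, a small wiki search engine, can be as simple as an inverted index over the markdown files. A minimal sketch under that assumption (the function names and AND-query semantics are illustrative, not Karpathy's actual tool):

```python
import re
from collections import defaultdict
from pathlib import Path

def build_index(wiki_dir: Path) -> dict[str, set[str]]:
    """Map each lowercase token to the set of page names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page in wiki_dir.glob("*.md"):
        for token in re.findall(r"[a-z0-9]+", page.read_text(encoding="utf-8").lower()):
            index[token].add(page.stem)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every query token (simple AND search)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for t in tokens[1:]:
        results &= index.get(t, set())
    return results
```

A tool this small stays inspectable and rebuildable from the markdown alone, which is exactly the property the entry credits over a heavyweight RAG stack.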
This belongs beside harness entries, but it broadens the frame from coding agents to research agents. The same pattern appears: make the environment legible, let the model act on files, inspect the result, repair the substrate, and let work accumulate.
Open Questions
- At what corpus size does this stop working without stronger retrieval infrastructure?
- Which health checks are most predictive of useful future Q&A?
- Does finetuning on the wiki improve capability, or does it destroy the inspectability and repairability that make the workflow valuable?
Notes
Source text supplied by Daniel from X. This entry was prepared with Codex (OpenAI).
Related entries
- What Is an Agent Harness · Aparna Dhinakaran · 2026-04-21 · capability-is-extended, validation-is-constitutive, repairability-matters, observability-matters, grounding-context-loading, execution-harness, validation-harness, repair-harness, monitoring-harness, learning-harness, interface-harness
- An open-source spec for Codex orchestration: Symphony · Alex Kotliarskyi, Victor Zhu, and Zach Brock · 2026-04-26 · capability-is-extended, validation-is-constitutive, repairability-matters, observability-matters, execution-harness, validation-harness, repair-harness, monitoring-harness, learning-harness, interface-harness
- Hermes Agent README · Nous Research · 2026-04-28 · capability-is-extended, first-mile-input-formation, repairability-matters, observability-matters, grounding-context-loading, execution-harness, repair-harness, monitoring-harness, learning-harness, interface-harness
- Good and Bad Harness Engineering · Daniel Miessler · 2025-08-31 · capability-is-extended, repairability-matters, observability-matters, grounding-context-loading, execution-harness, validation-harness, repair-harness, monitoring-harness
Overlap is computed on tags, relation-to-argument, and harness types — not on role or domain, because contrasts are often the most useful neighbours.