A Knowledgebase With No Embedding Model
KB Packer turns a folder of documents into a searchable knowledgebase an AI agent can install and query, with no embedding model, no vector database, and no server.
Drop a folder of files onto the page. It chunks them, builds a search index, and hands back a single .skill file — an ordinary zip with a query protocol and two tiny searchers inside. Install that in any skill-capable agent and it can answer questions from your corpus. The whole build runs in the browser: no upload, no model download, no network. The files never leave your machine.
The bet
Most RAG pipelines never consider dropping the embedding model. "Semantic search" means an embedding model: turn text into vectors, turn the query into a vector, find the nearest ones. The model is what knows "car" and "automobile" are close.
But the agent consuming the knowledgebase already knows that. It is a language model. So instead of paying an embedding model to bridge the gap between how you phrase a question and how the corpus phrases the answer, KB Packer makes the agent do it: before searching, the agent expands the question into terms — synonyms, plurals, acronym expansions, adjacent concepts — and runs those against a plain BM25 keyword index. Retrieval is lexical and dumb; the model does the understanding, which is what it is for anyway. The model is the semantic layer.
That moves one job onto the agent, and it is a real job — fed a raw question, BM25 only matches the exact words the user happened to use. Lexical retrieval competes with embeddings only when the agent expands first, so the bundled instructions spell out the step and the agent runs it every query.
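To make the expand-then-search step concrete, here is a minimal sketch of a BM25 scorer fed first with the raw question and then with an agent-style expansion. It is not the bundled searcher: the class name BM25Index, the parameters k1=1.5 and b=0.75, the tokenizer, the toy documents, and the hand-written expansion list are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics; no stemming (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index, for illustration only."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lens) / len(docs)
        # Document frequency: in how many docs each term appears.
        self.df = Counter(term for tf in self.tfs for term in tf)
        self.n = len(docs)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, terms, i):
        tf = self.tfs[i]
        norm = 1 - self.b + self.b * self.lens[i] / self.avg_len
        total = 0.0
        for t in set(terms):
            f = tf.get(t, 0)
            if f:
                total += self.idf(t) * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def search(self, terms, top_k=3):
        scored = [(self.score(terms, i), i) for i in range(self.n)]
        ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
        return [(i, s) for s, i in ranked if s > 0][:top_k]


docs = [
    "Automobile maintenance: rotate tyres and change the oil every year.",
    "Sourdough starter needs feeding twice a day.",
    "Bicycle chains should be cleaned and oiled monthly.",
]
index = BM25Index(docs)

# The raw question shares no words with the corpus, so BM25 finds nothing.
raw = tokenize("how do I look after my car")
# The expansion an agent might produce before searching (written by hand here).
expanded = raw + ["cars", "automobile", "vehicle", "maintenance", "servicing"]

print(index.search(raw))       # no hits
print(index.search(expanded))  # the automobile document ranks first
```

The index never learns that "car" and "automobile" are related. The match exists only because the expansion put "automobile" into the query, and that is the job the instructions hand to the agent.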
Where it holds and where it stops
I expected this to work, so the test wasn't whether it works — it was where it breaks. Which queries does dropping embeddings actually cost you?
I tested it against dense embeddings on a real 73-post corpus. On queries that used the corpus's own vocabulary, lexical-plus-expansion tied or beat the embedding model. On paraphrased queries — lay phrasing the corpus never used — it tied or won four out of five, and the expansion is why: one query went from 0.50 to a perfect score once the agent expanded it.
The fifth query is the boundary. The relevant post used vocabulary the agent never generated when it expanded the query, and only the embedding found it. That is the one thing a real embedding model would have caught, and it is narrower than I expected: one query in five on the hardest category, none on the rest. For that case the bundle ships a pseudo-relevance-feedback fallback, and a small hybrid re-rank is an option. The default is no model at all.
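For readers who have not met pseudo-relevance feedback, the idea can be sketched as follows: run the query once, assume the top hits are relevant, borrow their most distinctive terms, and search again. This is a generic illustration and not the fallback the bundle ships. The function names, the tf-times-idf term weighting, the stand-in ranker, and the toy corpus are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, terms):
    """Stand-in ranker: sum of idf over distinct matched terms (not full BM25)."""
    n = len(docs)
    doc_terms = [set(tokenize(d)) for d in docs]
    df = Counter(t for ts in doc_terms for t in ts)
    scored = []
    for i, ts in enumerate(doc_terms):
        score = sum(math.log(1 + n / df[t]) for t in set(terms) if t in ts)
        if score > 0:
            scored.append((score, i))
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def feedback_terms(docs, query_terms, ranked_ids, feedback_docs=2, extra_terms=3):
    """Pick new query terms from the top-ranked docs, weighted by tf * idf."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    weights = Counter()
    for i in ranked_ids[:feedback_docs]:
        for term, freq in Counter(tokenize(docs[i])).items():
            # Skip terms already in the query and very short tokens.
            if term not in query_terms and len(term) > 2:
                weights[term] += freq * math.log(1 + n / df[term])
    ordered = sorted(weights.items(), key=lambda p: (-p[1], p[0]))
    return [term for term, _ in ordered[:extra_terms]]


docs = [
    "Heart attack warning signs: myocardial damage begins within minutes.",
    "After a heart attack, myocardial tissue scars and stiffens.",
    "Myocardial infarction treatment guidelines recommend early reperfusion.",
    "Sourdough starter needs feeding twice daily.",
    "Bicycle chains should be cleaned monthly.",
]

query = ["heart", "attack"]
first_pass = rank(docs, query)                  # misses the clinical post
extra = feedback_terms(docs, query, first_pass)
second_pass = rank(docs, query + extra)         # borrowed vocabulary reaches it

print(first_pass, extra, second_pass)
```

The second pass reaches the third document only because the first two hits happened to share a term with it. Feedback recovers vocabulary the agent did not think of, but only when the first pass lands somewhere near the right neighbourhood.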
Lexical search also tolerates big chunks. Whole-document chunks matched 500-character chunks on recall with 17× fewer chunks, because there is no centroid to dilute: an embedding compresses a chunk into a single vector, so a long chunk blurs its topics together, while a keyword index scores each term on its own. The searcher returns the query-densest passage of each hit, so a big chunk does not flood the agent's context. I default to whole-document chunks and drop another knob.
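One simple way to find a "query-densest passage" is a sliding window over the words of a hit, keeping the window that contains the most query terms. The sketch below assumes that approach. The function name densest_passage, the word-based window, and the window size are illustrative choices, and the bundled searcher may do this differently.

```python
import re


def densest_passage(text, query_terms, window=40):
    """Return the run of `window` words that contains the most query-term hits."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    # 1 where a word (stripped of punctuation) is a query term, else 0.
    hits = [1 if re.sub(r"\W+", "", w).lower() in terms else 0 for w in words]
    if len(words) <= window:
        return " ".join(words)

    current = sum(hits[:window])
    best, best_start = current, 0
    for start in range(1, len(words) - window + 1):
        # Slide by one word: add the word entering, drop the word leaving.
        current += hits[start + window - 1] - hits[start - 1]
        if current > best:
            best, best_start = current, start
    return " ".join(words[best_start:best_start + window])


# A long "whole-document" chunk with the relevant sentence buried in the middle.
text = " ".join(
    ["filler"] * 50 + ["BM25", "ranking", "uses", "term", "frequency"] + ["padding"] * 50
)
passage = densest_passage(text, ["bm25", "ranking"], window=8)
print(passage)
```

The agent sees eight words here instead of a hundred and five, which is why a whole-document chunk does not have to cost context.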
One file, two runtimes, no drift
All of that logic — the BM25 index, the passage extraction, the searcher — has to be packaged somewhere the agent can run it. The output is a zip with the index, the chunk text, and both a search.js and a search.py, so it runs wherever the agent has Node or Python. No install step, no dependencies.
The browser builds the bundle, but there is also a command-line builder, and the two have to agree or the "build it anywhere" promise is a lie. They agree: feed both the same files and the .skill they produce is byte-for-byte identical — same SHA-256, down to the zip framing. The browser port is not an approximation of the real builder. It is the real builder, running in a different place.
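Byte-identical archives do not happen by accident, because a zip normally records timestamps, file order, and platform details. The sketch below shows the kind of pinning that makes a build reproducible, using Python's standard zipfile module. It is an assumption about the general technique and not KB Packer's actual builder: the function name build_bundle, the fixed timestamp, the stored (uncompressed) entries, and the file names other than search.js and search.py are hypothetical.

```python
import hashlib
import io
import zipfile

# Pin every entry to one timestamp so the build time never leaks into the bytes.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_bundle(files):
    """Build a zip from {name: bytes} with nothing environment-dependent in it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(files):            # fixed entry order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_STORED  # no compressor differences
            info.create_system = 3            # same "created on" byte on every OS
            info.external_attr = 0o644 << 16  # fixed permission bits
            zf.writestr(info, files[name])
    return buf.getvalue()


# Hypothetical bundle contents; index.json and chunks.json are made-up names.
files_a = {
    "search.py": b"# searcher\n",
    "search.js": b"// searcher\n",
    "index.json": b"{}",
    "chunks.json": b"[]",
}
# Same content, different insertion order, as a second builder might produce.
files_b = dict(reversed(list(files_a.items())))

digest_a = hashlib.sha256(build_bundle(files_a)).hexdigest()
digest_b = hashlib.sha256(build_bundle(files_b)).hexdigest()
print(digest_a == digest_b)
```

Storing entries uncompressed sidesteps the question of whether two deflate implementations emit the same bytes. A builder that does compress would need both runtimes to share a compressor that behaves identically.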
The result is a knowledgebase that is one file, builds in a tab, depends on nothing, and leans on the one model you were already going to pay for. If you have felt the weight of a vector pipeline for a corpus that did not really need one, try it on a folder and see how far plain keywords plus a clever reader actually get.