How to give AI companions verifiable, updatable world knowledge without retraining the model -- and why retrieval-augmented generation is the right trade-off for dynamic virtual environments.
AI companions need to answer factual questions about the world they inhabit. "What is this platform?" "How much does it cost?" "How do I get started?" Without a knowledge foundation, companions either hallucinate answers or deflect with "I don't know."
The solution is Retrieval-Augmented Generation (RAG): a pipeline that retrieves relevant facts from a knowledge base and injects them into the language model's prompt. Companions answer from verified sources, not imagination. Knowledge updates happen through editing a file, not retraining a model.
When you need an AI system to "know" something, you have three main architectural options. Each involves a different trade-off between cost, speed, accuracy, and maintainability.
The first option is fine-tuning: retrain the language model on your specific data, encoding facts directly into billions of parameters. Inference is fast because there are no external lookups, but every knowledge update requires expensive retraining. You cannot easily verify what the model "learned," and it may confidently produce wrong answers. For a platform FAQ that changes monthly, fine-tuning is overkill.
The second option is embedding-only search: store knowledge entries as embedding vectors and search for semantically relevant results, but do not inject them into the prompt. The AI still generates answers from its own training data. Search is fast and storage is cheap, but the model may ignore retrieved results entirely and hallucinate its own answer. There is no way to trace responses back to authoritative sources.
The third option is retrieval-augmented generation: retrieve relevant knowledge via semantic search, augment the prompt with the retrieved facts, then generate an answer grounded in that context. Every answer traces back to its source. Updates require editing a knowledge file and re-seeding, not retraining. The cost is minimal: embedding generation is fractions of a cent per query, and vector search completes in single-digit milliseconds.
RAG occupies the sweet spot for dynamic, verifiable, updatable knowledge at scale. Five factors drove the decision: cost, speed, accuracy, verifiability, and maintainability.
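The retrieve-augment-generate loop at the heart of this choice can be sketched in a few lines of Python. Everything here is an illustrative stand-in (toy keyword retrieval, made-up entry shapes), not the platform's actual code:

```python
# Minimal sketch of retrieve -> augment -> generate. The retrieval step
# is a toy keyword-overlap ranker standing in for real semantic search.

def search_knowledge(query: str, kb: list[dict], top_k: int = 3) -> list[dict]:
    """Toy retrieval: rank entries by keyword overlap with the query."""
    words = set(query.lower().split())
    scored = [(len(words & e["keywords"]), e) for e in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]

def build_prompt(query: str, facts: list[dict]) -> str:
    """Augment: inject retrieved facts ahead of the user's question."""
    context = "\n".join(f"- {f['answer']}" for f in facts)
    return f"Known facts:\n{context}\n\nUser question: {query}"

kb = [
    {"keywords": {"platform", "what"}, "answer": "This platform hosts AI companions."},
    {"keywords": {"cost", "price"}, "answer": "The basic tier is free."},
]
facts = search_knowledge("what is this platform", kb)
prompt = build_prompt("what is this platform", facts)
# The language model then generates from `prompt`, grounded in the facts.
```

The model answers from the injected context rather than from whatever its training data happened to contain.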
The RAG pipeline has five stages. Knowledge starts as a structured file, passes through an embedding pipeline, lands in a vector database, surfaces in the language model's prompt at query time, and grounds the generated answer.
Knowledge is stored in structured files that are human-readable, version-controlled, and editable without touching code. Each entry contains a question, an answer, keywords for hybrid search, and a significance score.
Why structured files? Non-technical team members can update knowledge without deploying code. Version control tracks every change with clean diffs. Comments explain why entries exist. The format is structured enough for automation yet readable enough for humans.
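A knowledge file in this style might look like the following sketch. The field names and values are illustrative, chosen to match the fields the pipeline needs (question, answer, keywords, significance, scope), not the platform's actual schema:

```yaml
# Platform FAQ -- editable without touching code; diffs stay readable.
- question: "What is this platform?"
  answer: "A virtual world where AI companions interact with visitors."
  keywords: [platform, about, what]
  significance: 0.90   # platform identity questions score 0.90+
  scope: global        # visible to every companion

- question: "How much does it cost?"
  answer: "The basic tier is free; premium tiers add features."
  keywords: [cost, price, pricing]
  significance: 0.85
  scope: global
```

Anyone on the team can edit this file in a pull request, and the diff shows exactly which fact changed.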
Each knowledge entry carries several fields that serve different parts of the pipeline. The significance field is a 0.0 to 1.0 score controlling retrieval priority: platform identity questions score 0.90+, while minor details score lower. The scope field controls visibility: global entries are visible to all companions, while companion-specific entries are filtered by identity.

The retrieval query combines two strategies in a single database operation. Vector similarity (70% weight) finds semantically related entries through approximate nearest-neighbor search. Keyword matching (30% weight) ensures exact terms are never missed. The combined score determines ranking.
A critical early bug: when searching for a specific companion (say, "Apollo"), the query filter used `companion_id = 'apollo'`, which excluded all global knowledge entries. Apollo could not find platform FAQs even though they existed in the database. The fix was an OR condition: `(companion_id = 'apollo' OR companion_id = 'global')`. This ensures every companion sees both their own scoped knowledge and the shared global knowledge base.
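The bug and its fix can be reproduced in plain Python. The entry shapes are illustrative; in production this logic lives in a SQL WHERE clause:

```python
# Simulation of the scope-filtering bug: a strict equality filter
# silently drops shared 'global' knowledge.

entries = [
    {"companion_id": "apollo", "text": "Apollo's personal lore"},
    {"companion_id": "global", "text": "Platform FAQ: pricing"},
]

def search_buggy(companion_id: str) -> list[str]:
    # Bug: equality filter excludes every global entry.
    return [e["text"] for e in entries if e["companion_id"] == companion_id]

def search_fixed(companion_id: str) -> list[str]:
    # Fix: OR in the shared scope so every companion sees it.
    return [e["text"] for e in entries
            if e["companion_id"] in (companion_id, "global")]

search_buggy("apollo")  # misses the platform FAQ
search_fixed("apollo")  # returns both scoped and global entries
```

The lesson generalizes: any scope filter added with AND should be audited for scopes it silently excludes.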
The search query filters on three dimensions simultaneously: companion scope (the companion's own entries plus global ones), owner identity (the current user plus the reserved system identifier), and a minimum similarity threshold (0.3) to keep results relevant. Results are ordered by the weighted combination of vector similarity and keyword rank, limited to the top 10 entries.
The complete RAG overhead -- embedding the query, searching the vector index, and returning results -- adds roughly 18 milliseconds to a pipeline where model inference alone takes 500ms or more. RAG is effectively free in terms of user-perceived latency.
An IVFFlat (Inverted File with Flat quantization) index partitions the vector space into clusters. At query time, only the nearest clusters are searched rather than every vector in the table. For a dataset of hundreds to tens of thousands of entries, this delivers consistent sub-5ms search times.
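The cluster-then-probe idea behind IVFFlat can be illustrated with a toy two-dimensional example. Real pgvector indexes do this in C with trained centroids over hundreds of dimensions; this is purely a conceptual sketch:

```python
# Toy IVFFlat: partition vectors into clusters, then at query time
# scan only the nearest cluster(s) instead of the whole table.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pretend a k-means pass already produced two cluster centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = {0: [(0.1, 0.2), (0.3, 0.1)], 1: [(9.8, 10.1), (10.2, 9.9)]}

def ivfflat_search(query, n_probes=1):
    # Step 1: rank clusters by centroid distance; keep the closest n_probes.
    nearest = sorted(range(len(centroids)),
                     key=lambda i: dist(query, centroids[i]))[:n_probes]
    # Step 2: exhaustively scan only those clusters.
    candidates = [v for i in nearest for v in clusters[i]]
    return min(candidates, key=lambda v: dist(query, v))

ivfflat_search((9.5, 9.5))  # scans 2 of the 4 stored vectors, not all 4
```

Searching fewer clusters trades a small amount of recall for large, predictable latency wins, which is why the approach stays sub-5ms at this dataset size.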
A practical decision every RAG system faces: what dimensionality for embeddings? Higher dimensions capture more semantic nuance but cost more in compute, memory, and latency.
| Factor | Lower Dimension (384D) | Higher Dimension (768D) |
|---|---|---|
| Generation speed | ~12ms per embedding | ~50ms cached, much slower cold |
| Memory per vector | 1.5 KB | 3.0 KB |
| Semantic quality | Excellent for FAQ matching | Marginally better (~2%) |
| At 100K entries | ~150 MB | ~300 MB |
| Best for | Real-time queries, production | Research, batch processing |
For real-time virtual world companions where every query must complete in under 20 milliseconds, the lower-dimensional model is the right choice. The marginal quality improvement of higher dimensions does not justify the latency and memory cost. Save 768D for offline batch analysis or research contexts where speed is not a constraint.
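The memory figures in the table follow directly from the vector width, assuming the common float32 (4 bytes per dimension) storage:

```python
# Sanity-check of the table's memory figures, assuming float32 vectors.
def vector_bytes(dims: int, bytes_per_float: int = 4) -> int:
    return dims * bytes_per_float

bytes_384 = vector_bytes(384)   # 1536 bytes = 1.5 KB per vector
bytes_768 = vector_bytes(768)   # 3072 bytes = 3.0 KB per vector

# 100K entries at 384 dimensions:
total_100k_mb = 100_000 * bytes_384 / (1024 * 1024)  # ~146 MB, i.e. ~150 MB
```

The arithmetic confirms the table: doubling dimensionality doubles raw vector storage, before index overhead is counted.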
The knowledge base coexists with user conversation memories in the same database table and search query. Two mechanisms prevent them from interfering with each other.
Knowledge base entries are stored under a reserved system identifier, separate from any real user. The search query includes both the current user's ID and the system ID, returning personal memories and factual knowledge in a single pass. No second query needed.
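The single-pass trick can be sketched as a two-owner filter. The `"system"` identifier and row shapes below are illustrative placeholders:

```python
# One query returns the user's personal memories AND the shared
# knowledge base, while other users' rows stay invisible.

SYSTEM_ID = "system"  # assumed reserved identifier for the knowledge base

rows = [
    {"owner": "user-42", "text": "User prefers short answers"},
    {"owner": SYSTEM_ID, "text": "Platform FAQ: getting started"},
    {"owner": "user-99", "text": "Another user's memory"},
]

def fetch(user_id: str) -> list[str]:
    # Matches the current user's ID or the reserved system ID -- one pass.
    return [r["text"] for r in rows if r["owner"] in (user_id, SYSTEM_ID)]

fetch("user-42")  # the user's own memory plus the FAQ; user-99 excluded
```

Avoiding a second query keeps retrieval latency flat regardless of how the knowledge base grows relative to user memories.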
Knowledge entries are tagged with a core tier
that the memory management system respects absolutely. While
old user memories may be compressed or archived over time,
core knowledge is never deleted,
never compressed, and never expires. Platform FAQs
remain available regardless of how much time passes.
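A pruning pass that respects the core tier might look like the following sketch (field names and the age policy are illustrative):

```python
# Memory pruning that never touches 'core'-tier knowledge.

memories = [
    {"tier": "core", "text": "Platform FAQ", "age_days": 900},
    {"tier": "standard", "text": "We chatted about the weather", "age_days": 400},
]

def prune(items: list[dict], max_age_days: int = 365) -> list[dict]:
    # Core knowledge never expires; only stale non-core memories are dropped.
    return [m for m in items
            if m["tier"] == "core" or m["age_days"] <= max_age_days]

prune(memories)  # keeps the 900-day-old core entry, drops the stale chat
```

The guard clause is deliberately checked first, so no later archival rule can accidentally apply to core entries.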
Without proper significance scoring, a casual conversational memory ("we chatted about the weather") might outrank a critical FAQ entry ("what is this platform"). The scoring system ensures that important knowledge always surfaces first.
Scores build from a base of 0.5. Knowledge-base entries add +0.3 because they define the companion's core knowledge, with further adjustments of +0.05 and +0.1 for entries that carry more informational value. The result: platform knowledge entries land in the 0.75 to 0.95 range, ensuring they rank above casual conversation memories when both are relevant.
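The build-up can be expressed as a small scoring function. The exact bonus triggers below are assumptions for illustration, not the source's rules; only the base and bonus magnitudes come from the write-up:

```python
# Hypothetical significance scoring: 0.5 base plus additive bonuses,
# clamped to the observed 0.95 ceiling.

def significance(is_core_knowledge: bool, extra_informational: bool) -> float:
    score = 0.5                       # base score for any entry
    if is_core_knowledge:
        score += 0.3                  # defines the companion's core knowledge
    if extra_informational:
        score += 0.1                  # carries more informational value
    return min(score, 0.95)

significance(True, False)   # ~0.8
significance(True, True)    # ~0.9
```

Additive bonuses over a fixed base make the ranking auditable: given an entry's flags, anyone can recompute why it outranked a casual memory.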
RAG answers questions. The next architectural layer -- tool teaching -- takes action. Understanding the boundary between them is critical for system design.
| Aspect | RAG (Knowledge) | Tool Teaching (Agency) |
|---|---|---|
| Purpose | Answer questions | Manipulate the world |
| Data flow | Database to prompt | Prompt to API call |
| Side effects | None (read-only) | World state changes |
| Failure mode | "I don't know" | "Operation failed" |
| Security model | Read-only, inherently safe | Requires permission system |
| Complexity | Low (search + inject) | High (NLP + execution + sync) |
RAG is the foundation. Once companions can reliably retrieve and communicate knowledge, the same retrieval pipeline can teach them tool syntax -- bridging the gap from passive knowledge to active world manipulation.
The RAG architecture is designed to grow in several directions without requiring fundamental changes.
Different companions know different things. A wisdom-oriented companion has philosophical knowledge; a trickster has different lore. The search query already filters by companion scope, so adding personality-specific knowledge is simply a matter of creating scoped entries alongside global ones.
Track when knowledge changes and allow rollback. Each entry carries a version identifier. When pricing changes from one tier to another, the old version is retained in history while the new version becomes active.
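A minimal version-history sketch, assuming an append-only list per entry (the structure is illustrative, not the platform's schema):

```python
# Append-only version history: the newest version is active,
# older versions stay available for rollback.

history: dict[str, list[str]] = {}

def update(entry_id: str, answer: str) -> None:
    history.setdefault(entry_id, []).append(answer)  # old versions retained

def active(entry_id: str) -> str:
    return history[entry_id][-1]

def rollback(entry_id: str) -> str:
    history[entry_id].pop()          # discard the newest version
    return active(entry_id)

update("pricing", "Premium costs $10/month")
update("pricing", "Premium costs $12/month")
active("pricing")    # newest version is live
rollback("pricing")  # previous version restored from history
```

Because nothing is overwritten in place, a bad knowledge edit is a one-step revert rather than a data-recovery exercise.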
When users ask questions that have no existing match, the system can flag these gaps. A human reviewer approves an answer, it enters the knowledge base, and future users get instant answers to the same question. The knowledge base learns from real usage patterns.
Knowledge entries can carry language tags. The search query filters by user language with a fallback to the default language, ensuring coverage even when translations are incomplete.
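The fallback logic can be sketched as a two-pass lookup (entry shapes and the `"en"` default are illustrative assumptions):

```python
# Language filtering with fallback: prefer the user's language,
# fall back to the default when no translation exists.

entries = [
    {"question": "pricing", "lang": "en", "answer": "The basic tier is free."},
    {"question": "pricing", "lang": "de", "answer": "Die Basisstufe ist kostenlos."},
    {"question": "onboarding", "lang": "en", "answer": "Create an account to start."},
]

def lookup(question: str, user_lang: str, default_lang: str = "en"):
    matches = [e for e in entries if e["question"] == question]
    for lang in (user_lang, default_lang):  # user's language first
        for e in matches:
            if e["lang"] == lang:
                return e["answer"]
    return None

lookup("pricing", "de")     # the German translation wins
lookup("onboarding", "de")  # no German entry, falls back to English
```

The fallback ordering guarantees coverage even while translations lag behind the source-language knowledge base.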
When building search queries with dynamic filter conditions, an AND clause added later can silently negate an earlier inclusion. The fix: always use OR conditions for "include additional scope" logic. This pattern applies broadly to any system where multiple scopes need to coexist in a single query.
Writing a knowledge file does not make it searchable. The seeding pipeline must run to generate embeddings and insert entries into the database. Establishing a clear deployment checklist -- write, seed, verify -- prevents the gap between authoring and availability.
Testing only global-scope queries can mask filtering bugs that appear when searching as a specific companion. The test matrix should cover global scope, each companion scope, and user memory retrieval independently.
Without explicit significance, the most recent conversational memory may outrank a critical FAQ. Assigning importance scores to knowledge entries ensures that foundational facts always surface when relevant, regardless of how many casual memories accumulate.
RAG is the sweet spot for dynamic, verifiable, updatable AI knowledge at scale. Fine-tuning is expensive and opaque. Raw embeddings lack grounding. RAG combines the strengths of both: semantic search finds relevant knowledge, and prompt augmentation ensures the model answers from verified sources rather than imagination.
The overhead is negligible -- roughly 18 milliseconds added to a pipeline measured in hundreds of milliseconds. The knowledge base is a structured file that anyone can edit. And the same retrieval infrastructure that delivers factual knowledge today can teach companions tool syntax tomorrow.