Why curated context with a capable model consistently outperforms massive context with a smaller one — and how we engineer around limitations.
The conventional approach to AI companions starts with a reasonable-sounding assumption: give the model as much information as possible. Load every conversation, every memory, every piece of spatial data into a large context window and let the model sort it out.
poqpoq World takes the opposite approach. We use a capable model from Anthropic with a deliberately small, curated context of roughly 2,000 tokens. Every token earns its place. Nothing is included "just in case."
The result is companions that respond faster, stay in character more reliably, and produce more coherent, contextually grounded answers than they would with ten times the context.
The easiest way to understand this trade-off is a cooking analogy: two chefs, each asked to prepare a memorable dish.
The first chef faces 100 ingredients piled on the counter. This cook knows basic techniques but lacks the judgment to select, combine, and balance, so the dish tries to incorporate everything and ends up unfocused.
The second chef works with five ingredients, each selected for the occasion. This chef understands flavor profiles, pairings, and presentation. Every element on the plate exists for a reason.
The capable model is the experienced chef. It can synthesize relationships between sparse data points, infer what matters, and produce responses that feel natural and grounded. A smaller model given the same curated context would miss the connections. A capable model drowning in noise would lose focus. The combination of capability and curation is what makes it work.
A more capable model is not simply "smarter." It excels at four specific competencies that companion AI depends on.
The first is synthesis from sparse inputs. The model receives identity data, a few significant memories, and the current spatial context. From these sparse inputs, it must synthesize a coherent understanding of who the user is, what they care about, and how the companion should respond.
The second is character consistency. A gruff blacksmith must stay gruff across thousands of interactions with hundreds of different users. The model must maintain a distinct character voice even when the conversation veers into topics the character was never explicitly designed for. Capable models hold character far more reliably under varied conversational pressure.
The third is spatial interpretation. The companion receives structured spatial data: who is nearby, what objects are in range, what direction the user is moving. It must interpret distances, relationships, and social dynamics from these data points and weave them naturally into conversation without sounding like it is reading a sensor log.
The fourth is structured output. Companions do not just talk; they act. They must reliably generate valid structured commands (JSON payloads, tool calls, emote triggers) embedded within natural language responses. Capable models follow formatting constraints with far fewer failures, which eliminates brittle parsing workarounds.
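To make the idea concrete, here is a minimal sketch of extracting embedded commands from a response. The `<command>` tag format and the example payload are assumptions for illustration, not poqpoq World's actual protocol:

```python
import json
import re

# Hypothetical convention: commands are embedded in <command>...</command>
# blocks containing JSON. The real wire format may differ.
COMMAND_RE = re.compile(r"<command>(.*?)</command>", re.DOTALL)

def split_response(raw: str) -> tuple[str, list[dict]]:
    """Separate natural-language text from embedded structured commands."""
    commands = []
    for match in COMMAND_RE.finditer(raw):
        try:
            commands.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # capable models rarely emit invalid JSON, but guard anyway
    text = COMMAND_RE.sub("", raw).strip()
    return text, commands

reply = 'Good to see you again! <command>{"action": "emote", "name": "wave"}</command>'
text, commands = split_response(reply)
# text -> "Good to see you again!"
# commands -> [{"action": "emote", "name": "wave"}]
```

Because a capable model reliably produces well-formed JSON inside the tags, the `JSONDecodeError` branch becomes a rare safety net rather than a routine repair path.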
More context is not better context. Beyond a threshold, additional tokens actively degrade response quality. The model spends capacity attending to irrelevant information, diluting its focus on the signals that matter.
The unoptimized context is filled with low-value exchanges — greetings, acknowledgments, repeated pleasantries — that consume tokens without contributing to response quality. The curated context strips these away and replaces them with three precisely selected categories of information.
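One way to strip those low-value exchanges is a simple heuristic filter. The phrase list and one-word threshold below are illustrative assumptions, not the production rules:

```python
# Hypothetical filter for greetings and acknowledgments that consume
# tokens without adding context value.
LOW_VALUE = {"hi", "hello", "hey", "thanks", "thank you", "ok", "okay", "sure"}

def is_low_value(message: str) -> bool:
    """Flag pleasantries and one-word acknowledgments."""
    normalized = message.strip().lower().rstrip("!.?")
    return normalized in LOW_VALUE or len(normalized.split()) <= 1

history = [
    "hi!",
    "Can you show me the blacksmith's forge?",
    "thanks",
    "What did I ask you to remember last week?",
]
curated = [m for m in history if not is_low_value(m)]
# curated keeps only the two substantive questions
```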
Context assembly follows a strict priority order. When the total exceeds the budget, the system knows exactly what to sacrifice.
Identity is always included in full. It defines who the companion is — personality, speech patterns, knowledge boundaries. Without it, the model has no character to inhabit.
High-significance memories are included next, ranked by their significance score. A promise made to the user (score 0.9) takes priority over a casual observation about the weather (score 0.2). The memory retrieval system surfaces the most relevant memories for the current conversation topic, weighted by significance and recency.
Recent messages fill the remaining space. If the budget is tight, recent messages are truncated from the oldest end first. This is the least critical category because the user already remembers what they just said. The companion's job is to remember what the user said last week.
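The three-tier priority scheme above can be sketched as a budgeted assembly loop. The whitespace token estimate, field names, and budget constant are stand-ins for the real tokenizer and configuration:

```python
BUDGET = 2000  # illustrative token budget; real counting needs a tokenizer

def tokens(text: str) -> int:
    return len(text.split())  # crude whitespace estimate, for illustration

def assemble(identity: str, memories: list[tuple[float, str]], recent: list[str]) -> str:
    """Build a prompt context: identity first, then memories by significance,
    then recent messages truncated from the oldest end."""
    parts = [identity]                    # 1. identity is always included in full
    used = tokens(identity)
    for _score, memory in sorted(memories, reverse=True):  # 2. highest score first
        cost = tokens(memory)
        if used + cost <= BUDGET:
            parts.append(memory)
            used += cost
    kept: list[str] = []
    for message in reversed(recent):      # 3. newest messages claim space first
        cost = tokens(message)
        if used + cost > BUDGET:
            break                         # oldest messages are sacrificed
        kept.append(message)
        used += cost
    return "\n".join(parts + list(reversed(kept)))
```

Walking recent messages newest-first and then reversing the kept slice is what makes truncation fall on the oldest end, matching the priority order described above.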
A small context window creates real constraints. The system addresses each one through external infrastructure, so the model can focus on what it does best: generating coherent, in-character responses.
The model cannot hold a user's entire conversation history. An external memory system with vector search provides the illusion of unlimited recall. When the user references something from weeks ago, the retrieval system finds it and injects it into the context just in time for the response.
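A toy version of that retrieval step is sketched below. Bag-of-words vectors stand in for a real embedding model, and the stored memory texts are invented:

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words embedding; a real system uses a learned model."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        word = word.strip(".,?!")
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented example memories standing in for a persistent vector store.
MEMORY_STORE = [
    "User mentioned their sister visits the market every week.",
    "User asked about forging techniques for short blades.",
]

def recall(query: str, k: int = 1) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return sorted(MEMORY_STORE, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]
```

A query like "what did I ask about forging?" surfaces the forging memory, which is then injected into the context just in time for the response.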
Without grounding, the model might hallucinate facts about the world. Real-time spatial context — who is nearby, what objects are present, what time of day it is — keeps every response anchored to the actual state of the virtual world. The companion cannot claim the market is empty when the spatial data shows three players browsing stalls.
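Rendering that spatial data into a compact prompt section might look like the sketch below. The field names and output template are assumptions, not the actual format:

```python
def spatial_context(nearby: list[str], objects: list[str], time_of_day: str) -> str:
    """Format raw world state into a compact, model-readable prompt section."""
    return (
        f"[World state] Time: {time_of_day}. "
        f"Nearby: {', '.join(nearby) or 'no one'}. "
        f"Objects in range: {', '.join(objects) or 'none'}."
    )

spatial_context(["Mira", "two browsing players"], ["market stall", "anvil"], "dusk")
# -> "[World state] Time: dusk. Nearby: Mira, two browsing players. Objects in range: market stall, anvil."
```

Because this line is rebuilt from live world state on every request, the model can never truthfully claim the market is empty while players are browsing the stalls.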
Over long conversations, models tend to lose their assigned character and converge toward a generic assistant voice. Including the identity chunk in every context window — every single request — acts as a constant anchor. The companion is reminded who it is with each response it generates.