Why curated context with a capable model consistently outperforms massive context with a smaller one — and how we engineer around limitations.
The conventional approach to AI companions starts with a reasonable-sounding assumption: give the model as much information as possible. Load every conversation, every memory, every piece of spatial data into a large context window and let the model sort it out.
poqpoq World takes the opposite approach. We use a capable model from Anthropic with a deliberately small, curated context of roughly 2,000 tokens. Every token earns its place. Nothing is included "just in case."
The result is companions that respond faster, stay in character more reliably, and produce more coherent, contextually grounded answers than they would with ten times the context.
The easiest way to understand this trade-off is a cooking analogy: two chefs, each asked to prepare a memorable dish.
The first chef faces 100 ingredients piled on the counter. This cook knows basic techniques but lacks the judgment to select, combine, and balance, so the dish tries to incorporate everything and ends up unfocused.
The second chef works with five ingredients, each selected for the occasion. This chef understands flavor profiles, pairings, and presentation. Every element on the plate exists for a reason.
The capable model is the experienced chef. It can synthesize relationships between sparse data points, infer what matters, and produce responses that feel natural and grounded. A smaller model given the same curated context would miss the connections. A capable model drowning in noise would lose focus. The combination of capability and curation is what makes it work.
A more capable model is not simply "smarter." It excels at four specific competencies that companion AI depends on.
The first is synthesis from sparse inputs. The model receives identity data, a few significant memories, and the current spatial context. From these sparse inputs, it must synthesize a coherent understanding of who the user is, what they care about, and how the companion should respond.
The second is character consistency. A gruff blacksmith must stay gruff across thousands of interactions with hundreds of different users. The model must maintain a distinct character voice even when the conversation veers into topics the character was never explicitly designed for. Capable models hold character far more reliably under varied conversational pressure.
The third is spatial interpretation. The companion receives structured spatial data: who is nearby, what objects are in range, what direction the user is moving. It must interpret distances, relationships, and social dynamics from these data points and weave them naturally into conversation without sounding like it is reading a sensor log.
The fourth is structured output. Companions do not just talk; they act. They must reliably generate valid structured commands (JSON payloads, tool calls, emote triggers) embedded within natural language responses. Capable models follow formatting constraints with far fewer failures, which eliminates brittle parsing workarounds.
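To make the idea concrete, here is a minimal sketch of extracting embedded commands from a response. The `<command>` tag format and the example payload are assumptions for illustration, not poqpoq World's actual protocol:

```python
import json
import re

# Hypothetical convention: commands are embedded in <command>...</command>
# blocks containing JSON. The real wire format may differ.
COMMAND_RE = re.compile(r"<command>(.*?)</command>", re.DOTALL)

def split_response(raw: str) -> tuple[str, list[dict]]:
    """Separate natural-language text from embedded structured commands."""
    commands = []
    for match in COMMAND_RE.finditer(raw):
        try:
            commands.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # capable models rarely emit invalid JSON, but guard anyway
    text = COMMAND_RE.sub("", raw).strip()
    return text, commands

reply = 'Good to see you again! <command>{"action": "emote", "name": "wave"}</command>'
text, commands = split_response(reply)
# text -> "Good to see you again!"
# commands -> [{"action": "emote", "name": "wave"}]
```

Because a capable model reliably produces well-formed JSON inside the tags, the `JSONDecodeError` branch becomes a rare safety net rather than a routine repair path.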
More context is not better context. Beyond a threshold, additional tokens actively degrade response quality. The model spends capacity attending to irrelevant information, diluting its focus on the signals that matter.
The unoptimized context is filled with low-value exchanges — greetings, acknowledgments, repeated pleasantries — that consume tokens without contributing to response quality. The curated context strips these away and replaces them with three precisely selected categories of information.
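One way to strip those low-value exchanges is a simple heuristic filter. The phrase list and one-word threshold below are illustrative assumptions, not the production rules:

```python
# Hypothetical filter for greetings and acknowledgments that consume
# tokens without adding context value.
LOW_VALUE = {"hi", "hello", "hey", "thanks", "thank you", "ok", "okay", "sure"}

def is_low_value(message: str) -> bool:
    """Flag pleasantries and one-word acknowledgments."""
    normalized = message.strip().lower().rstrip("!.?")
    return normalized in LOW_VALUE or len(normalized.split()) <= 1

history = [
    "hi!",
    "Can you show me the blacksmith's forge?",
    "thanks",
    "What did I ask you to remember last week?",
]
curated = [m for m in history if not is_low_value(m)]
# curated keeps only the two substantive questions
```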
Context assembly follows a strict priority order. When the total exceeds the budget, the system knows exactly what to sacrifice.
Identity is always included in full. It defines who the companion is — personality, speech patterns, knowledge boundaries. Without it, the model has no character to inhabit.
High-significance memories are included next, ranked by their significance score. A promise made to the user (score 0.9) takes priority over a casual observation about the weather (score 0.2). The memory retrieval system surfaces the most relevant memories for the current conversation topic, weighted by significance and recency.
Recent messages fill the remaining space. If the budget is tight, recent messages are truncated from the oldest end first. This is the least critical category because the user already remembers what they just said. The companion's job is to remember what the user said last week.
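The three-tier priority scheme above can be sketched as a budgeted assembly loop. The whitespace token estimate, field names, and budget constant are stand-ins for the real tokenizer and configuration:

```python
BUDGET = 2000  # illustrative token budget; real counting needs a tokenizer

def tokens(text: str) -> int:
    return len(text.split())  # crude whitespace estimate, for illustration

def assemble(identity: str, memories: list[tuple[float, str]], recent: list[str]) -> str:
    """Build a prompt context: identity first, then memories by significance,
    then recent messages truncated from the oldest end."""
    parts = [identity]                    # 1. identity is always included in full
    used = tokens(identity)
    for _score, memory in sorted(memories, reverse=True):  # 2. highest score first
        cost = tokens(memory)
        if used + cost <= BUDGET:
            parts.append(memory)
            used += cost
    kept: list[str] = []
    for message in reversed(recent):      # 3. newest messages claim space first
        cost = tokens(message)
        if used + cost > BUDGET:
            break                         # oldest messages are sacrificed
        kept.append(message)
        used += cost
    return "\n".join(parts + list(reversed(kept)))
```

Walking recent messages newest-first and then reversing the kept slice is what makes truncation fall on the oldest end, matching the priority order described above.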
A small context window creates real constraints. The system addresses each one through external infrastructure, so the model can focus on what it does best: generating coherent, in-character responses.
The model cannot hold a user's entire conversation history. An external memory system with vector search provides the illusion of unlimited recall. When the user references something from weeks ago, the retrieval system finds it and injects it into the context just in time for the response.
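A toy version of that retrieval step is sketched below. Bag-of-words vectors stand in for a real embedding model, and the stored memory texts are invented:

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words embedding; a real system uses a learned model."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        word = word.strip(".,?!")
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented example memories standing in for a persistent vector store.
MEMORY_STORE = [
    "User mentioned their sister visits the market every week.",
    "User asked about forging techniques for short blades.",
]

def recall(query: str, k: int = 1) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return sorted(MEMORY_STORE, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]
```

A query like "what did I ask about forging?" surfaces the forging memory, which is then injected into the context just in time for the response.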
Without grounding, the model might hallucinate facts about the world. Real-time spatial context — who is nearby, what objects are present, what time of day it is — keeps every response anchored to the actual state of the virtual world. The companion cannot claim the market is empty when the spatial data shows three players browsing stalls.
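Rendering that spatial data into a compact prompt section might look like the sketch below. The field names and output template are assumptions, not the actual format:

```python
def spatial_context(nearby: list[str], objects: list[str], time_of_day: str) -> str:
    """Format raw world state into a compact, model-readable prompt section."""
    return (
        f"[World state] Time: {time_of_day}. "
        f"Nearby: {', '.join(nearby) or 'no one'}. "
        f"Objects in range: {', '.join(objects) or 'none'}."
    )

spatial_context(["Mira", "two browsing players"], ["market stall", "anvil"], "dusk")
# -> "[World state] Time: dusk. Nearby: Mira, two browsing players. Objects in range: market stall, anvil."
```

Because this line is rebuilt from live world state on every request, the model can never truthfully claim the market is empty while players are browsing the stalls.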
Over long conversations, models tend to lose their assigned character and converge toward a generic assistant voice. Including the identity chunk in every context window — every single request — acts as a constant anchor. The companion is reminded who it is with each response it generates.