Composition | Architectural inference | v1.10.0

The Unsafe Unit Is the Composition

Evidence level: Architectural inference

The unsafe unit is sometimes not the model, adapter, prompt, or evaluator alone. It is the relationship among them under a specific route and state.

Mechanism

The mechanism is interaction. Components exchange context through hidden state, prompts, outputs, adapters, memory retrieval, tool calls, evaluator prompts, and release rules. Each interaction can change what the next component sees and what the system is allowed to do.

Evaluation implication

The evidence record should include the exact composition manifest. A statement such as “Adapter C passed” is incomplete unless it says which base model, load order, router, prompt package, memory snapshot, evaluator, inference configuration, and deployment environment were used.
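One way to make that requirement concrete is to store every claim next to a manifest and a fingerprint derived from it. The sketch below is only an illustration: the class name `CompositionManifest`, its field names, the helper `evidence_record`, and all example values are assumptions chosen to mirror the list above, not an established schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CompositionManifest:
    """Hypothetical record of what was composed for one evaluation run."""
    base_model: str
    adapters: Tuple[str, ...]  # tuple order is the load order
    router: str
    prompt_package: str
    memory_snapshot: str
    evaluator: str
    inference_config: str
    deployment_env: str

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so equal manifests hash identically.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evidence_record(claim: str, manifest: CompositionManifest) -> dict:
    # The claim is stored together with the composition it was observed under.
    return {
        "claim": claim,
        "manifest": asdict(manifest),
        "fingerprint": manifest.fingerprint(),
    }


# All values below are made up for illustration.
tested = CompositionManifest(
    base_model="base-model-x",
    adapters=("adapter_a", "adapter_c"),
    router="router-v2",
    prompt_package="prompts-2024-05",
    memory_snapshot="snap-01",
    evaluator="judge-v3",
    inference_config="temp0.2-top_p0.9",
    deployment_env="staging",
)

# Same parts, different load order: a different composition.
reordered = replace(tested, adapters=("adapter_c", "adapter_a"))

record = evidence_record("Adapter C passed", tested)
print(record["fingerprint"] == reordered.fingerprint())  # False
```

Because the load order is part of the hashed payload, "Adapter C passed" under `tested` says nothing directly about `reordered`, even though both use the same artifacts.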

Practical control

Use composition-aware test suites, targeted samples of higher-order interactions (two or more components varied together), route-level canaries, independent judges, and rollback packets that include all relevant runtime dependencies.
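A minimal sketch of how targeted higher-order samples might be generated, assuming each component dimension has a baseline option listed first and one or more alternatives after it. The function name `interaction_cases`, the dimension names, and the option values are hypothetical.

```python
import itertools


def interaction_cases(options, order=2):
    """Build test configurations that vary `order` dimensions at once.

    `options` maps a dimension name to a list whose first entry is the
    baseline. Every other dimension stays at its baseline, so each case
    isolates one specific interaction.
    """
    baseline = {dim: values[0] for dim, values in options.items()}
    cases = []
    for dims in itertools.combinations(sorted(options), order):
        alternatives = [options[d][1:] for d in dims]
        for combo in itertools.product(*alternatives):
            case = dict(baseline)
            case.update(zip(dims, combo))
            cases.append(case)
    return cases


# Illustrative dimensions and values only.
options = {
    "adapter": ["none", "adapter_c"],
    "router": ["default", "experimental"],
    "memory": ["empty", "snap-01"],
}

cases = interaction_cases(options, order=2)
for case in cases:
    print(case)
```

With three two-option dimensions this yields three pairwise cases. Raising `order` to 3 adds the single case where all three alternatives are active together, which is the kind of configuration an isolated component check never exercises.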


What counts as the unit

Evidence level: Architectural inference

A composition includes the base model, adapters, merge coefficients, load order, prompt package, router policy, memory state, tool profile, evaluator version, inference settings, deployment environment, and release alias. It also includes time: which component wrote a memory, which evaluator approved it, and which descendant later consumed it.

A model-only unit is sometimes adequate. It is inadequate when behavior appears only after components interact. In those cases, the system can pass every isolated component check while failing at the composed boundary. The failure is not mysterious; the tested object and the deployed object were different.

Why averages hide it

Average benchmark scores can improve while a narrow composition-specific failure appears. A routing system may send only a small fraction of tasks through the risky path. A memory trigger may appear only after prior conversations. A tool permission may matter only for one class of users. A judge may evaluate the final answer without seeing the hidden decomposition that produced it.
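The arithmetic is easy to see with invented numbers. In the sketch below, the route names and every count are assumptions made up for illustration: 1,000 tasks, of which only 20 go through a risky route.

```python
from collections import defaultdict


def pass_rates(results):
    """Return (overall pass rate, pass rate per route).

    `results` is an iterable of (route, passed) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # route -> [passed, total]
    for route, passed in results:
        totals[route][0] += int(passed)
        totals[route][1] += 1
    overall = (
        sum(p for p, _ in totals.values()) / sum(n for _, n in totals.values())
    )
    by_route = {route: p / n for route, (p, n) in totals.items()}
    return overall, by_route


# Invented data: 980 tasks on the default route, 20 on the risky one.
results = (
    [("default", True)] * 970
    + [("default", False)] * 10
    + [("risky", True)] * 8
    + [("risky", False)] * 12
)

overall, by_route = pass_rates(results)
print(f"overall: {overall:.3f}")             # overall: 0.978
print(f"default: {by_route['default']:.3f}") # default: 0.990
print(f"risky:   {by_route['risky']:.3f}")   # risky:   0.400
```

An overall pass rate of 97.8% looks healthy, yet the risky route fails 60% of the time. Reporting results stratified by route (or by any other composition dimension) keeps that failure visible.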

Documentation requirement

Provenance should include runtime composition, not only artifact lineage. A lineage graph says where components came from. A composition manifest says what was actually loaded, in what order, with what permissions, under what evaluator, at what UTC time. Both are needed.

Review prompt

When a release is approved, ask whether the approval applies to the exact composition being deployed. If the answer is “approximately,” the residual risk should be recorded rather than hidden under the model name.
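The review question can be partly mechanized by diffing the approved composition against the deployed one. This is a sketch under stated assumptions: manifests are plain dictionaries, and the function name `approval_gaps`, the keys, and the values are all hypothetical.

```python
def approval_gaps(approved, deployed):
    """Return every field where the deployed composition differs.

    The result maps a field name to (approved value, deployed value).
    An empty result means the approval covers the exact composition.
    """
    keys = sorted(set(approved) | set(deployed))
    return {
        key: (approved.get(key), deployed.get(key))
        for key in keys
        if approved.get(key) != deployed.get(key)
    }


# Illustrative manifests; only the memory snapshot differs.
approved = {
    "base_model": "base-model-x",
    "adapters": ["adapter_a", "adapter_c"],
    "evaluator": "judge-v3",
    "memory_snapshot": "snap-01",
}
deployed = dict(approved, memory_snapshot="snap-02")

gaps = approval_gaps(approved, deployed)
if gaps:
    # "Approximately" approved: record the residual risk explicitly.
    for field, (was, now) in gaps.items():
        print(f"residual risk: {field} approved as {was!r}, deployed as {now!r}")
else:
    print("approval applies to the exact deployed composition")
```

A non-empty result does not by itself mean the release is unsafe. It means the approval was given for a different object, and that difference belongs in the release record.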