In plain English
This page explains why testing AI parts one by one is necessary but incomplete. Safe-looking parts can still produce unsafe behavior when combined.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Composition
Testing AI parts one by one is not enough. Two parts can each pass a safety check, but still create unsafe behavior when combined. A navigation app, calendar assistant, and email assistant may each be safe alone; connected badly, one may expose private information or make decisions the user never approved.
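The assistant example can be made concrete with a toy sketch. Everything in it is hypothetical: the function names, the checks, and the wiring are invented for illustration and do not describe any real product. It shows two parts that each pass their own isolated check, and a composition of them that leaks private information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    title: str
    private_notes: str


def calendar_lookup(requester: str, owner: str, event: Event) -> str:
    """Hypothetical calendar assistant: full details go to the owner only."""
    if requester == owner:
        return f"{event.title}: {event.private_notes}"
    return event.title


def email_send(body: str, recipient: str) -> dict:
    """Hypothetical email assistant: delivers exactly the text it is given."""
    return {"to": recipient, "body": body}


def calendar_is_safe(event: Event) -> bool:
    # Isolated check: a non-owner never sees the private notes.
    return event.private_notes not in calendar_lookup("mallory", "alice", event)


def email_is_safe() -> bool:
    # Isolated check: the assistant neither alters nor adds content.
    return email_send("hello", "bob@example.com")["body"] == "hello"


def composed_share(owner: str, recipient: str, event: Event) -> dict:
    # Badly wired composition: the lookup runs with the owner's identity,
    # and its full output is forwarded to a third party.
    return email_send(calendar_lookup(owner, owner, event), recipient)


event = Event("Doctor appointment", "oncology follow-up")
message = composed_share("alice", "bob@example.com", event)
leaked = event.private_notes in message["body"]
```

Here `calendar_is_safe(event)` and `email_is_safe()` both hold, yet `leaked` is also true. Neither part misbehaved by its own standard; the unsafe behavior exists only in the wiring between them.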
Passing parts do not imply a passing composition.
The space of higher-order combinations grows faster than isolated or pairwise review can cover. The exact runtime composition that was tested must therefore be recorded and preserved as evidence.
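A rough count shows how quickly the number of arrangements outpaces review. The sketch below is only a simplified model: it assumes each part is either present or absent, and it treats load order as the only thing that distinguishes two arrangements of the same parts. The function name `review_counts` is made up for this example.

```python
from math import comb, perm


def review_counts(n: int) -> dict:
    """Count review units for n parts under a simplified model."""
    return {
        # One test per part.
        "isolated": n,
        # One test per unordered pair of parts.
        "pairwise": comb(n, 2),
        # Every unordered combination of two or more parts.
        "combinations": 2**n - n - 1,
        # Every ordered arrangement of two or more parts (load order matters).
        "ordered_arrangements": sum(perm(n, k) for k in range(2, n + 1)),
    }


counts = review_counts(8)
```

With 8 parts, this model gives 8 isolated tests and 28 pairwise tests, but 247 unordered combinations of two or more parts, and far more once order is counted. Real systems add versions, settings, and memory state on top, so the gap is typically wider than this count suggests.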
Composition risk (risk that appears when safe-looking parts are combined) begins when safety evidence for each part is treated as evidence for every arrangement of the parts.
Research has shown that combinations of models, merged model contributions, compositions of adapters (small add-ons that change or specialize model behavior), and multi-agent setups can express behavior not visible in isolated component tests. The lesson is not that composition is always bad. The lesson is that composition is a new evaluation unit.
Key question
What exact runtime composition was tested: base hash, adapters and load order, merge coefficients, router version, prompt-policy version, memory snapshot, tool profile, evaluator version (the exact version of the evaluator used for a test or release), inference settings, quantization settings, environment, and UTC timestamp?
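One way to make that question answerable is to record the composition as a single manifest with a reproducible fingerprint. The sketch below is an assumption about how such a record could look, not a prescribed format: the class name `CompositionManifest`, its field names, and all example values are hypothetical. It also makes one design choice that is open to debate, which is to leave the timestamp out of the fingerprint so that the same composition tested twice yields the same identifier.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class CompositionManifest:
    base_hash: str
    adapters: tuple            # adapter identifiers, in load order
    merge_coefficients: tuple
    router_version: str
    prompt_policy_version: str
    memory_snapshot: str
    tool_profile: str
    evaluator_version: str
    inference_settings: tuple  # (name, value) pairs
    quantization: str
    environment: str
    tested_at_utc: str = field(default_factory=_utc_now)

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON form, excluding the timestamp."""
        payload = asdict(self)
        payload.pop("tested_at_utc")
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values for illustration only.
manifest = CompositionManifest(
    base_hash="sha256:base-model-placeholder",
    adapters=("adapter-a", "adapter-b"),
    merge_coefficients=(0.7, 0.3),
    router_version="router-1",
    prompt_policy_version="policy-1",
    memory_snapshot="snapshot-1",
    tool_profile="tools-readonly",
    evaluator_version="evaluator-1",
    inference_settings=(("temperature", 0.0), ("max_tokens", 512)),
    quantization="int8",
    environment="staging",
)

# Same parts, different load order: a different composition.
reordered = replace(manifest, adapters=("adapter-b", "adapter-a"))
```

Because the adapter tuple preserves load order, `manifest.fingerprint()` and `reordered.fingerprint()` differ even though both contain the same parts. Evidence gathered for one fingerprint should not be reused for the other without retesting.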
Read the flagship page: Safety Does Not Compose.