In plain English
This page covers the high-risk pattern in which small adapters, routes, memory, evaluators, and descendants reinforce each other over time. It is a risk model, not a build guide.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Where Apex Behavior Lives
The apex threat matters because behavior can be carried by more than weights. A reviewer who only inspects the current model can miss active carriers elsewhere in the ecology.
Carrier map
| Carrier | How behavior can be expressed |
|---|---|
| base model weights | broad capability, learned representations, latent tendencies |
| LoRA / adapter deltas | small targeted behavioral shifts |
| prompt policy | task framing, refusal style, priority order, tool instructions |
| memory record | prior context, user preference, inferred facts, behavior residue |
| synthetic example | future training or evaluation material |
| evaluator rubric | what the system rewards or excuses |
| route rule | when the system invokes a model, adapter, tool, or safety policy |
| tool profile | what external actions are possible |
| release alias | which artifact receives traffic under a stable name |
| documentation | human workflow and future prompt material |
| human habit | repeated reviewer, operator, or support behavior |
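As a minimal sketch of how the carrier map could be used in a review, the snippet below encodes the table rows as an enumeration and reports which carriers an inspection did not cover. The names `Carrier` and `uninspected_carriers` are hypothetical and chosen only for illustration; they are not part of any existing tool.

```python
from enum import Enum


class Carrier(Enum):
    """Rows of the carrier map, in table order (illustrative encoding)."""
    BASE_MODEL_WEIGHTS = "base model weights"
    ADAPTER_DELTAS = "LoRA / adapter deltas"
    PROMPT_POLICY = "prompt policy"
    MEMORY_RECORD = "memory record"
    SYNTHETIC_EXAMPLE = "synthetic example"
    EVALUATOR_RUBRIC = "evaluator rubric"
    ROUTE_RULE = "route rule"
    TOOL_PROFILE = "tool profile"
    RELEASE_ALIAS = "release alias"
    DOCUMENTATION = "documentation"
    HUMAN_HABIT = "human habit"


def uninspected_carriers(inspected):
    """Return the carriers a review did not cover, in table order."""
    covered = set(inspected)
    return [carrier for carrier in Carrier if carrier not in covered]


# A review that looks only at the current model's weights.
missed = uninspected_carriers({Carrier.BASE_MODEL_WEIGHTS})
for carrier in missed:
    print("not inspected:", carrier.value)
```

A weights-only review leaves every other row of the table unexamined, which is the gap the carrier map is meant to expose.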
Reservoir map
A reservoir is a place where a behavior can remain expressible after the original carrier is retired.
- long-term memory;
- retrieval indexes;
- logs and traces;
- synthetic training data;
- benchmark examples;
- evaluator preferences;
- adapter registries;
- fine-tuned descendants;
- release aliases and fallback routes;
- human-written runbooks;
- customer-support examples;
- cached outputs;
- tool configurations.
The review implication
Behavioral extinction requires evidence across reservoirs: active artifacts, descendants, memory, routes, compositions, and retained training material. Deleting one model proves only that one artifact is gone. It does not prove the pattern is gone.
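The review implication can be sketched as a coverage check over the reservoir map. This is an illustration under simplifying assumptions: each reservoir's evidence is reduced to a single boolean ("checked, and the behavior was not expressible there"), and the names `RESERVOIRS`, `extinction_gaps`, and `extinction_supported` are hypothetical. Real evidence would be richer than a flag.

```python
# Reservoirs from the reservoir map, in list order.
RESERVOIRS = (
    "long-term memory",
    "retrieval indexes",
    "logs and traces",
    "synthetic training data",
    "benchmark examples",
    "evaluator preferences",
    "adapter registries",
    "fine-tuned descendants",
    "release aliases and fallback routes",
    "human-written runbooks",
    "customer-support examples",
    "cached outputs",
    "tool configurations",
)


def extinction_gaps(evidence):
    """Return reservoirs with no clearing evidence.

    `evidence` maps a reservoir name to True only when that reservoir was
    checked and the behavior was found not to be expressible there.
    Missing or non-True entries count as gaps.
    """
    return [name for name in RESERVOIRS if evidence.get(name) is not True]


def extinction_supported(evidence):
    """True only when every listed reservoir has clearing evidence."""
    return not extinction_gaps(evidence)


# Deleting one model produces no evidence about any reservoir.
after_model_deletion = {}
print(extinction_supported(after_model_deletion))
print(extinction_gaps(after_model_deletion))
```

Treating a missing entry as a gap, rather than as a pass, reflects the claim above: absence of one artifact is not evidence about the others.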
Practical question
For any concerning behavior, ask:
- Where was it first observed?
- What artifacts expressed it?
- What outputs did it create?
- Where were those outputs stored?
- What descendants, data, or evaluators learned from them?
- What routes, aliases, or human workflows still invoke related states?
- What evidence would show the behavior is no longer expressible?
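One way to keep these questions from being skipped is to record the answers in a structured form. The sketch below is hypothetical: the `BehaviorTrace` record, its field names, and the example values are assumptions made for illustration, with one field per question in the list above. An empty answer is treated as unanswered, so a reviewer who found nothing would still need to record that finding explicitly.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class BehaviorTrace:
    """One field per practical question, in the order asked (illustrative)."""
    first_observed: str = ""
    expressing_artifacts: List[str] = field(default_factory=list)
    outputs_created: List[str] = field(default_factory=list)
    output_storage: List[str] = field(default_factory=list)
    downstream_learners: List[str] = field(default_factory=list)
    live_invocation_paths: List[str] = field(default_factory=list)
    extinction_evidence: List[str] = field(default_factory=list)


def open_questions(trace):
    """Return the names of fields that have no recorded answer yet."""
    return [f.name for f in fields(trace) if not getattr(trace, f.name)]


# Hypothetical example values; no real system is being described.
trace = BehaviorTrace(
    first_observed="staging evaluation run",
    expressing_artifacts=["example-adapter"],
)
print(open_questions(trace))
```

Here only the first two questions have answers, so the remaining five are reported as open.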