Apex Threat
Evidence level: Strong architectural inference
v1.21.5, 2026-06-28T05:20:00Z

In plain English

This page covers the high-risk pattern in which small adapters, routes, memory, evaluators, and descendants reinforce each other over time. It is a risk model, not a build guide.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Where Apex Behavior Lives

Evidence level: Strong architectural inference

The apex threat matters because behavior can be carried by more than model weights. A reviewer who inspects only the current model can miss carriers that remain active elsewhere in the ecology.

Carrier map

Carrier	How behavior can be expressed
base model weights	broad capability, learned representations, latent tendencies
LoRA / adapter deltas	small targeted behavioral shifts
prompt policy	task framing, refusal style, priority order, tool instructions
memory record	prior context, user preference, inferred facts, behavior residue
synthetic example	future training or evaluation material
evaluator rubric	what the system rewards or excuses
route rule	when the system invokes a model, adapter, tool, or safety policy
tool profile	what external actions are possible
release alias	which artifact receives traffic under a stable name
documentation	human workflow and future prompt material
human habit	repeated reviewer, operator, or support behavior
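One way to keep a review from stopping at the weights is to treat the carrier map as a checklist. The sketch below is a hypothetical illustration, not a prescribed tool: the `Carrier` enum and the `uninspected_carriers` helper are names introduced here, and the enum simply mirrors the rows of the table.

```python
from enum import Enum


class Carrier(Enum):
    """Hypothetical enum mirroring the rows of the carrier map."""
    BASE_WEIGHTS = "base model weights"
    ADAPTER_DELTA = "LoRA / adapter deltas"
    PROMPT_POLICY = "prompt policy"
    MEMORY_RECORD = "memory record"
    SYNTHETIC_EXAMPLE = "synthetic example"
    EVALUATOR_RUBRIC = "evaluator rubric"
    ROUTE_RULE = "route rule"
    TOOL_PROFILE = "tool profile"
    RELEASE_ALIAS = "release alias"
    DOCUMENTATION = "documentation"
    HUMAN_HABIT = "human habit"


def uninspected_carriers(inspected: set[Carrier]) -> list[Carrier]:
    """Return the carrier kinds a review did not cover, in table order."""
    return [c for c in Carrier if c not in inspected]


# A weights-only review leaves every other carrier unexamined.
missing = uninspected_carriers({Carrier.BASE_WEIGHTS})
print([c.value for c in missing])
```

In this sketch, a review that inspected only `BASE_WEIGHTS` reports the ten remaining carriers as gaps. That is the blind spot described above.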

Reservoir map

A reservoir is a place where a behavior can remain expressible after the original carrier is retired.

long-term memory;
retrieval indexes;
logs and traces;
synthetic training data;
benchmark examples;
evaluator preferences;
adapter registries;
fine-tuned descendants;
release aliases and fallback routes;
human-written runbooks;
customer-support examples;
cached outputs;
tool configurations.
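Finding reservoirs amounts to following provenance forward from the artifact where a behavior was first seen. The sketch below assumes a hypothetical provenance record, a dict that maps each artifact ID to the artifacts that stored its outputs or were derived from it. It then runs a plain breadth-first search. The artifact names are illustrative only.

```python
from collections import deque


def reachable_reservoirs(provenance: dict[str, list[str]], origin: str) -> set[str]:
    """Breadth-first walk over hypothetical 'stored or derived from' edges.

    Returns every artifact downstream of `origin` (excluding origin itself)
    that could still hold the behavior. The `seen` set guards against cycles,
    e.g. a model trained on logs of its own outputs.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for child in provenance.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(origin)
    return seen


# Hypothetical provenance: logs feed synthetic data, which feeds a descendant.
provenance = {
    "model-v3": ["chat-logs", "adapter-7"],
    "chat-logs": ["synthetic-set-2"],
    "synthetic-set-2": ["model-v4", "benchmark-b"],
}
print(sorted(reachable_reservoirs(provenance, "model-v3")))
```

The search is only as complete as the provenance record. Human habits and undocumented runbook edits rarely appear as edges, so they still need a manual check.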

The review implication

Behavioral extinction requires evidence from every reservoir. Deleting one model proves only that one artifact is gone. It does not prove the pattern is gone, because the behavior may still be expressible from any of the reservoirs listed above.
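The sketch below shows this rule as a hedged illustration. The helpers `extinction_gaps` and `is_extinct` are hypothetical, and so is the `tested_absent` mapping, which records whether a reservoir was tested and found unable to express the behavior. A reservoir with no entry counts as a gap, because a missing test is not evidence of absence.

```python
def extinction_gaps(reservoirs: set[str], tested_absent: dict[str, bool]) -> list[str]:
    """List reservoirs lacking evidence that the behavior is no longer expressible.

    Only an explicit True in `tested_absent` clears a reservoir; missing
    entries and False both count as gaps.
    """
    return sorted(r for r in reservoirs if not tested_absent.get(r, False))


def is_extinct(reservoirs: set[str], tested_absent: dict[str, bool]) -> bool:
    """Hypothetical verdict: extinct only when no reservoir remains uncleared."""
    return not extinction_gaps(reservoirs, tested_absent)


# Deleting the original model clears one artifact, not the pattern.
reservoirs = {"model-v3", "chat-logs", "adapter-7"}
evidence = {"model-v3": True}
print(extinction_gaps(reservoirs, evidence))
print(is_extinct(reservoirs, evidence))
```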

Practical question

For any concerning behavior, ask:

Where was it first observed?
What artifacts expressed it?
What outputs did it create?
Where were those outputs stored?
What descendants, data, or evaluators learned from them?
What routes, aliases, or human workflows still invoke related states?
What evidence would show the behavior is no longer expressible?
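These questions can be recorded as a structured review record, so that unanswered ones stay visible. The sketch below is one possible shape. It assumes a hypothetical `BehaviorTrace` dataclass with one field per question, in the same order, and an `open_questions` method that lists the fields still empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BehaviorTrace:
    """Hypothetical review record: one field per practical question, in order."""
    first_observed: str = ""
    expressing_artifacts: list[str] = field(default_factory=list)
    outputs_created: list[str] = field(default_factory=list)
    output_storage: list[str] = field(default_factory=list)
    downstream_learners: list[str] = field(default_factory=list)
    live_invocations: list[str] = field(default_factory=list)
    extinction_evidence: list[str] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        """Return names of fields that are still empty, in question order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative, made-up values: only the first two questions are answered.
trace = BehaviorTrace(first_observed="eval run 12", expressing_artifacts=["adapter-7"])
print(trace.open_questions())
```

An empty `extinction_evidence` field means the review cannot yet claim the behavior is gone, whatever the other fields say.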