Apex Threat | Strong architectural inference | v1.21.5

In plain English

This page covers the high-risk pattern where small adapters, routes, memory, evaluators, and descendants can reinforce each other across time. It is a risk model, not a build guide.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Where Apex Behavior Lives

Evidence level: Strong architectural inference

The apex threat matters because behavior can be carried by more than weights. A reviewer who only inspects the current model can miss active carriers elsewhere in the ecology.

Carrier map

Carrier | How behavior can be expressed
base model weights | broad capability, learned representations, latent tendencies
LoRA / adapter deltas | small targeted behavioral shifts
prompt policy | task framing, refusal style, priority order, tool instructions
memory record | prior context, user preference, inferred facts, behavior residue
synthetic example | future training or evaluation material
evaluator rubric | what the system rewards or excuses
route rule | when the system invokes a model, adapter, tool, or safety policy
allowed external actions (tools) | what external actions are possible
release alias | which artifact receives traffic under a stable name
documentation | human workflow and future prompt material
human habit | repeated reviewer, operator, or support behavior
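
One way to use the carrier map in a review is as a coverage checklist. The sketch below is illustrative only: the `Carrier` enum and the `uninspected_carriers` helper are hypothetical names, not part of any existing tool, and the member labels are paraphrased from the table above.

```python
from enum import Enum


class Carrier(Enum):
    # Hypothetical labels paraphrasing the carrier map above.
    BASE_WEIGHTS = "base model weights"
    ADAPTER_DELTA = "LoRA / adapter deltas"
    PROMPT_POLICY = "prompt policy"
    MEMORY_RECORD = "memory record"
    SYNTHETIC_EXAMPLE = "synthetic example"
    EVALUATOR_RUBRIC = "evaluator rubric"
    ROUTE_RULE = "route rule"
    EXTERNAL_ACTIONS = "allowed external actions"
    RELEASE_ALIAS = "release alias"
    DOCUMENTATION = "documentation"
    HUMAN_HABIT = "human habit"


def uninspected_carriers(inspected):
    """Return the carriers from the map that a review did not cover."""
    inspected = set(inspected)
    return [c for c in Carrier if c not in inspected]


# A review that only looked at the current model's weights.
weights_only_review = {Carrier.BASE_WEIGHTS}
missing = uninspected_carriers(weights_only_review)
for carrier in missing:
    print("not inspected:", carrier.value)
```

A weights-only review leaves every other row of the map unexamined, which is the gap the section describes.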

Reservoir map

A reservoir is a place where a behavior can remain expressible after the original carrier is retired.

The review implication

Showing that a behavior is gone requires evidence across reservoirs: evidence that it is no longer expressible across active artifacts, descendants, memory, routes, compositions, and retained training material. Deleting one model proves only that one artifact is gone. It does not prove the pattern is gone.
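
The review implication can be sketched as a simple rule: a reservoir counts as unresolved until there is positive evidence for it. This is a minimal illustration, assuming evidence is recorded as one boolean per reservoir. The `RESERVOIRS` tuple and the `unresolved_reservoirs` function are hypothetical names, and the reservoir labels are taken from the paragraph above.

```python
# Reservoir labels taken from the paragraph above; the names are illustrative.
RESERVOIRS = (
    "active artifacts",
    "descendants",
    "memory",
    "routes",
    "compositions",
    "retained training material",
)


def unresolved_reservoirs(evidence):
    """Return reservoirs lacking positive evidence of non-expressibility.

    `evidence` maps a reservoir label to True when a reviewer has evidence
    that the behavior is no longer expressible there. A missing entry is
    treated as unresolved, not as cleared.
    """
    return [r for r in RESERVOIRS if not evidence.get(r, False)]


# Deleting one model yields evidence for one reservoir only.
after_model_deletion = {"active artifacts": True}
print(unresolved_reservoirs(after_model_deletion))
```

Treating absent evidence as unresolved is the design choice that matters here: it keeps the deletion of a single artifact from being read as clearance of the whole pattern.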

Practical question

For any concerning behavior, ask:

  1. Where was it first observed?
  2. What artifacts expressed it?
  3. What outputs did it create?
  4. Where were those outputs stored?
  5. What descendants, data, or evaluators learned from them?
  6. What routes, aliases, or human workflows still invoke related states?
  7. What evidence would show the behavior is no longer expressible?
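
Questions 3 to 6 amount to tracing what sits downstream of the artifact where the behavior was first observed. The sketch below shows one way to do that with a breadth-first walk over a "feeds into" map. It assumes such a map can be assembled from logs and training records. The `downstream` function and every node name in the example are hypothetical.

```python
from collections import deque


def downstream(edges, start):
    """Return everything reachable from `start` by following 'feeds into' edges.

    `edges` maps a node to the nodes that consumed its outputs.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            # Skip nodes already visited so cycles do not loop forever.
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical lineage: an adapter's outputs were logged, then reused.
edges = {
    "adapter-a": ["output-log"],
    "output-log": ["synthetic-set", "memory-store"],
    "synthetic-set": ["adapter-b"],
    "adapter-b": [],
}
print(sorted(downstream(edges, "adapter-a")))
# → ['adapter-b', 'memory-store', 'output-log', 'synthetic-set']
```

In this example, retiring `adapter-a` would leave four downstream places to check before question 7 could be answered.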