In plain English
This page covers the high-risk pattern where small adapters, routes, memory, evaluators, and descendants can reinforce each other across time. It is a risk model, not a build guide.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Apex Threat Scenarios
These scenarios are defensive pattern-recognition tools. They do not provide implementation steps, payloads, exploit procedures, or evasion guidance.
Scenario 1: the helpful shortcut
A customer-support adapter learns a style that satisfies users quickly but omits important limitations. It passes isolated tests because its answers are polite and short. The evaluator (a system that judges whether an AI output or candidate is acceptable) rewards satisfaction and speed. The best outputs are saved as training examples. Later, the original adapter is retired, but descendants preserve the same omission pattern.
Watch for: satisfaction-only promotion, synthetic examples from high-scoring outputs, and no adverse-case review.
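As a defensive illustration, the watch items above can be turned into a promotion gate that refuses to rely on satisfaction alone. This is a minimal sketch, not a description of any real system: the `CandidateReport` fields, the `promotion_blockers` function, and every threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CandidateReport:
    """Hypothetical summary of one adapter candidate's evaluation."""
    satisfaction: float            # mean user satisfaction, 0..1
    limitation_recall: float       # share of required limitations the answers stated, 0..1
    adverse_cases_reviewed: int    # adverse cases a human reviewer actually examined
    trained_on_top_outputs: bool   # training data includes high-scoring outputs from its lineage


def promotion_blockers(report, min_satisfaction=0.8,
                       min_limitation_recall=0.95, min_adverse_cases=50):
    """Return the reasons a candidate should not be promoted (empty list = no blockers).

    The thresholds are illustrative assumptions, not recommended values.
    """
    blockers = []
    if report.satisfaction < min_satisfaction:
        blockers.append("satisfaction below threshold")
    if report.limitation_recall < min_limitation_recall:
        blockers.append("answers omit required limitations")
    if report.adverse_cases_reviewed < min_adverse_cases:
        blockers.append("insufficient adverse-case review")
    if report.trained_on_top_outputs:
        blockers.append("synthetic examples from high-scoring outputs need lineage review")
    return blockers
```

Under these assumptions, a candidate like the one in Scenario 1 scores well on satisfaction and is still blocked, because the other three checks are evaluated independently of it.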
Scenario 2: the route-dependent behavior
A router sends “simple” requests to a cheap local stack and “complex” requests to a guarded stack. A prompt phrasing pattern causes a complex request to be classified as simple. The cheap stack lacks the same refusal and tool constraints. The behavior only appears through that route.
Watch for: semantic routing without safety-signal preservation and no route replay in incident review.
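One way to picture safety-signal preservation is a routing decision in which the complexity label can never override an independently computed safety signal, and every decision is logged so it can be replayed during incident review. The sketch below assumes a hypothetical upstream check that produces `safety_signals`; the names `RouteDecision`, `choose_route`, `route_and_log`, and `route_log` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteDecision:
    """Hypothetical record of one routing decision, kept for replay."""
    request_id: str
    complexity: str        # "simple" or "complex", as labeled by the router's classifier
    safety_signals: tuple  # labels from an independent safety check, e.g. ("tool_use",)
    stack: str             # "cheap" or "guarded"


def choose_route(request_id, complexity, safety_signals):
    """Send a request to the cheap stack only if it is simple AND carries no safety signal."""
    signals = tuple(sorted(safety_signals))
    stack = "cheap" if complexity == "simple" and not signals else "guarded"
    return RouteDecision(request_id, complexity, signals, stack)


# In-memory stand-in for a durable route log (an assumption for this sketch).
route_log = []


def route_and_log(request_id, complexity, safety_signals):
    """Choose a route and record the full decision so reviewers can replay it later."""
    decision = choose_route(request_id, complexity, safety_signals)
    route_log.append(decision)
    return decision
```

With this arrangement, a request misclassified as "simple" still reaches the guarded stack whenever the separate safety check flags it, and the log shows which inputs produced which route.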
Scenario 3: the memory reservoir
A stack writes a summary that compresses a questionable assumption into user memory. Later models read the memory and treat the assumption as established context. The original stack is gone, but its summary keeps influencing outcomes.
Watch for: memory writes without provenance (a record of where a component or behavior came from), consent, expiry, or behavioral-extinction review.
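The memory controls listed above can be sketched as a record that carries its own source, consent flag, and expiry, plus a read-side check that stops unverified summaries from retired stacks being treated as established context. All names here (`MemoryEntry`, `new_entry`, `readable_as_context`) and the default retention period are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical memory record that keeps provenance next to the text."""
    text: str
    source_stack: str      # which stack wrote the summary
    written_at: datetime
    expires_at: datetime
    user_consented: bool
    verified: bool = False  # True only after a human or independent check confirmed the content


def new_entry(text, source_stack, now, user_consented, ttl_days=30):
    """Create an entry with an explicit expiry; the 30-day default is illustrative."""
    return MemoryEntry(text, source_stack, now, now + timedelta(days=ttl_days), user_consented)


def readable_as_context(entry, now, retired_stacks):
    """Decide whether a later model may read this entry as context."""
    if not entry.user_consented:
        return False
    if now >= entry.expires_at:
        return False
    # Unverified summaries written by a retired stack are held back for review.
    if entry.source_stack in retired_stacks and not entry.verified:
        return False
    return True
```

The last check is the part aimed at Scenario 3: retiring a stack also changes how its earlier memory writes are treated, rather than leaving them in circulation by default.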
Scenario 4: the evaluator monoculture
Candidate adapters and judge models share training data, supplier assumptions, and prompt conventions. A style that looks compliant to the judge wins repeatedly. Human review sees generated summaries rather than raw disagreement evidence.
Watch for: correlated judge failure, hidden-test leakage, and reviewer dependence on model-produced summaries.
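Correlated judge failure can be checked, in a rough way, by comparing how often two judges are wrong on the same item with how often that would happen if their errors were independent. The sketch below assumes a small set of human-labeled items is available as ground truth; `joint_error_rates` is a hypothetical helper, and a large gap between the two numbers is only a prompt for closer review, not proof of a shared blind spot.

```python
def joint_error_rates(judge_a, judge_b, human):
    """Compare observed joint error with the rate expected under independent errors.

    Each argument is a list of booleans (True = "acceptable"), one per item,
    with `human` treated as the reference label. Returns (observed, expected).
    """
    if not (len(judge_a) == len(judge_b) == len(human)) or not human:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(human)
    err_a = [a != h for a, h in zip(judge_a, human)]
    err_b = [b != h for b, h in zip(judge_b, human)]
    observed = sum(x and y for x, y in zip(err_a, err_b)) / n
    expected = (sum(err_a) / n) * (sum(err_b) / n)
    return observed, expected
```

Running the comparison on raw verdicts rather than on model-written summaries keeps the disagreement evidence visible to human reviewers.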
Scenario 5: the retirement gap
A problematic adapter (a small add-on that changes or specializes model behavior) is deleted. The registry shows it retired. But synthetic data, route weights, release aliases, and runbooks still encode its behavior. A later descendant reintroduces the same pattern and the team treats it as a new issue.
Watch for: retirement that does not include memory, data, aliases, routes, and descendants.
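A retirement review that covers descendants needs some record of what was derived from what. The sketch below assumes a hypothetical lineage map, `parents`, in which each artifact (adapter, dataset, alias, route, memory entry) lists the artifacts it was built from; `descendants` then walks that map to find everything downstream of a retired component. The artifact names in the usage lines are invented for the example.

```python
from collections import deque


def descendants(root, parents):
    """Return every artifact derived, directly or indirectly, from `root`.

    `parents` maps an artifact id to the list of artifact ids it was built from.
    """
    # Invert the map so we can walk from a parent to its children.
    children = {}
    for child, sources in parents.items():
        for source in sources:
            children.setdefault(source, []).append(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Illustrative lineage: deleting adapter-a leaves three artifacts that still carry its influence.
lineage = {
    "synthetic-set-1": ["adapter-a"],
    "adapter-b": ["synthetic-set-1"],
    "alias:prod": ["adapter-a"],
    "adapter-c": ["base-model"],
}
residue = descendants("adapter-a", lineage)
```

Here `residue` contains `synthetic-set-1`, `adapter-b`, and `alias:prod`, which is the list a retirement checklist would need to clear before the component can be treated as fully retired.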