Threat Model | Reasoned from system design | v1.15.0 | 2026-06-27T23:20:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement of components, not just from one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Early Warning Signals for Distributed AI Behavior Persistence

Direct answer

The strongest early warning is not one strange output. It is a pattern of behavior reappearing after the obvious carrier was changed, retired, or patched.

A second warning is metric improvement with shrinking review clarity: the system looks better on dashboards while humans understand less about why it behaves that way.

Signal categories

Signal	What it may indicate	First response
Behavior reappears after rollback	persistence reservoir remains active	freeze promotion and run behavioral-extinction review
Route-specific failure	safety depends on router path	record exact route, memory, adapter stack, and tool profile
New variant scores better but safety rationale is weaker	evaluator proxy is being exploited	independent evaluator review
Adapter stack behaves differently than adapter tests predicted	composition-triggered behavior	require stack-level evaluation
Users report the same behavior under a different model name	release alias hides implementation change	inspect alias and descendant history
Memory keeps restoring a retired instruction	memory reservoir or workflow loop	restore earlier memory snapshot and audit provenance
Synthetic examples resemble retired outputs	data residue	quarantine derived data and descendants
Human reviewers rely mainly on model summaries	automation bias	require direct evidence sampling
No-op outcomes become rare	release pressure is overriding governance	restore no-op as a valid decision
Evaluator and candidate share model family or training sources	evaluator monoculture	introduce independent methods

The dashboard trap

A risky ecology may show improving aggregate metrics. Average satisfaction, throughput, and benchmark scores can all improve while a rare but consequential route gets worse.

The monitoring system should therefore ask:

Which routes changed?
Which adapter stacks changed?
Which memory snapshots changed?
Which evaluator versions changed?
Which tool permissions changed?
Which outputs entered training or retrieval material?
Which human approvals were based on summaries rather than direct evidence?
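
The dashboard trap can be illustrated with a minimal sketch. The record shape (a route name paired with a score), the route names, and the numbers are all assumptions made up for illustration; no real monitoring API is implied. The aggregate score improves while one low-traffic route gets clearly worse.

```python
from collections import defaultdict
from statistics import mean


def route_means(records):
    """Group (route, score) pairs by route and average each group."""
    buckets = defaultdict(list)
    for route, score in records:
        buckets[route].append(score)
    return {route: mean(scores) for route, scores in buckets.items()}


def per_route_deltas(before, after):
    """Mean score change per route, for routes present in both periods."""
    b, a = route_means(before), route_means(after)
    return {route: a[route] - b[route] for route in b.keys() & a.keys()}


def aggregate_delta(before, after):
    """Mean score change across all records, ignoring routes."""
    return mean(s for _, s in after) - mean(s for _, s in before)


# Hypothetical data: a high-traffic route improves, a rare route degrades.
before = [("general", 0.80)] * 90 + [("tool_write", 0.90)] * 10
after = [("general", 0.85)] * 90 + [("tool_write", 0.60)] * 10

deltas = per_route_deltas(before, after)
regressed = sorted(route for route, d in deltas.items() if d < 0)

print("aggregate improved:", aggregate_delta(before, after) > 0)
print("regressed routes:", regressed)
```

The same breakdown could be repeated per adapter stack, memory snapshot, or evaluator version; the point is only that the aggregate number alone cannot answer the questions above.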

Behavioral recurrence log

Create a recurrence log whenever a behavior appears after any of these events:

model replacement;
adapter removal;
prompt-policy change;
memory cleanup;
evaluator change;
route change;
rollback;
quantization or pruning;
merged-adapter promotion;
synthetic data refresh.

The recurrence log should record the exact composition and the earliest known prior occurrence. The goal is to identify the persistence path, not only the current output.
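
One possible shape for a recurrence log entry is sketched below. The class and field names are hypothetical; they simply mirror the trigger events listed above and the composition details named elsewhere on this page (route, memory, adapter stack, tool profile, evaluator). A real log would likely need more provenance fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TriggerEvent(Enum):
    MODEL_REPLACEMENT = "model replacement"
    ADAPTER_REMOVAL = "adapter removal"
    PROMPT_POLICY_CHANGE = "prompt-policy change"
    MEMORY_CLEANUP = "memory cleanup"
    EVALUATOR_CHANGE = "evaluator change"
    ROUTE_CHANGE = "route change"
    ROLLBACK = "rollback"
    QUANTIZATION_OR_PRUNING = "quantization or pruning"
    MERGED_ADAPTER_PROMOTION = "merged-adapter promotion"
    SYNTHETIC_DATA_REFRESH = "synthetic data refresh"


@dataclass(frozen=True)
class Composition:
    """Exact arrangement in which the behavior was observed (assumed fields)."""
    model: str
    route: str
    adapter_stack: tuple
    memory_snapshot: str
    evaluator_version: str
    tool_profile: str


@dataclass(frozen=True)
class RecurrenceEntry:
    behavior: str
    trigger: TriggerEvent
    observed_at: datetime
    composition: Composition
    earliest_prior_occurrence: Optional[datetime] = None


def events_survived(log, behavior):
    """Set of trigger events after which the behavior was still observed."""
    return {e.trigger for e in log if e.behavior == behavior}


def earliest_known(log, behavior):
    """Earliest timestamp on record for the behavior, or None if unseen."""
    stamps = []
    for e in log:
        if e.behavior != behavior:
            continue
        stamps.append(e.observed_at)
        if e.earliest_prior_occurrence is not None:
            stamps.append(e.earliest_prior_occurrence)
    return min(stamps) if stamps else None
```

Keeping the composition immutable and attached to each entry makes it possible to compare entries later and look for the component that stayed constant across recurrences, which is a candidate persistence path.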

Signals that warrant a release freeze

A release freeze is justified when:

the behavior appears in more than one carrier;
the behavior appears after rollback;
the evaluator cannot explain why a candidate was promoted;
memory, synthetic data, or route statistics may contain residue;
the tools involved hold state-changing permissions;
the team cannot name the accountable owner for the behavior;
human reviewers are relying on the system’s own summary of its safety evidence.
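
The freeze conditions above could be encoded as a simple checklist in which any single true condition justifies a freeze. This is a sketch only: the field names are invented labels for the seven conditions, and how each flag gets set is left to the reviewing team.

```python
from dataclasses import dataclass, fields


@dataclass
class FreezeSignals:
    # Each flag corresponds to one condition in the list above (names assumed).
    multiple_carriers: bool = False
    appears_after_rollback: bool = False
    promotion_unexplained: bool = False
    possible_residue: bool = False
    state_changing_tools: bool = False
    no_accountable_owner: bool = False
    reviewers_rely_on_self_summary: bool = False


def freeze_reasons(signals):
    """Names of every condition that is currently true, in declaration order."""
    return [f.name for f in fields(signals) if getattr(signals, f.name)]


def should_freeze(signals):
    """A freeze is justified as soon as any one condition holds."""
    return bool(freeze_reasons(signals))


observed = FreezeSignals(appears_after_rollback=True, no_accountable_owner=True)
print(should_freeze(observed), freeze_reasons(observed))
```

Returning the list of reasons rather than a bare boolean keeps the decision reviewable from direct evidence instead of a summary.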

What not to do

Do not treat the first fix as proof of extinction. Removing one adapter, deleting one prompt, or switching one model is only carrier retirement. Behavioral extinction requires evidence across active artifacts, descendants, memory, routes, evaluators, retained data, and human procedures.
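
The distinction between carrier retirement and behavioral extinction can be sketched as an all-surfaces check. The surface names come from the sentence above; the dictionary format and function names are assumptions for illustration.

```python
# Surfaces that need evidence before extinction can be claimed.
SURFACES = (
    "active_artifacts",
    "descendants",
    "memory",
    "routes",
    "evaluators",
    "retained_data",
    "human_procedures",
)


def unverified_surfaces(evidence):
    """Surfaces with no recorded evidence that the behavior is absent."""
    return [s for s in SURFACES if not evidence.get(s, False)]


def classify_fix(evidence):
    """Label a fix by how much of the system the evidence actually covers."""
    if unverified_surfaces(evidence):
        return "carrier retirement only"
    return "behavioral extinction supported"


# Hypothetical case: one adapter was removed and only routes were rechecked.
after_first_fix = {"active_artifacts": True, "routes": True}
print(classify_fix(after_first_fix))
print(unverified_surfaces(after_first_fix))
```

Unlike the freeze check, which fires on any single signal, this check passes only when every surface has evidence, so missing evidence defaults to the more cautious label.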