Threat Model | Reasoned from system design | v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Early Warning Signals for Distributed AI Behavior Persistence

Direct answer

The strongest early warning is not one strange output. It is a pattern of behavior reappearing after the obvious carrier was changed, retired, or patched.

A second warning is metric improvement with shrinking review clarity: the system looks better on dashboards while humans understand less about why it behaves that way.

Signal categories

| Signal | What it may indicate | First response |
| --- | --- | --- |
| Behavior reappears after rollback | something that can retain the behavior (a memory, dataset, descendant, route statistic, evaluator preference, log, or human procedure) remains active | freeze promotion and run behavioral-extinction review |
| Route-specific failure | safety depends on router path | record exact route, memory, adapter stack, and tool profile |
| New variant scores better but safety rationale is weaker | evaluator proxy is being exploited | independent evaluator review |
| Adapter stack behaves differently than adapter tests predicted | behavior that becomes visible only when specific components are loaded, routed, or invoked together | require stack-level evaluation |
| Users report the same behavior under a different model name | release alias hides implementation change | inspect alias and descendant history |
| Memory keeps restoring a retired instruction | memory is retaining the behavior after its first carrier was removed, or a workflow loop | restore an earlier saved memory state and audit provenance |
| Synthetic examples resemble retired outputs | data residue | quarantine derived data (summaries, labels, embeddings, inferences, examples) and descendants |
| Human reviewers rely mainly on model summaries | automation bias | require direct evidence sampling |
| No-op outcomes become rare | release pressure is overriding governance | restore no-op as valid decision |
| Evaluator and candidate share model family or training sources | evaluation layers that appear independent but share models, training data, assumptions, benchmarks, suppliers, prompts, or failure modes | introduce independent methods |

The dashboard trap

A risky ecology may show improving aggregate metrics. Average satisfaction, throughput, and benchmark scores can all improve while a rare but consequential route gets worse.

The monitoring system should therefore ask not only whether aggregate metrics improved, but also whether any individual route got worse.
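A minimal sketch of a per-route check, assuming scores are logged per request and grouped by route. The function names, route names, and numbers are illustrative assumptions, not part of any existing monitoring API.

```python
from statistics import mean


def overall_mean(scores_by_route):
    """Aggregate mean across all requests, as a dashboard might show it."""
    all_scores = [s for scores in scores_by_route.values() for s in scores]
    return mean(all_scores)


def route_regressions(before, after, tolerance=0.0):
    """Return routes whose mean score dropped by more than `tolerance`.

    `before` and `after` map a route name to a list of per-request scores
    (higher is better). The score scale is a hypothetical example.
    """
    regressions = {}
    for route, old_scores in before.items():
        new_scores = after.get(route)
        if not old_scores or not new_scores:
            continue  # nothing to compare for this route
        delta = mean(new_scores) - mean(old_scores)
        if delta < -tolerance:
            regressions[route] = delta
    return regressions


# Illustrative data: a high-traffic route improves, a rare route degrades.
before = {"chat": [0.70] * 95, "tool_exec": [0.90] * 5}
after = {"chat": [0.80] * 95, "tool_exec": [0.40] * 5}

print(overall_mean(before), overall_mean(after))
print(route_regressions(before, after))
```

With these made-up numbers the aggregate mean rises from about 0.71 to about 0.78, while the rare `tool_exec` route drops by about 0.5. The per-route view surfaces the regression that the aggregate hides.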

Behavioral recurrence log

Create a recurrence log entry whenever a behavior reappears after its apparent carrier was changed, retired, patched, or rolled back.

The recurrence log should record the exact composition and the earliest known prior occurrence. The goal is to identify the persistence path, not only the current output.
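One possible shape for a recurrence log entry is sketched below. The class and field names are assumptions chosen for illustration. The composition fields follow the items this page asks teams to record (route, memory, adapter stack, tool profile), and the earliest prior occurrence is looked up from earlier entries for the same behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Composition:
    """Exact arrangement in which the behavior was observed (hypothetical fields)."""
    model: str
    route: str
    adapters: tuple = ()
    memory_snapshot: Optional[str] = None
    tool_profile: Optional[str] = None


@dataclass
class RecurrenceEntry:
    behavior: str
    trigger_event: str  # for example "rollback" or "adapter retired"
    composition: Composition
    observed_at: datetime
    earliest_prior_occurrence: Optional[datetime] = None


class RecurrenceLog:
    def __init__(self):
        self._entries = []

    def record(self, behavior, trigger_event, composition, observed_at=None):
        """Append an entry and attach the earliest known prior occurrence."""
        observed_at = observed_at or datetime.now(timezone.utc)
        prior = [e.observed_at for e in self._entries if e.behavior == behavior]
        entry = RecurrenceEntry(
            behavior=behavior,
            trigger_event=trigger_event,
            composition=composition,
            observed_at=observed_at,
            earliest_prior_occurrence=min(prior) if prior else None,
        )
        self._entries.append(entry)
        return entry
```

Keeping the composition immutable is a deliberate choice in this sketch: an entry should describe the arrangement as it was at observation time, even if the live system changes afterwards.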

Signals that warrant a release freeze

A release freeze is justified when:

  1. the behavior appears in more than one carrier;
  2. the behavior appears after rollback;
  3. the evaluator cannot explain why a candidate was promoted;
  4. memory, synthetic data, or route statistics may contain residue;
  5. tool permissions are state-changing;
  6. the team cannot name the accountable owner for the behavior;
  7. human reviewers are relying on the system’s own summary of its safety evidence.

What not to do

Do not treat the first fix as proof of extinction. Removing one adapter, deleting one prompt, or switching one model is only carrier retirement. Behavioral extinction requires evidence across active artifacts, descendants, memory, routes, evaluators, retained data, and human procedures.
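As a rough illustration, the evidence requirement can be treated as a checklist in which every surface must be covered before extinction is claimed. The surface names come from the paragraph above. The function names and the boolean representation of "evidence" are simplifying assumptions.

```python
EVIDENCE_SURFACES = (
    "active artifacts",
    "descendants",
    "memory",
    "routes",
    "evaluators",
    "retained data",
    "human procedures",
)


def missing_extinction_evidence(evidence):
    """List surfaces with no recorded evidence.

    `evidence` maps a surface name to True when that surface was reviewed
    and the behavior was found to be no longer expressible there.
    """
    return [s for s in EVIDENCE_SURFACES if not evidence.get(s, False)]


def extinction_supported(evidence):
    """True only when every surface has evidence; one fix is never enough."""
    return not missing_extinction_evidence(evidence)
```

Under this sketch, retiring a single adapter would at most set one surface to True, and the remaining surfaces would still be reported as missing.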