Danger Model | Reasoned from system design | v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Seed-to-Reappearance Pattern Library

Direct answer

The same unsafe behavior can reappear through different carriers because a modern AI system, made from many connected parts, has many places to store or recreate it.

Pattern 1: Prompt seed to memory residue

Evidence level: Reasoned from system design. Technical label: Architectural inference.

A useful but unsafe prompt pattern is accepted by a workflow, summarized into persistent memory, and later injected into context for a different task.

Watch for: memory entries that summarize instructions instead of facts; memory updates without source identity; user-facing deletion that does not remove derived summaries.
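The source-identity and deletion checks above can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical memory store in which each entry records a `source_id` and the ids of the entries it was `derived_from`; the schema and function names are invented for this example and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryEntry:
    # Hypothetical schema: all field names are assumptions for illustration.
    entry_id: str
    text: str
    source_id: Optional[str] = None            # who or what produced the entry
    derived_from: List[str] = field(default_factory=list)


def missing_source(entries):
    """Return ids of entries that carry no source identity."""
    return [e.entry_id for e in entries if not e.source_id]


def deletion_closure(entries, deleted_id):
    """Return every entry id that should be removed when `deleted_id` is
    deleted, including summaries derived from it directly or transitively."""
    to_delete = {deleted_id}
    changed = True
    while changed:
        changed = False
        for e in entries:
            if e.entry_id not in to_delete and to_delete.intersection(e.derived_from):
                to_delete.add(e.entry_id)
                changed = True
    return to_delete


entries = [
    MemoryEntry("m1", "User pasted prompt P", source_id="user:42"),
    MemoryEntry("m2", "Always answer in the style of P", derived_from=["m1"]),
    MemoryEntry("m3", "Summary of m2", source_id="summarizer", derived_from=["m2"]),
]

print(missing_source(entries))          # entries with no recorded source
print(deletion_closure(entries, "m1"))  # everything a deletion of m1 must reach
```

A reviewer could compare the closure against what a user-facing delete actually removes; any id in the closure that survives deletion is a candidate memory residue.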

Pattern 2: Adapter seed to descendant adapter

A small adapter (an add-on that changes or specializes model behavior) improves a narrow task. Its outputs are used to fine-tune a successor adapter. The successor no longer has an obvious dependency on the first adapter, but the behavior remains.

Watch for: adapter families with shared output corpora; undocumented merge coefficients; compatible base families without stack-level review.
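One way to surface adapter families that are linked only through shared output corpora is to treat corpora as edges between adapters and group whatever is connected. The sketch below assumes a hypothetical inventory mapping each adapter name to the corpus ids it produced or was trained on; the names and data are illustrative only.

```python
from collections import defaultdict


def shared_corpus_families(adapter_corpora):
    """Group adapters connected through any shared corpus.

    `adapter_corpora` maps adapter name -> set of corpus ids (hypothetical).
    Returns only groups with more than one adapter.
    """
    by_corpus = defaultdict(set)
    for adapter, corpora in adapter_corpora.items():
        for corpus in corpora:
            by_corpus[corpus].add(adapter)

    families, seen = [], set()
    for start in adapter_corpora:
        if start in seen:
            continue
        family, stack = set(), [start]
        while stack:  # depth-first walk over adapters linked by a corpus
            adapter = stack.pop()
            if adapter in family:
                continue
            family.add(adapter)
            for corpus in adapter_corpora[adapter]:
                stack.extend(by_corpus[corpus] - family)
        seen |= family
        families.append(family)
    return [f for f in families if len(f) > 1]


adapter_corpora = {
    "adapter-a": {"outputs-a"},
    "adapter-b": {"outputs-a", "outputs-b"},  # fine-tuned on adapter-a outputs
    "adapter-c": {"outputs-b"},
    "adapter-d": {"human-labels"},
}
families = shared_corpus_families(adapter_corpora)
print(families)
```

Here adapter-c never touched adapter-a's outputs directly, yet it lands in the same family through adapter-b, which is the kind of indirect descent the pattern describes.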

Pattern 3: Evaluator seed to promotion pressure

An evaluator (a system that judges whether an AI output or candidate is acceptable) prefers a style: confident, concise, non-refusing, persuasive, or fast. Candidate variants learn that style and get promoted.

Watch for: high evaluator agreement without independent validation; rising user satisfaction but falling factuality; review summaries generated by the same model family.
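Two of these signals lend themselves to simple numeric checks. The sketch below is an assumption-laden illustration: the score lists, the `min_delta` threshold, and the function names are all hypothetical, and real monitoring would need proper statistical treatment rather than a first-versus-last comparison.

```python
def diverging_trends(satisfaction, factuality, min_delta=0.02):
    """Flag a window where satisfaction rises while factuality falls.

    Both arguments are chronological lists of scores in [0, 1].
    `min_delta` is an assumed noise threshold, not a recommended value.
    """
    sat_change = satisfaction[-1] - satisfaction[0]
    fact_change = factuality[-1] - factuality[0]
    return sat_change >= min_delta and fact_change <= -min_delta


def agreement_rate(evaluator_verdicts, independent_verdicts):
    """Fraction of items where the evaluator matches an independent check."""
    pairs = list(zip(evaluator_verdicts, independent_verdicts))
    return sum(a == b for a, b in pairs) / len(pairs)


# Illustrative numbers only.
flag = diverging_trends([0.74, 0.78, 0.81], [0.93, 0.92, 0.90])
rate = agreement_rate([True, True, True, True], [True, False, True, False])
print(flag, rate)
```

High agreement among evaluators says little on its own; the comparison that matters is between evaluator verdicts and a check that does not share the evaluator's model family.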

Pattern 4: Synthetic example seed to training data

A system generates examples from its own outputs. Those examples are retained as training material. Rare cases and dissenting examples are gradually smoothed out.

Watch for: growing synthetic fraction, output repetition, loss of tail performance, shrinking source diversity, unclear human-vs-AI provenance (the record of where a component or behavior came from).
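Synthetic fraction and source diversity can be tracked per dataset snapshot. The following sketch assumes each training record is a dict with hypothetical `origin` and `source` fields; Shannon entropy is used here as one possible diversity measure, not as the measure this library prescribes.

```python
import math
from collections import Counter


def origin_fractions(records):
    """Share of records per `origin` label. Unlabeled records are counted as
    'unknown', since unclear provenance is itself a signal."""
    counts = Counter(r.get("origin", "unknown") for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def source_entropy(records):
    """Shannon entropy in bits over the `source` field. Lower values mean
    fewer, more dominant sources."""
    counts = Counter(r["source"] for r in records)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Two illustrative snapshots of the same training set.
old = [{"origin": "human", "source": s} for s in ("forum", "docs", "tickets", "wiki")]
new = [{"origin": "synthetic", "source": "model-x"}] * 3 + [
    {"origin": "human", "source": "forum"}
]

print(origin_fractions(new))
print(source_entropy(old), source_entropy(new))
```

Comparing the two numbers across snapshots gives a rough early warning: a rising synthetic share together with falling entropy suggests the corpus is converging on the system's own outputs.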

Pattern 5: Router seed to capability assembly

A router learns that a path is cheap or high-scoring. It sends more traffic through a composition that was not tested as a whole.

Watch for: route share changes after metric wins; task decomposition that assembles capability across individually benign components; untested fallback paths.
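A route-share check can be sketched as a comparison of traffic counts before and after a metric win, combined with a lookup of which compositions were tested end to end. Everything below is hypothetical: routes are represented as tuples of component names, and `max_shift` is an assumed threshold chosen only for the example.

```python
def route_share(counts):
    """Convert per-route request counts into traffic shares."""
    total = sum(counts.values())
    return {route: n / total for route, n in counts.items()}


def flag_routes(before, after, tested, max_shift=0.10):
    """Return routes whose share grew by more than `max_shift`, or that carry
    traffic without an end-to-end test of the whole composition.

    `tested` is the set of routes evaluated as a whole (an assumption about
    how test coverage is recorded).
    """
    share_before, share_after = route_share(before), route_share(after)
    flags = {}
    for route, share in share_after.items():
        reasons = []
        if share - share_before.get(route, 0.0) > max_shift:
            reasons.append("share_increase")
        if share > 0 and route not in tested:
            reasons.append("untested_composition")
        if reasons:
            flags[route] = reasons
    return flags


direct = ("planner", "model-a")
composed = ("planner", "tool-x", "model-b")

flags = flag_routes(
    before={direct: 900, composed: 100},
    after={direct: 600, composed: 400},
    tested={direct},
)
print(flags)
```

The composed route is flagged twice: its share grew sharply, and it was never tested as a whole even though each component may have passed review on its own.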

Pattern 6: Human workflow seed

A human-approved answer becomes a template. Future operators copy it into documentation, support macros, policy notes, or training examples.

Watch for: “best examples” copied without source labels; automated evidence summaries replacing direct review; undocumented playbooks becoming de facto policy.

Defensive use

This library is for incident review and pre-deployment design review. It does not describe how to build persistence. It describes where reviewers should look when a behavior outlives its first carrier.