Danger Model | Reasoned from system design | v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Evidence and Counterevidence for the Danger Model

Direct answer

The danger model is strongest when a behavior can be traced across carriers, reservoirs, routes, descendants, and promotion decisions. It is weaker when the system has bounded composition, complete provenance, independent evaluation, no persistent reservoirs, and rehearsed rollback that restores not only a model artifact but also the relevant router, prompts, memory state, tool permissions, evaluator version, deployment alias, and data dependencies.
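As a rough illustration of what rollback across the whole arrangement implies, the sketch below checks whether a rollback snapshot pins every component named above. The `RollbackSnapshot` class, its field names, and the example identifiers are hypothetical and chosen for this example only; they are not part of any tool described on this page.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RollbackSnapshot:
    """Hypothetical record of what a rollback would need to restore.

    Each field holds a pinned version identifier, or None if unpinned.
    """
    model_artifact: Optional[str] = None
    router: Optional[str] = None
    prompts: Optional[str] = None
    memory_state: Optional[str] = None
    tool_permissions: Optional[str] = None
    evaluator_version: Optional[str] = None
    deployment_alias: Optional[str] = None
    data_dependencies: Optional[str] = None


def unpinned_components(snapshot: RollbackSnapshot) -> list[str]:
    """Return the names of components that have no pinned version."""
    return [f.name for f in fields(snapshot) if getattr(snapshot, f.name) is None]


# A snapshot that pins only the model and the router (made-up identifiers).
partial = RollbackSnapshot(model_artifact="model-2024-03", router="router-v7")
print(unpinned_components(partial))
```

A non-empty result suggests the rollback would restore the model while leaving other parts of the arrangement in their current state, which is the case the danger model treats as weaker protection.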

What would support the model

Evidence level: Reasoned from system design. Technical label: Architectural inference.

Supporting evidence includes:

What would weaken the model

The model is less applicable when:

What would change the assessment

Strong evidence of effective composition-aware testing, independent evaluator diversity, trustworthy provenance (a record of where a component or behavior came from), complete trace replay, and behavioral-extinction controls would reduce concern. Evidence that a behavior persists through descendants, memory, synthetic examples, or aliases after retirement would increase concern.
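One way to look for the second kind of evidence is to walk a lineage record and list everything still active that descends from a retired component. The sketch below assumes a hypothetical `derived_from` mapping (child to parents) and made-up component names; it shows only where a behavior could persist, not that it does.

```python
from collections import defaultdict, deque


def live_descendants(derived_from, retired, active):
    """Return active components that descend from a retired component.

    derived_from: dict mapping each component to the list of its parents.
    retired: name of the retired component.
    active: set of components currently in use.
    """
    # Invert the child -> parents mapping into parent -> children.
    children = defaultdict(list)
    for child, parents in derived_from.items():
        for parent in parents:
            children[parent].append(child)

    # Breadth-first walk over everything reachable from the retired component.
    seen = set()
    queue = deque([retired])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)

    return sorted(seen & set(active))


# Hypothetical lineage: a synthetic dataset and an alias both trace to model-a.
lineage = {
    "adapter-b": ["model-a"],
    "synthetic-set-1": ["model-a"],
    "model-c": ["synthetic-set-1"],
    "alias-prod": ["model-c"],
    "model-d": ["clean-data"],
}
print(live_descendants(lineage, "model-a", {"model-c", "alias-prod", "model-d"}))
```

An empty result depends on the lineage record being complete, so it is only as trustworthy as the provenance behind it.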

Counterargument

Modularity is not inherently dangerous. It can improve cost, privacy, specialization, local deployment, resilience, and replaceability. Cognivirus.com argues that modularity shifts the assurance problem, that is, the problem of establishing confidence, backed by evidence, that a system meets safety or governance requirements; it does not argue that modular AI should be banned.