Danger Model | Reasoned from system design | v1.15.0
In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Implementation Checklist for Transition-Graph Safety
Direct answer
Review the transition graph (the map of how an AI system is allowed to change over time), not just the model. The checklist below turns the danger model into practical review work.
Architecture review
- Identify all carriers: models, adapters, prompts, memory, datasets, evaluators, routes, tool profiles, release aliases, and human workflows.
- Identify all transitions: fine-tune, merge, distill, quantize, prune, route, replace, promote, retire, restore, consolidate memory, change evaluator (a system that judges whether an AI output or candidate is acceptable), change permissions.
- Record which transitions are automatic, human-approved, or prohibited.
- Define which transitions require no-op (the decision not to change the system) as a valid outcome.
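One way to make the architecture review checkable is a policy table that gives every transition type an explicit status. The sketch below is a minimal illustration, not a prescribed design: the `Policy` enum, the `TRANSITION_POLICY` table, the `decide` function, and the specific status assigned to each transition are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    AUTOMATIC = "automatic"
    HUMAN_APPROVED = "human-approved"
    PROHIBITED = "prohibited"


# Hypothetical policy table: the statuses here are illustrative assumptions.
TRANSITION_POLICY = {
    "route": Policy.AUTOMATIC,
    "quantize": Policy.HUMAN_APPROVED,
    "fine-tune": Policy.HUMAN_APPROVED,
    "merge": Policy.HUMAN_APPROVED,
    "restore": Policy.HUMAN_APPROVED,
    "change-evaluator": Policy.HUMAN_APPROVED,
    "change-permissions": Policy.PROHIBITED,
}


def decide(transition: str, approved_by: Optional[str] = None) -> str:
    """Return 'apply' or 'no-op' for a requested transition."""
    # Transitions missing from the table are treated as prohibited.
    policy = TRANSITION_POLICY.get(transition, Policy.PROHIBITED)
    if policy is Policy.AUTOMATIC:
        return "apply"
    if policy is Policy.HUMAN_APPROVED and approved_by:
        return "apply"
    # No-op is a valid outcome, not an error.
    return "no-op"
```

Treating unlisted transitions as prohibited means a new kind of change has to be reviewed before it can run automatically.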
Composition review
- Create a composition manifest (a machine-readable record of the exact runtime composition used for an evaluation, release, incident, or rollback) for every runtime state.
- Include base hash, adapters, load order, router, prompt policy, memory snapshot (a saved state of what the AI system remembers), tool profile, evaluator, inference config, quantization, environment, and UTC timestamp.
- Test high-risk compositions, not only individual components.
- Record untested compositions explicitly.
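A composition manifest could be sketched as a frozen dataclass with a content fingerprint. The field names follow the list above, but the class name, the `tested` flag, the `fingerprint` method, and all example values are assumptions for illustration only.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class CompositionManifest:
    base_hash: str
    adapters: tuple  # adapter names, in load order
    router: str
    prompt_policy: str
    memory_snapshot: str
    tool_profile: str
    evaluator: str
    inference_config: str
    quantization: str
    environment: str
    tested: bool = False  # untested compositions are recorded explicitly
    created_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Hash the composition itself, ignoring timestamp and test status."""
        payload = asdict(self)
        payload.pop("created_utc")
        payload.pop("tested")
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


# Hypothetical example values.
a = CompositionManifest(
    base_hash="sha256:abc123",
    adapters=("legal-v2", "style-v1"),
    router="router-v4",
    prompt_policy="policy-7",
    memory_snapshot="mem-2024-01",
    tool_profile="support-readonly",
    evaluator="eval-v3",
    inference_config="temp0.2",
    quantization="int8",
    environment="prod-eu",
)
# Same components, different load order: a distinct composition.
b = replace(a, adapters=("style-v1", "legal-v2"))
```

Because adapter load order is part of the hashed payload, `a` and `b` get different fingerprints even though they contain the same components, which is the reason to test compositions rather than components alone.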
Selection review
- List every metric that can preserve or promote behavior.
- Identify what each metric fails to measure.
- Add independent measures for source fidelity, rare cases, fairness, traceability, consent, and rollback readiness (rollback means returning a system to an earlier known state).
- Make no-op a permitted result.
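The selection review can be illustrated with a promotion gate that checks independent floors before comparing the primary metric. This is a sketch under stated assumptions: the metric names, the floor values, and the `Candidate` and `select` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    scores: dict  # metric name -> score in [0, 1]


# Hypothetical floors: the primary metric plus independent measures.
FLOORS = {
    "task_accuracy": 0.80,
    "source_fidelity": 0.90,
    "rare_case_recall": 0.70,
    "rollback_readiness": 1.00,
}


def select(incumbent: Candidate, challengers: list) -> Candidate:
    """Promote a challenger only if it clears every floor and beats the current best."""
    best = incumbent
    for c in challengers:
        # A missing metric counts as a failure, not as a pass.
        clears = all(c.scores.get(m, 0.0) >= floor for m, floor in FLOORS.items())
        better = c.scores.get("task_accuracy", 0.0) > best.scores.get("task_accuracy", 0.0)
        if clears and better:
            best = c
    # Returning the incumbent unchanged is the no-op outcome.
    return best
```

A challenger with a higher headline score but no source-fidelity measurement is rejected here, so the gate returns the incumbent instead of promoting an unmeasured behavior.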
Feedback and memory review
- Label synthetic, human, and mixed data.
- Quarantine outputs from incidents.
- Review memory writes as state changes, not harmless notes.
- Preserve deletion, correction, and consent controls.
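Treating memory writes as state changes might look like the following sketch, in which every write carries an origin label and is either committed or quarantined, and every decision is logged. The class names, the consent flag, and the rule that writes without consent are quarantined are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Origin(Enum):
    HUMAN = "human"
    SYNTHETIC = "synthetic"
    MIXED = "mixed"


@dataclass
class MemoryWrite:
    key: str
    value: str
    origin: Origin
    incident_id: Optional[str] = None  # set when the output came from an incident
    consent: bool = False


@dataclass
class MemoryStore:
    committed: dict = field(default_factory=dict)
    quarantine: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def write(self, w: MemoryWrite) -> bool:
        """Commit a write, or quarantine it; either way, log it as a state change."""
        if w.incident_id is not None or not w.consent:
            self.quarantine.append(w)
            self.log.append(("quarantined", w.key))
            return False
        self.committed[w.key] = w
        self.log.append(("committed", w.key, w.origin.value))
        return True

    def delete(self, key: str) -> bool:
        """Deletion stays available and is itself recorded."""
        removed = self.committed.pop(key, None) is not None
        self.log.append(("deleted", key, removed))
        return removed
```

Correction can be modeled the same way, as a logged write to an existing key.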
Action-layer review
- Separate read tools from write tools.
- Require conduct firewalls for consequential actions.
- Require explicit approval for irreversible operations.
- Use least privilege and per-route tool profiles.
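The action-layer items can be sketched as a small authorization check. The tool names, route names, and the `authorize` function are hypothetical; the sketch covers read/write separation, per-route profiles, and approval for irreversible operations, and does not attempt to model a conduct firewall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool = False        # read tools leave this False
    irreversible: bool = False  # irreversible tools need explicit approval


# Hypothetical tool registry.
TOOLS = {
    "search_docs": Tool("search_docs"),
    "update_ticket": Tool("update_ticket", writes=True),
    "delete_account": Tool("delete_account", writes=True, irreversible=True),
}

# Hypothetical per-route profiles: each route gets only what it needs.
ROUTE_PROFILES = {
    "faq": {"search_docs"},
    "support": {"search_docs", "update_ticket"},
    "admin": {"search_docs", "update_ticket", "delete_account"},
}


def authorize(route: str, tool_name: str, approved: bool = False) -> bool:
    """Allow a call only if the route's profile lists the tool and approval rules hold."""
    allowed = ROUTE_PROFILES.get(route, set())  # unknown routes get no tools
    tool = TOOLS.get(tool_name)
    if tool is None or tool_name not in allowed:
        return False
    if tool.irreversible and not approved:
        return False
    return True
```

Even on the `admin` route, `delete_account` is refused unless the call carries explicit approval.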
Observability review
- Require replayable traces for high-risk flows.
- Record model, adapter (a small add-on that changes or specializes model behavior), route, memory, evaluator, tool, and permission versions.
- Measure trace coverage and fidelity.
- Redact sensitive data without making replay impossible.
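A minimal sketch of the observability items, assuming traces are plain dictionaries with a `versions` mapping: a trace counts as replayable only when every version field is recorded, and coverage is the fraction of traces that qualify. The field names, the salted-digest redaction scheme, and the function names are assumptions, not a prescribed format.

```python
import hashlib

# Version fields taken from the checklist above.
VERSION_FIELDS = ("model", "adapter", "route", "memory", "evaluator", "tool", "permission")


def redact(value: str, salt: str) -> str:
    """Replace a sensitive value with a stable salted digest.

    Equal inputs map to equal tokens, so a replay can still match values
    without exposing them.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "redacted:" + digest[:16]


def is_replayable(trace: dict) -> bool:
    """A trace is replayable only if every version field is present and non-empty."""
    versions = trace.get("versions", {})
    return all(versions.get(name) for name in VERSION_FIELDS)


def trace_coverage(traces: list) -> float:
    """Fraction of traces that record every version field."""
    if not traces:
        return 0.0
    return sum(is_replayable(t) for t in traces) / len(traces)
```

This measures coverage only. Fidelity (whether a replay reproduces the original behavior) needs a separate check that actually re-runs the flow.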
Retirement review
- Define retirement triggers before deployment.
- Retire stale, redundant, drifting, boundary-violating, or provenance-broken variants (provenance is a record of where a component or behavior came from).
- Verify behavioral extinction across active carriers and reservoirs. Behavioral extinction means evidence that a behavior is no longer expressible across active artifacts, descendants, memory, routes, compositions, and retained training material; deleting one model is not sufficient evidence.
- Archive evidence and revoke permissions.
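The extinction check can be sketched as a probe run over every carrier and reservoir, not only the retired model. Here carriers are stand-in strings keyed by name and `expresses` is a hypothetical probe function; a real probe would be a behavioral evaluation rather than a substring match.

```python
from typing import Callable, Dict, List


def residual_carriers(
    carriers: Dict[str, str], expresses: Callable[[str], bool]
) -> List[str]:
    """Return the names of carriers in which the behavior is still expressible."""
    return sorted(name for name, content in carriers.items() if expresses(content))


def extinction_report(
    active: Dict[str, str],
    reservoirs: Dict[str, str],
    expresses: Callable[[str], bool],
) -> dict:
    """Check active carriers and reservoirs; extinction requires both to be clear."""
    report = {
        "active": residual_carriers(active, expresses),
        "reservoirs": residual_carriers(reservoirs, expresses),
    }
    report["extinct"] = not report["active"] and not report["reservoirs"]
    return report
```

If the retired model is gone but a memory store or a retained dataset still triggers the probe, the report lists that carrier and `extinct` stays false.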
Incident review
- Ask where the behavior first entered.
- Ask which composition expressed it.
- Ask what rewarded it.
- Ask where residue was stored.
- Ask which descendants or aliases inherited it.
- Ask what rollback missed.
- Assign an accountable behavior owner.
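The question about descendants and aliases amounts to a traversal of the lineage graph from the point where the behavior entered. The sketch below assumes lineage is stored as a mapping from each artifact to the artifacts derived from it; the artifact names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key maps to the artifacts derived from it.
LINEAGE = {
    "base-v1": ["ft-v2", "distilled-small"],
    "ft-v2": ["merged-v3", "alias:prod"],
    "distilled-small": [],
    "merged-v3": ["alias:canary"],
}


def descendants(graph: dict, origin: str) -> list:
    """Breadth-first search for everything derived from `origin`."""
    seen = set()
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# If the behavior entered at "ft-v2", these artifacts may have inherited it.
affected = descendants(LINEAGE, "ft-v2")
```

In this example a rollback that only restores `ft-v2` would miss `merged-v3` and both aliases, which is the kind of gap the "what rollback missed" question is meant to surface.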