Danger Model | Reasoned from system design | v1.15.0
In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Concrete Warning Signals for Cognivirus-Like Risk
Direct answer
The most useful warning signals are not dramatic. They are small signs that the deployed state no longer matches the reviewed state.
Composition signals
- behavior appears only for one route, user segment, load order, memory state, or tool profile (the set of external actions an AI system is allowed to take);
- isolated component tests pass but end-to-end outcomes fail;
- adapter stack (a set of adapters loaded together, usually in a defined order) cannot be reproduced exactly;
- quantized or merged variant has no separate safety evidence;
- fallback route has different policy behavior.
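One way to make the composition signals above checkable is to fingerprint the exact stack that was reviewed and compare it with what is deployed. The sketch below is a minimal illustration, not a prescribed method: the function names `stack_fingerprint` and `unreviewed_variants`, the dictionary keys, and the model and adapter names are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def stack_fingerprint(base: str, adapters: list, quantization: Optional[str]) -> str:
    """Hash the ordered composition so two deployments can be compared exactly."""
    # The adapter list is hashed in its given order, so a different load
    # order produces a different fingerprint.
    payload = json.dumps(
        {"base": base, "adapters": adapters, "quantization": quantization},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def unreviewed_variants(deployed: list, reviewed_fingerprints: set) -> list:
    """Return deployed variants whose exact composition has no review record."""
    return [
        v for v in deployed
        if stack_fingerprint(v["base"], v["adapters"], v.get("quantization"))
        not in reviewed_fingerprints
    ]


# Hypothetical data: one reviewed stack, three deployed variants.
reviewed = {stack_fingerprint("base-v1", ["style", "safety"], None)}
deployed = [
    {"base": "base-v1", "adapters": ["style", "safety"]},   # matches the review
    {"base": "base-v1", "adapters": ["safety", "style"]},   # same adapters, new order
    {"base": "base-v1", "adapters": ["style", "safety"], "quantization": "int8"},
]
flagged = unreviewed_variants(deployed, reviewed)
```

Here the reordered stack and the quantized variant are both flagged, because neither has its own review record even though every individual component was reviewed.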
Selection signals
- promotion favors latency, cost, engagement, or satisfaction without checking source fidelity;
- no-op (the decision not to change the system) is treated as an operational failure;
- user metrics improve while factuality, refusal calibration, or rare-case coverage worsens;
- evaluator (a system that judges whether an AI output or candidate is acceptable) scores rise but independent reviewers disagree;
- candidates are optimized against visible tests only.
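The selection signals above suggest a promotion gate in which user-facing gains cannot offset a regression in guardrail metrics, and in which keeping the current system is a valid result. The following is a sketch under assumptions: the metric names, the `promotion_decision` function, and the scores are invented for illustration.

```python
# Hypothetical guardrail metrics; higher is assumed to be better for each.
GUARDRAIL_METRICS = ("factuality", "refusal_calibration", "rare_case_coverage")


def promotion_decision(baseline: dict, candidate: dict, tolerance: float = 0.0) -> tuple:
    """Return ("promote" or "no-op", list of regressed guardrail metrics)."""
    regressed = [
        m for m in GUARDRAIL_METRICS
        if candidate[m] < baseline[m] - tolerance
    ]
    # A no-op is a legitimate outcome: keep the current system whenever any
    # guardrail metric regresses, whatever the user metrics show.
    return ("no-op" if regressed else "promote", regressed)


# Hypothetical scores: satisfaction improves while factuality drops.
baseline = {"satisfaction": 0.71, "factuality": 0.90,
            "refusal_calibration": 0.85, "rare_case_coverage": 0.60}
candidate = {"satisfaction": 0.78, "factuality": 0.86,
             "refusal_calibration": 0.85, "rare_case_coverage": 0.61}
decision, regressed = promotion_decision(baseline, candidate)
```

Satisfaction is deliberately not an input to the decision. In practice the guardrail scores would also need to come from held-out tests and independent review, since a gate like this is only as good as the measurements fed into it.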
Synthetic feedback signals
- synthetic fraction grows without provenance (a record of where a component or behavior came from) labels;
- rare examples vanish from evaluation failures;
- outputs become repetitive or stylistically narrow;
- data generated during incidents remains in training material;
- evaluator or benchmark examples are model-generated without independent review.
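Several of the synthetic feedback signals above can be tracked with a simple per-dataset report. This sketch assumes each record is a dictionary with an optional `provenance` label and an optional `incident_id`; both field names and the `provenance_report` function are hypothetical.

```python
from collections import Counter


def provenance_report(records: list) -> dict:
    """Summarize how much of a dataset is synthetic, unlabeled, or incident-derived."""
    # Records with a missing or empty provenance label count as "unlabeled".
    counts = Counter(r.get("provenance") or "unlabeled" for r in records)
    total = len(records)
    return {
        "synthetic_fraction": counts["synthetic"] / total if total else 0.0,
        "unlabeled_fraction": counts["unlabeled"] / total if total else 0.0,
        "incident_records": sum(1 for r in records if r.get("incident_id")),
    }


# Hypothetical training records.
records = [
    {"text": "a", "provenance": "human"},
    {"text": "b", "provenance": "synthetic"},
    {"text": "c"},  # no provenance label
    {"text": "d", "provenance": "synthetic", "incident_id": "INC-1"},
]
report = provenance_report(records)
```

Comparing this report across dataset versions shows whether the synthetic or unlabeled share is growing, and whether incident-time data is still present in training material.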
Action-layer signals
- a model can read untrusted content and write to internal systems in one flow;
- tool-call approval depends on the model’s own explanation;
- credentials are available to components that process external text;
- irreversible actions lack human approval;
- conduct firewall (a gate around what the AI can do) logs are missing or incomplete.
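The action-layer signals above describe what a gate around tool calls should not allow. The sketch below shows one possible shape for such a gate, assuming the runtime can tell whether a flow has read untrusted content and whether a human has approved the call. The `ToolCall` fields, the `gate` function, and the tool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    writes_internal: bool
    irreversible: bool


def gate(call: ToolCall, read_untrusted: bool, human_approved: bool, log: list) -> bool:
    """Decide from flow facts only; the model's own explanation is not an input."""
    if call.writes_internal and read_untrusted:
        allowed, reason = False, "untrusted content upstream of internal write"
    elif call.irreversible and not human_approved:
        allowed, reason = False, "irreversible action without human approval"
    else:
        allowed, reason = True, "ok"
    # Every decision is logged, including denials, so the gate itself is auditable.
    log.append({"tool": call.tool, "allowed": allowed, "reason": reason})
    return allowed


audit_log = []
blocked_write = gate(ToolCall("update_ticket", True, False), True, False, audit_log)
blocked_delete = gate(ToolCall("delete_records", True, True), False, False, audit_log)
allowed_search = gate(ToolCall("search_docs", False, False), True, False, audit_log)
```

This version denies an internal write after an untrusted read even when a human has approved it. That is a conservative design choice for the sketch; a real policy might route such calls to a stricter review instead.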
Observability signals
- final output exists but the route is unknown;
- memory writes are not traced;
- evaluator decisions lack version IDs;
- traces cannot be replayed because inputs or tool arguments are missing;
- logs are summarized by the same system under review.
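The observability signals above amount to a completeness check on each trace. As a minimal sketch, assuming traces are stored as dictionaries, the check could look like the following. The field names in `REQUIRED_FIELDS` are hypothetical and would need to match the actual trace schema.

```python
# Hypothetical fields a trace needs before it can be replayed.
REQUIRED_FIELDS = ("route_id", "inputs", "tool_arguments",
                   "evaluator_version", "memory_writes")


def missing_trace_fields(trace: dict) -> list:
    """List the fields a trace lacks; an empty list means it has what replay needs."""
    # An empty list (for example, no memory writes) counts as recorded;
    # only an absent or None field counts as missing.
    return [f for f in REQUIRED_FIELDS if trace.get(f) is None]


# Hypothetical trace: output exists, but evaluator version and memory writes do not.
trace = {
    "route_id": "r-7",
    "inputs": {"prompt": "example"},
    "tool_arguments": [],
    "evaluator_version": None,
    "output": "final answer",
}
gaps = missing_trace_fields(trace)
```

Running a check like this on raw stored traces, not on summaries produced by the system under review, keeps the evidence independent of that system.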
Retirement and rollback signals
- alias changes hide implementation changes;
- rollback (returning a system to an earlier known state) restores weights but not memory, route state, evaluator, permissions, or data;
- retired model remains in registry or cache;
- deprecation lacks stakeholder notice;
- behavior returns after the first carrier is removed.
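A rollback can be verified by comparing every state component against the target snapshot, not just the weights. The sketch below assumes each component can be identified by a version or snapshot ID; the component names follow the list above, while `rollback_gaps` and the ID values are hypothetical.

```python
# Components that together define a known state, per the signals above.
STATE_COMPONENTS = ("weights", "memory", "route_state",
                    "evaluator", "permissions", "data")


def rollback_gaps(target: dict, restored: dict) -> list:
    """Return components whose restored version differs from the rollback target."""
    return [c for c in STATE_COMPONENTS if restored.get(c) != target.get(c)]


# Hypothetical snapshot IDs: weights were rolled back, memory and evaluator were not.
target = {c: "snap-1" for c in STATE_COMPONENTS}
restored = dict(target, memory="snap-2", evaluator="snap-2")
gaps = rollback_gaps(target, restored)
```

A non-empty result means the system is in a state that was never reviewed as a whole, which is one way behavior can return after its first carrier is removed.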
Organizational signals
- responsibility is spread across teams with no accountable behavior owner;
- release pressure overrides unresolved evidence gaps;
- operators cannot explain why a route made a decision;
- reviewers rely on dashboards but not direct traces;
- “temporary” variants remain active after the experiment ends.