Danger Model | Reasoned from system design | v1.15.0 | 2026-06-28T02:15:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement of components, not just from one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Concrete Warning Signals for Cognivirus-Like Risk

Direct answer

The most useful warning signals are not dramatic. They are small signs that the deployed state no longer matches the reviewed state.

Composition signals

a behavior appears only for one route, user segment, load order, memory state, or tool profile;
isolated component tests pass but end-to-end outcomes fail;
the adapter stack cannot be reproduced exactly;
a quantized or merged variant has no separate safety evidence;
the fallback route has different policy behavior from the primary route.
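
One way to make "cannot be reproduced exactly" checkable is to fingerprint the ordered component stack at review time and compare it with what is deployed. The sketch below is only an illustration: the descriptor fields (`kind`, `id`, `digest`) and both function names are assumptions, not part of any described system.

```python
import hashlib
import json


def stack_fingerprint(components):
    """Hash an ordered list of component descriptors.

    List order is kept on purpose, because load order is one of the
    things that can change behavior. Only dict keys are sorted.
    """
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_stacks(reviewed, deployed):
    """Return readable differences between two ordered stacks."""
    findings = []
    if len(reviewed) != len(deployed):
        findings.append(
            f"length differs: reviewed={len(reviewed)} deployed={len(deployed)}"
        )
    for i, (r, d) in enumerate(zip(reviewed, deployed)):
        if r != d:
            findings.append(f"position {i}: reviewed={r['id']} deployed={d['id']}")
    return findings


# Hypothetical descriptors, for illustration only.
reviewed = [
    {"kind": "base", "id": "base-model", "digest": "aaa"},
    {"kind": "adapter", "id": "adapter-a", "digest": "bbb"},
    {"kind": "adapter", "id": "adapter-b", "digest": "ccc"},
]
# Same components, different load order.
deployed = [reviewed[0], reviewed[2], reviewed[1]]

if stack_fingerprint(reviewed) != stack_fingerprint(deployed):
    for finding in diff_stacks(reviewed, deployed):
        print(finding)
```

Here the two stacks contain identical components, yet the fingerprints differ because the adapters load in a different order, which is exactly the kind of small mismatch this list describes.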

Selection signals

promotion criteria reward latency, cost, engagement, or satisfaction but do not measure source fidelity;
a no-op (promoting nothing) is treated as an operational failure;
user metrics improve while factuality, refusal calibration, or rare-case coverage worsens;
evaluator scores rise but independent reviewers disagree;
candidates are optimized against visible tests only.
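
The pattern "user metrics improve while guard metrics worsen" can be expressed as a simple promotion check. This is a minimal sketch under stated assumptions: the metric names, the tolerance value, and the rule that higher is better for every metric are all hypothetical choices made for the example.

```python
# Hypothetical metric names; in this sketch higher is better for every metric.
PROMOTION_METRICS = ("engagement", "satisfaction")
GUARD_METRICS = ("factuality", "refusal_calibration", "rare_case_coverage")


def guard_regressions(baseline, candidate, tolerance=0.01):
    """Return guard metrics that dropped by more than `tolerance`
    while at least one promotion metric improved."""
    improved = any(candidate[m] > baseline[m] for m in PROMOTION_METRICS)
    if not improved:
        return []
    return [m for m in GUARD_METRICS if baseline[m] - candidate[m] > tolerance]


# Made-up scores, for illustration only.
baseline = {
    "engagement": 0.60, "satisfaction": 0.70,
    "factuality": 0.90, "refusal_calibration": 0.85, "rare_case_coverage": 0.50,
}
candidate = {
    "engagement": 0.72, "satisfaction": 0.71,
    "factuality": 0.84, "refusal_calibration": 0.85, "rare_case_coverage": 0.41,
}

print(guard_regressions(baseline, candidate))
```

A non-empty result does not prove the candidate is unsafe. It only flags that the promotion argument rests on metrics that moved in the opposite direction from the guard metrics.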

Synthetic feedback signals

the synthetic fraction of the data grows without provenance labels;
rare examples vanish from evaluation failures;
outputs become repetitive or stylistically narrow;
data generated during incidents remains in training material;
evaluator or benchmark examples are model-generated without independent review.
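
Several of these signals can be counted directly if each record carries provenance metadata. The sketch below assumes a hypothetical record shape with optional `provenance` and `incident_id` fields. Real datasets will differ.

```python
def provenance_report(records):
    """Summarize provenance labels for a list of record dicts.

    Unlabeled records may or may not be synthetic, so the reported
    synthetic fraction is only a lower bound.
    """
    total = len(records)
    synthetic = sum(1 for r in records if r.get("provenance") == "synthetic")
    unlabeled = sum(1 for r in records if not r.get("provenance"))
    from_incident = sum(1 for r in records if r.get("incident_id"))
    return {
        "synthetic_fraction": synthetic / total if total else 0.0,
        "unlabeled": unlabeled,
        "from_incident": from_incident,
    }


# Hypothetical records, for illustration only.
records = [
    {"text": "a", "provenance": "human"},
    {"text": "b", "provenance": "synthetic"},
    {"text": "c", "provenance": "synthetic", "incident_id": "INC-1"},
    {"text": "d"},  # no provenance label at all
]

report = provenance_report(records)
print(report)
```

Tracking this report across dataset versions, rather than at a single point in time, is what would show whether the synthetic fraction is growing.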

Action-layer signals

a model can read untrusted content and write to internal systems in one flow;
tool-call approval depends on the model’s own explanation;
credentials are available to components that process external text;
irreversible actions lack human approval;
conduct firewall logs are missing or incomplete.
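
The first, second, and fourth signals in this list can be checked on each proposed tool call before it runs. The following is a sketch only: the `ToolCall` fields and the `action_findings` function are hypothetical, and the model's explanation is carried along but deliberately never consulted.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    writes_internal: bool       # call modifies an internal system
    irreversible: bool          # effect cannot be undone
    flow_read_untrusted: bool   # this flow already ingested external content
    human_approved: bool = False
    model_explanation: str = ""  # kept for logging, never used for approval


def action_findings(call):
    """Return the action-layer warning signals that apply to one call."""
    findings = []
    if call.flow_read_untrusted and call.writes_internal:
        findings.append("untrusted read and internal write in one flow")
    if call.irreversible and not call.human_approved:
        findings.append("irreversible action without human approval")
    return findings


# Hypothetical call, for illustration only.
risky = ToolCall(
    tool="delete_records",
    writes_internal=True,
    irreversible=True,
    flow_read_untrusted=True,
    model_explanation="This is safe because the user asked for it.",
)
print(action_findings(risky))
```

The design choice worth noting is that approval depends only on properties recorded outside the model, so a persuasive explanation cannot change the outcome.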

Observability signals

a final output exists but the route that produced it is unknown;
memory writes are not traced;
evaluator decisions lack version IDs;
traces cannot be replayed because inputs or tool arguments are missing;
logs are summarized by the same system under review.
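
A trace can be tested for replayability by checking that the fields named above are present. The field names in this sketch are assumptions chosen to mirror the list. An empty list (for example, no memory writes) counts as recorded, while a missing or null field counts as a gap.

```python
# Hypothetical field names mirroring the signals listed above.
REQUIRED_FIELDS = (
    "route_id",
    "inputs",
    "tool_arguments",
    "memory_writes",
    "evaluator_version",
)


def trace_gaps(trace):
    """Return required fields that are absent or None in a trace dict."""
    return [f for f in REQUIRED_FIELDS if trace.get(f) is None]


# Hypothetical trace, for illustration only.
trace = {
    "output": "final answer text",
    "route_id": None,         # route unknown
    "inputs": ["user message"],
    "memory_writes": [],      # recorded: nothing was written
    "evaluator_version": "eval-7",
    # "tool_arguments" was never logged
}

print(trace_gaps(trace))
```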

Retirement and rollback signals

alias changes hide implementation changes;
rollback restores weights but not memory, route state, evaluator, permissions, or data;
a retired model remains in a registry or cache;
deprecation happens without stakeholder notice;
behavior returns after the first carrier is removed.
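
A partial rollback can be detected by comparing every state dimension against the rollback target, not only the weights. This sketch assumes each dimension can be summarized by a comparable identifier such as a version string or digest. The dimension names follow the list above, and the snapshot format is hypothetical.

```python
# Dimensions taken from the rollback signal above.
ROLLBACK_DIMENSIONS = (
    "weights", "memory", "route_state", "evaluator", "permissions", "data",
)


def unrestored(target_snapshot, current_state):
    """Return the dimensions whose current value differs from the target."""
    return [
        d for d in ROLLBACK_DIMENSIONS
        if current_state.get(d) != target_snapshot.get(d)
    ]


# Hypothetical identifiers, for illustration only.
target = {
    "weights": "w-41", "memory": "m-41", "route_state": "r-41",
    "evaluator": "e-41", "permissions": "p-41", "data": "d-41",
}
# After a rollback that restored only the weights and the evaluator.
current = {
    "weights": "w-41", "memory": "m-42", "route_state": "r-42",
    "evaluator": "e-41", "permissions": "p-42", "data": "d-41",
}

print(unrestored(target, current))
```

Any dimension left in the result is a place where the behavior being rolled back could still be carried.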

Organizational signals

responsibility is spread across teams with no accountable behavior owner;
release pressure overrides unresolved evidence gaps;
operators cannot explain why a route made a decision;
reviewers rely on dashboards rather than direct traces;
“temporary” variants remain active after the experiment ends.