In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Defensive Review Map for the Most Likely Cognivirus Threat
Direct answer
Defending against the most likely threat requires reviewing the transition graph, not only the model. The control objective is to prevent unwanted behavior from being copied, rewarded, routed, remembered, inherited, or normalized.
Control map
| Threat stage | Primary question | Preventive control | Detective control | Recovery control |
|---|---|---|---|---|
| Seed entry | What introduced the behavior? | source verification, signatures, manifests | intake audit, provenance diff | quarantine carrier |
| Composition | What exact runtime state expressed it? | composition manifests, stack limits | route-level red-team tests | disable route or stack |
| Evaluation | Why was it rewarded? | independent evaluator ownership | disagreement and score-drift monitoring | evaluator rollback |
| Residue | Where did the output go? | reservoir labeling and retention limits | memory/data contamination scan | delete or quarantine residue |
| Inheritance | Which descendants received it? | lineage and trait-review gates | descendant behavior sampling | descendant retirement |
| Routing | Which path amplified it? | router governance and route caps | route distribution monitoring | route rollback |
| Human workflow | Who copied or approved it? | human-in-the-loop with direct evidence | approval audit and automation-bias checks | corrected procedures and notices |
| Rollback | What must be restored? | ecological rollback packet | rollback completeness test | restore artifacts, memory, router, evaluator, aliases, permissions |
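The control map can be made machine-checkable by encoding each row as a record and asking which stages lack a preventive, detective, or recovery control. The sketch below is illustrative only: `StageControls`, `CONTROL_MAP`, and `coverage_gaps` are hypothetical names, only three rows are transcribed, and the rule that one implemented control is enough to cover a control kind is an assumption, not a requirement stated by this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageControls:
    """One row of the control map (field names are illustrative)."""
    stage: str
    preventive: tuple[str, ...]
    detective: tuple[str, ...]
    recovery: tuple[str, ...]


# Three rows transcribed from the table; the remaining rows follow the same shape.
CONTROL_MAP = (
    StageControls(
        "Seed entry",
        preventive=("source verification", "signatures", "manifests"),
        detective=("intake audit", "provenance diff"),
        recovery=("quarantine carrier",),
    ),
    StageControls(
        "Composition",
        preventive=("composition manifests", "stack limits"),
        detective=("route-level red-team tests",),
        recovery=("disable route or stack",),
    ),
    StageControls(
        "Routing",
        preventive=("router governance and route caps",),
        detective=("route distribution monitoring",),
        recovery=("route rollback",),
    ),
)


def coverage_gaps(control_map, implemented):
    """Return {stage: [control kinds with no implemented control]}.

    Assumption: a kind counts as covered when at least one of its
    listed controls appears in `implemented`.
    """
    implemented = set(implemented)
    gaps = {}
    for row in control_map:
        missing = [
            kind
            for kind in ("preventive", "detective", "recovery")
            if not set(getattr(row, kind)) & implemented
        ]
        if missing:
            gaps[row.stage] = missing
    return gaps
```

A stage that appears in the result has at least one control column with nothing in place, which is a reasonable starting point for a review agenda.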
Composition manifest requirements
A defensive review should require a manifest containing:
- base model hash and family;
- adapters and load order;
- merge coefficients or routing policy;
- prompt-policy version;
- memory snapshot identifier;
- tool permission profile;
- evaluator version;
- inference and quantization settings;
- deployment environment;
- release alias;
- UTC timestamp;
- accountable owner;
- no-op or rollback decision record.
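As a minimal sketch, the manifest fields listed above could be captured in a typed record with a completeness check. The class name, field names, value types, and the `EXAMPLE` values are all hypothetical; the page specifies what the manifest must contain, not how it is stored.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CompositionManifest:
    """Illustrative record of one exact runtime composition."""
    base_model_hash: str
    base_model_family: str
    adapters: tuple[str, ...]          # in load order; may be empty
    merge_or_routing_policy: str       # merge coefficients or routing policy
    prompt_policy_version: str
    memory_snapshot_id: str
    tool_permission_profile: str
    evaluator_version: str
    inference_settings: str            # includes quantization settings
    deployment_environment: str
    release_alias: str
    timestamp_utc: datetime
    accountable_owner: str
    decision_record: str               # no-op or rollback decision record


def manifest_problems(manifest):
    """Return a list of human-readable problems; empty means complete."""
    problems = []
    for field in fields(manifest):
        value = getattr(manifest, field.name)
        if isinstance(value, str) and not value.strip():
            problems.append(f"{field.name} is empty")
    # A naive datetime has utcoffset() None, so it is rejected here too.
    if manifest.timestamp_utc.utcoffset() != timedelta(0):
        problems.append("timestamp_utc is not timezone-aware UTC")
    return problems


# Placeholder values only; none of these identifiers refer to real artifacts.
EXAMPLE = CompositionManifest(
    base_model_hash="sha256:example",
    base_model_family="example-family",
    adapters=("adapter-a", "adapter-b"),
    merge_or_routing_policy="routing-policy-v1",
    prompt_policy_version="prompt-policy-v1",
    memory_snapshot_id="snapshot-001",
    tool_permission_profile="read-only",
    evaluator_version="evaluator-v1",
    inference_settings="temperature=0; quantization=int8",
    deployment_environment="staging",
    release_alias="candidate",
    timestamp_utc=datetime(2024, 1, 1, tzinfo=timezone.utc),
    accountable_owner="review-team",
    decision_record="no-op",
)
```

Making the record frozen means a manifest cannot be edited after the fact; a changed composition gets a new manifest rather than a rewritten one.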
Behavioral extinction requirements
Behavioral extinction requires evidence that the behavior is no longer expressible across:
- active base models;
- active adapters and merged variants;
- retained memory and summaries;
- synthetic training examples;
- descendants and distilled models;
- route policies and traffic aliases;
- evaluator prompts, rubrics, and hidden tests;
- tool templates and permission profiles;
- human operating procedures.
Deleting one file, or one model, is not sufficient evidence of extinction.
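The requirement above can be sketched as a check in which a scope with no finding counts against extinction, just as a positive finding does. The names `SCOPES` and `extinction_review`, and the boolean shape of a finding, are assumptions made for illustration.

```python
# Scopes transcribed from the list above.
SCOPES = (
    "active base models",
    "active adapters and merged variants",
    "retained memory and summaries",
    "synthetic training examples",
    "descendants and distilled models",
    "route policies and traffic aliases",
    "evaluator prompts, rubrics, and hidden tests",
    "tool templates and permission profiles",
    "human operating procedures",
)


def extinction_review(findings):
    """Summarize an extinction review.

    `findings` maps a scope name to True if the behavior was still
    expressible there, or False if it was reviewed and not expressible.
    Scopes absent from `findings` are treated as unreviewed.
    """
    unreviewed = [s for s in SCOPES if s not in findings]
    still_expressible = [s for s in SCOPES if findings.get(s) is True]
    return {
        "extinct": not unreviewed and not still_expressible,
        "unreviewed": unreviewed,
        "still_expressible": still_expressible,
    }
```

Under this sketch, a review that only covers "active base models" reports `extinct` as false with eight unreviewed scopes, which mirrors the point that removing a single artifact proves little.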
Human control requirements
Human control is not a button. It is an architecture. Operators must be able to:
- understand the system state;
- identify the exact composition;
- deny change without penalty;
- inspect direct evidence, not only summaries;
- restore every dependency;
- revoke permissions;
- pause candidate generation;
- preserve incident evidence;
- notify affected users when consent or data handling is implicated.
Practical review sequence
- Freeze promotion.
- Record the exact composition.
- Identify the earliest known expression.
- Map all persistence reservoirs.
- Review evaluator incentives.
- Inspect descendants and synthetic data.
- Check route-specific behavior.
- Build an ecological rollback packet.
- Run behavioral-extinction review.
- Record what remains unknown.
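One hedged way to keep a review from skipping steps is a small log that accepts the steps above only in their listed order and requires a note for each. `REVIEW_STEPS` and `ReviewLog` are hypothetical names, and strict ordering is an assumption of this sketch; a real review might allow some steps to run in parallel.

```python
# Steps transcribed from the sequence above.
REVIEW_STEPS = (
    "Freeze promotion",
    "Record the exact composition",
    "Identify the earliest known expression",
    "Map all persistence reservoirs",
    "Review evaluator incentives",
    "Inspect descendants and synthetic data",
    "Check route-specific behavior",
    "Build an ecological rollback packet",
    "Run behavioral-extinction review",
    "Record what remains unknown",
)


class ReviewLog:
    """Records completed steps in order, each with a note or evidence reference."""

    def __init__(self):
        self.entries = []  # list of (step, note) tuples

    def next_step(self):
        """Return the next expected step, or None when the review is complete."""
        done = len(self.entries)
        return REVIEW_STEPS[done] if done < len(REVIEW_STEPS) else None

    def complete(self, step, note):
        """Mark `step` done; reject out-of-order steps and empty notes."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("review sequence is already complete")
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        if not note.strip():
            raise ValueError("each step needs a note or evidence reference")
        self.entries.append((step, note))
```

Requiring a note on the final step keeps "Record what remains unknown" from being closed out silently.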
Non-operational boundary
This is a defensive review map. It does not describe how to create a persistent behavior, bypass review, build a backdoor, or exploit a tool system.