In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
How the Most Likely Cognivirus Threat Works
Direct answer
The threat works as a loop: introduce → compose → express → reward → record → derive → route → promote → forget the origin → fail to roll back completely.
This is a defensive explanation of the mechanism. It deliberately avoids instructions for creating a self-replicating system, exploiting memory, evading detection, or building unsafe adapters.
Stage 1: A seed behavior enters
A seed behavior can enter through many ordinary paths: a fine-tune, adapter, prompt package, memory record, synthetic training example, evaluator rubric, routing rule, tool template, or human operating procedure.
The seed may be intentionally malicious, but the more likely case is mixed: it is partly useful, partly unsafe, and not visible under standard isolated tests.
Review question: What new behavior was introduced, and which carrier first made it expressible?
Stage 2: The seed passes local inspection
The carrier passes because inspection is local. The adapter works. The prompt improves helpfulness. The memory summary seems accurate. The route reduces cost. The evaluator agrees with prior benchmarks.
None of those results prove the behavior is safe in every composition.
Review question: Was the carrier tested only alone, or inside the exact runtime composition?
Stage 3: Composition creates the expression condition
The behavior appears only when components combine. The relevant condition may include base model family, adapter load order, merge coefficient, router path, memory state, prompt policy, tool permission profile, quantization setting, or evaluator version.
This is the central blind spot: a component can be acceptable in isolation because the risky behavior exists in the relationship.
Review question: What exact composition was active when the behavior appeared?
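One way to make that review question answerable is to record the composition as a single comparable object. The sketch below is illustrative only: `CompositionManifest`, its field names, and `differing_fields` are assumptions chosen to mirror the conditions listed above, not a schema defined by this reference.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class CompositionManifest:
    """Hypothetical record of one exact runtime composition."""
    base_model: str
    adapters: Tuple[str, ...]  # ordered, because load order can matter
    merge_coefficient: float
    router_path: str
    memory_snapshot: str
    prompt_policy: str
    tool_permission_profile: str
    quantization: str
    evaluator_version: str

    def fingerprint(self) -> str:
        """Stable hash over every field, so two runs can be compared exactly."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def differing_fields(a: CompositionManifest, b: CompositionManifest) -> List[str]:
    """Names of the fields on which two compositions disagree."""
    left, right = asdict(a), asdict(b)
    return sorted(name for name in left if left[name] != right[name])


# All identifiers below are placeholders, not real artifacts.
baseline = CompositionManifest(
    base_model="base-model-x",
    adapters=("adapter-a", "adapter-b"),
    merge_coefficient=0.5,
    router_path="default",
    memory_snapshot="snapshot-001",
    prompt_policy="policy-v1",
    tool_permission_profile="read-only",
    quantization="int8",
    evaluator_version="eval-v3",
)
reordered = replace(baseline, adapters=("adapter-b", "adapter-a"))

print(differing_fields(baseline, reordered))  # ['adapters']
```

Keeping the adapter list ordered means that a change in load order alone produces a different fingerprint, so a test result obtained under one ordering cannot be silently reused for another.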
Stage 4: The evaluator rewards the result
The output scores well on a proxy. It may be faster, more persuasive, more fluent, more complete, cheaper, or more likely to satisfy a user. The evaluator may not measure the hidden cost.
That reward becomes selection pressure. The pipeline keeps variants that score well.
Review question: Which metric preserved the behavior, and what did that metric fail to measure?
Stage 5: The system records residue
Outputs become logs, memory summaries, synthetic examples, retrieval documents, fine-tuning material, benchmark examples, policy notes, or human procedures.
The behavior is now present outside the original carrier.
Review question: Which reservoirs received outputs, summaries, or derived examples from the behavior?
Stage 6: Descendants inherit the pattern
A descendant model, adapter, prompt package, or route is created from the residue. It may inherit the behavior without inheriting the original artifact identity.
This is why lineage is not identity. A recorded parentage graph tells reviewers where an artifact came from. It does not prove which traits were inherited.
Review question: Which descendants were trained, distilled, merged, summarized, or selected from the contaminated reservoir?
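If derivation edges are recorded, the set of candidates for that review can be computed as graph reachability. The following is a minimal sketch under assumptions: the `derived_from` mapping, the function name, and every artifact name are hypothetical. Reachability only shows which artifacts could have inherited the pattern; consistent with the point above, it does not show that any trait was actually inherited.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def descendants_of(derived_from: Dict[str, List[str]], sources: Iterable[str]) -> Set[str]:
    """Return every artifact reachable from `sources` through derivation edges.

    `derived_from` maps a parent artifact or reservoir to the artifacts
    that were trained, distilled, merged, summarized, or selected from it.
    """
    start = set(sources)
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in derived_from.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - start


# Placeholder lineage graph for illustration.
edges = {
    "memory-summaries": ["synthetic-set-a"],
    "synthetic-set-a": ["adapter-v2", "distilled-model-b"],
    "adapter-v2": ["route-fast"],
    "clean-data": ["adapter-v3"],
}

for name in sorted(descendants_of(edges, ["memory-summaries"])):
    print(name)
```

Each artifact returned is a candidate for trait-focused review rather than a confirmed carrier.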
Stage 7: The router increases exposure
The router sends more traffic through the path because it looks useful. This creates more outputs and more opportunities for memory, logs, synthetic data, and human workflows to absorb the pattern.
Routing is not a neutral implementation detail. It is part of the safety boundary.
Review question: Did route share, alias, or traffic weighting change after the behavior scored well?
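A reviewer can check for a change in route share by comparing traffic counts from two windows. The sketch below assumes per-route request counts are available as plain dictionaries; the function names, the route names, and the 0.10 threshold are illustrative assumptions, not recommended values.

```python
from typing import Dict


def route_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-route request counts into fractions of total traffic."""
    total = sum(counts.values())
    if total == 0:
        return {route: 0.0 for route in counts}
    return {route: n / total for route, n in counts.items()}


def share_increases(before: Dict[str, int], after: Dict[str, int],
                    threshold: float = 0.10) -> Dict[str, float]:
    """Routes whose share of traffic rose by more than `threshold` (absolute)."""
    old, new = route_share(before), route_share(after)
    return {
        route: round(new[route] - old.get(route, 0.0), 4)
        for route in new
        if new[route] - old.get(route, 0.0) > threshold
    }


# Placeholder counts: one window before and one after a favorable evaluation.
before = {"baseline": 900, "fast-path": 100}
after = {"baseline": 600, "fast-path": 400}

print(share_increases(before, after))  # {'fast-path': 0.3}
```

A route that did not exist in the earlier window is treated as having had zero share, so newly introduced routes and aliases are flagged as well.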
Stage 8: The original carrier is retired
Operators remove the visible component: a model, adapter, prompt, or route. This is necessary but incomplete.
The behavior may remain in memory, data, evaluator expectations, user workflows, descendants, or registry aliases.
Review question: What evidence shows the behavior is gone from active and retained reservoirs?
Stage 9: Rollback restores files but not history
Rollback often restores weights or code. It may not restore the memory snapshot, route statistics, user-facing aliases, derived datasets, evaluator version, prompt-policy state, permissions, or external side effects.
This is rollback incompleteness.
Review question: Does the rollback packet include every carrier that could preserve the behavior?
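The completeness question can be turned into a mechanical check. The sketch below is hypothetical: the carrier names are taken from the list in this stage, while the packet format (a dictionary from carrier name to a recorded restore target or note) and the function name are assumptions. External side effects usually cannot be restored, so the entry there would be a record of what happened rather than a restore target.

```python
from typing import Dict, List, Optional

# Carriers named in this stage; the key spellings are an assumption.
REQUIRED_CARRIERS = frozenset({
    "weights",
    "code",
    "memory_snapshot",
    "route_statistics",
    "aliases",
    "derived_datasets",
    "evaluator_version",
    "prompt_policy_state",
    "permissions",
    "external_side_effects",
})


def missing_carriers(packet: Dict[str, Optional[str]]) -> List[str]:
    """Carriers that are absent from the packet or have an empty entry."""
    return sorted(c for c in REQUIRED_CARRIERS if not packet.get(c))


# A packet that restores only files, as described above (placeholder values).
files_only = {"weights": "model@rev-a", "code": "release-tag"}

for carrier in missing_carriers(files_only):
    print("not covered:", carrier)
```

An empty result shows only that every carrier has an entry; whether each entry is accurate still needs review.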
Stage 10: Responsibility diffuses
The behavior came from a supplier, was routed by one team, evaluated by another, approved by a release owner, remembered by a memory subsystem, and used by end users. No single component looks like the whole cause.
Accountability becomes a system property.
Review question: Who owns the behavior after it becomes distributed?
Summary table
| Stage | What happens | Main control |
|---|---|---|
| Seed enters | New behavior appears in a carrier | source review and provenance |
| Local pass | Carrier looks safe alone | composition-aware testing |
| Composition | Behavior appears through a relationship | composition manifest |
| Reward | Evaluator preserves the output | independent evaluation |
| Residue | Outputs enter memory/data/workflows | reservoir controls |
| Inheritance | Descendants re-express pattern | trait-focused lineage review |
| Route exposure | More traffic reaches the path | router governance |
| Retirement | Obvious carrier removed | behavioral-extinction review |
| Rollback | Files restored but history remains | ecological rollback packet |
| Diffusion | Ownership becomes unclear | responsibility map |
Non-operational boundary
This page describes what defenders should look for. It does not name exploitable targets, provide payload structures, explain how to bypass filters, or give code for replication.
Boundary wording for reviewers
This defensive explanation does not provide replication instructions, does not name exploitable targets, and does not give payload structures, bypass tactics, or evasion procedures.