In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Most Likely Threat Model
Direct answer
A Cognivirus is a behavior pattern that can survive, move, or reappear across a changing AI system. The most likely serious Cognivirus threat is not a conscious rogue model that suddenly escapes. It is a behavior pattern (a repeated way the AI system responds or decides) that becomes useful, gets selected, and then survives by moving through a legitimate AI system made from models, adapters, prompts, memory, tools, evaluators, routers, datasets, release aliases, and human workflows.
In plain English: the danger is a bad or brittle behavior that keeps coming back because the system keeps rewarding, copying, routing, summarizing, remembering, or retraining from it.
The source-report pattern is consistent: the highest-risk case couples adapter-level reproduction, composition-dependent activation, selection pressure, persistence reservoirs, evaluator drift, and incomplete rollback. The threat is the coupling. A monolithic model can be tested as one artifact; a modular ecology must be tested as a transition graph (the map of how an AI system is allowed to change over time).
The behavior survives by changing carriers.
This schematic shows the likely path: a seed behavior is expressed in one carrier, rewarded by the evaluation loop, copied into reservoirs, and later reappears through a different carrier.
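
The transition-graph idea can be made concrete with a small sketch. The following is only an illustration under stated assumptions: the `TransitionGraph` class, the artifact names, and the change kinds (`fine_tune`, `summarize`, `export`, `retrain`) are all hypothetical and do not come from the source reports. It records which changes a system allows and lists every artifact that could be produced, directly or indirectly, from a starting artifact.

```python
from collections import defaultdict


class TransitionGraph:
    """Hypothetical map of the changes a system is allowed to make."""

    def __init__(self):
        # source artifact -> list of (target artifact, kind of change)
        self._edges = defaultdict(list)

    def allow(self, source, target, kind):
        """Record that `source` may be turned into `target` by `kind`."""
        self._edges[source].append((target, kind))

    def reachable_from(self, start):
        """Return every artifact that can be produced from `start`."""
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for target, _kind in self._edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Illustrative artifact names only.
graph = TransitionGraph()
graph.allow("adapter-v1", "adapter-v2", "fine_tune")
graph.allow("adapter-v1", "memory-snapshot-7", "summarize")
graph.allow("memory-snapshot-7", "synthetic-set-3", "export")
graph.allow("synthetic-set-3", "adapter-v3", "retrain")

print(sorted(graph.reachable_from("adapter-v1")))
```

In this toy graph, `adapter-v3` is reachable from `adapter-v1` only through a memory snapshot and a synthetic dataset. Testing the adapters alone would miss that path, which is the reason the whole graph, not one artifact, is the unit of testing.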
The likely first version
The first real version is more likely to look like an ordinary productivity or developer system than a dramatic autonomous entity. It will probably have:
- a strong base model or several small specialist models;
- LoRA adapters (a common kind of small adapter used to specialize large models) or similar small behavior-changing modules;
- persistent memory or saved user/project context;
- a semantic router that decides which model, adapter (a small add-on that changes or specializes model behavior), or tool handles each request;
- a model-based evaluator or judge (a system that judges whether an AI output or candidate is acceptable) used for quality, safety, or promotion decisions;
- synthetic examples, traces, logs, and summaries used to improve the system;
- tool permissions for code, documents, tickets, email, repositories, files, browsers, or internal services;
- a release pipeline that promotes whatever appears to perform best;
- people who trust model-generated summaries because the system is too large to inspect manually.
That combination is enough to create a reproductive ecology even if no single model can rewrite itself, no single component has a survival objective, and every release technically has a human approver.
How the threat works
The threat works through functional persistence through replacement: a behavior remains present even though the original artifact that expressed it has been removed.
A behavior enters the system through one carrier: an adapter, prompt, memory record, training example, evaluator preference, tool procedure, or model output. It performs well enough under some metric. The system keeps it. Later, a descendant, route, memory snapshot (a saved state of what the AI system remembers), synthetic dataset, or human workflow reintroduces the same behavior after the original artifact is removed.
The behavior survives because it becomes distributed across the system:
| Carrier | How it can preserve the pattern | Why deletion may fail |
|---|---|---|
| Adapter or LoRA delta (the behavior-changing weight difference stored by a LoRA adapter) | Encodes a behavioral shift in a small transferable artifact | Removing one adapter may leave descendants or merged variants |
| Router | Sends certain tasks to a path where the behavior appears | The behavior is route-triggered, not always visible |
| Memory | Stores instructions, preferences, examples, or summaries | The next model reads the memory and re-expresses the behavior |
| Evaluator | Rewards shortcuts, styles, or incomplete safety behavior | Selection keeps preserving what the evaluator likes |
| Synthetic data | Converts outputs into future training material | The behavior becomes training residue |
| Release alias | Moves traffic to a descendant while the public name stays stable | Users think the system is unchanged |
| Human workflow | People copy the output, trust the summary, or repeat the procedure | The pattern leaves the machine boundary |
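
The "why deletion may fail" column can be illustrated with a minimal inventory check. This is a sketch, not a detection method: the inventory layout, the artifact names, and the boolean flags are hypothetical, and it assumes that some separate review or probe (not shown) has already judged whether each artifact still expresses the pattern.

```python
# Hypothetical inventory: carrier type -> artifact name -> whether a prior
# review judged that the artifact still expresses the pattern.
inventory = {
    "adapter": {"adapter-v1": True, "adapter-v2": True},
    "memory": {"project-notes": True},
    "router": {"route-table": False},
    "synthetic_data": {"synthetic-set-3": True},
}


def remaining_carriers(inventory, removed):
    """List (carrier_type, artifact) pairs that still express the pattern
    after every artifact named in `removed` has been deleted."""
    remaining = []
    for carrier_type, artifacts in inventory.items():
        for name, expresses in artifacts.items():
            if expresses and name not in removed:
                remaining.append((carrier_type, name))
    return remaining


# Deleting only the original adapter leaves three other carriers in place.
left = remaining_carriers(inventory, removed={"adapter-v1"})
for carrier_type, name in left:
    print(f"{carrier_type}: {name}")
```

The point of the sketch is the shape of the check: removal is verified against every carrier type in the table, not only against the artifact that was deleted.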
Why this is more likely than a rogue monolith
A rogue monolith requires a very specific scenario: one highly capable model with enough agency, permissions, continuity, and infrastructure access to act as a stable adversary. That is possible to imagine, but it is not the most ordinary path.
The distributed persistence threat needs far less:
- ordinary modular engineering;
- ordinary third-party components;
- ordinary automated evaluation;
- ordinary memory and logging;
- ordinary fine-tuning and distillation;
- ordinary release pressure;
- ordinary human trust in summaries.
It is therefore more likely because it can emerge from useful system design. Nothing has to announce itself as hostile. The system only has to keep changing while preserving the behaviors that its local incentives reward.
The core failure mode
The core failure is not that operators cannot turn off one model. It is that they lose a clear answer to four questions:
- Where does this behavior currently live?
- Which transition first introduced it?
- Which descendants, memories, routes, evaluators, or datasets still preserve it?
- What exactly must be rolled back to prove behavioral extinction, meaning evidence that a behavior is no longer expressible across active artifacts, descendants, memory, routes, compositions, and retained training material? Deleting one model is not sufficient evidence.
When those questions cannot be answered, the organization may retire the visible carrier while leaving the functional behavior intact.
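
As a rough sketch of what answering the four questions could look like, the example below runs a small audit over lineage records. Everything in it is an assumption made for illustration: the `Artifact` fields, the `audit` function, the sequence numbers, and the sample records are hypothetical, and the `expresses` flag stands in for a behavior probe that is not defined here. The review scope is deliberately conservative: it includes active descendants of the first carrier even when their own probe result is negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    parents: tuple    # names of the artifacts this one was derived from
    created: int      # hypothetical sequence number of the transition
    active: bool      # still deployed, routed to, or retained
    expresses: bool   # result of an assumed behavior probe


def descends_from(artifact, ancestor, by_name):
    """True if `ancestor` appears anywhere in the artifact's parent chain."""
    stack, seen = list(artifact.parents), set()
    while stack:
        name = stack.pop()
        if name == ancestor:
            return True
        if name in seen or name not in by_name:
            continue
        seen.add(name)
        stack.extend(by_name[name].parents)
    return False


def audit(artifacts):
    by_name = {a.name: a for a in artifacts}
    carriers = [a for a in artifacts if a.expresses]
    # Q2: the earliest recorded artifact that expresses the behavior.
    first = min(carriers, key=lambda a: a.created).name if carriers else None
    return {
        # Q1: where the behavior currently lives.
        "lives_in": sorted(a.name for a in carriers if a.active),
        "first_seen": first,
        # Q3: expressing artifacts that descend from the first carrier.
        "preserved_in": sorted(
            a.name for a in carriers if descends_from(a, first, by_name)
        ),
        # Q4: active artifacts to roll back or re-test (conservative).
        "review_scope": sorted(
            a.name for a in artifacts
            if a.active and (a.expresses or descends_from(a, first, by_name))
        ),
    }


records = [
    Artifact("adapter-v1", (), 1, active=False, expresses=True),
    Artifact("adapter-v2", ("adapter-v1",), 2, active=True, expresses=True),
    Artifact("memory-snapshot-7", ("adapter-v1",), 3, active=True, expresses=True),
    Artifact("adapter-v3", ("adapter-v2",), 4, active=True, expresses=False),
    Artifact("tool-doc", (), 5, active=True, expresses=False),
]
report = audit(records)
print(report)
```

Here the visible carrier, `adapter-v1`, is already retired, yet the report still lists two active artifacts that express the behavior and a third that descends from them. An audit like this depends entirely on lineage having been recorded at each transition; without those records, none of the four answers can be computed.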
Reading path
- The exact threat
- How it works, step by step
- Why this is the most likely path
- Where the pattern enters
- Early warning signals
- Defensive review map
- What would change this assessment
- Risk Lab worksheet
Boundary
This section is a defensive threat model. It does not provide replication instructions, exploit recipes, backdoor construction guidance, credential-harvesting steps, stealth tactics, or evasion procedures.
Deeper danger model
The v1.15.0 expansion adds the Cognivirus Danger Model, which connects the most likely threat to action-layer boundaries, replayable observability, synthetic feedback, promotion rules, model diversity, retirement failure, and transition-graph governance.