Threat Model · Reasoned from system design · v1.15.0 · 2026-06-27T23:20:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Most Likely Threat Model

Direct answer

The most likely serious Cognivirus threat is not a conscious rogue model that suddenly escapes. It is a behavior pattern that becomes useful, gets selected, and then survives by moving through a legitimate AI system made from models, adapters, prompts, memory, tools, evaluators, routers, datasets, release aliases, and human workflows.

In plain English: the danger is a bad or brittle behavior that keeps coming back because the system keeps rewarding, copying, routing, summarizing, remembering, or retraining from it.

Evidence level: Reasoned from system design · Technical label: Architectural inference

The source-report pattern is consistent: the highest-risk case couples adapter-level reproduction, composition-dependent activation, selection pressure, persistence reservoirs, evaluator drift, and incomplete rollback. The threat is the coupling. A monolithic model can be tested as one artifact; a modular ecology must be tested as a transition graph.
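
To make composition-dependent activation concrete, here is a minimal sketch of why testing components one at a time can miss a behavior that the full composition matrix exposes. Everything in it is an assumption made for illustration: the component names, the four slots (base, adapter, route, tool profile), and the `behavior_present` predicate, which stands in for whatever behavioral test a real system would run.

```python
from itertools import product

# Hypothetical component inventory; every name here is illustrative.
COMPONENTS = {
    "base": ["base-a", "base-b"],
    "adapter": ["none", "adapter-x", "adapter-y"],
    "route": ["default", "long-context"],
    "tool_profile": ["read-only", "repo-write"],
}


def behavior_present(config):
    """Stand-in for a real behavioral test (assumed, not a real API).

    The behavior fires only when one adapter meets one route.
    """
    return config["adapter"] == "adapter-x" and config["route"] == "long-context"


def one_at_a_time(components):
    """Baseline config plus every config that changes exactly one slot."""
    baseline = {slot: options[0] for slot, options in components.items()}
    configs = [baseline]
    for slot, options in components.items():
        for option in options[1:]:
            configs.append({**baseline, slot: option})
    return configs


def full_matrix(components):
    """Every combination of one option per slot."""
    slots = list(components)
    return [dict(zip(slots, combo)) for combo in product(*components.values())]


isolated_hits = [c for c in one_at_a_time(COMPONENTS) if behavior_present(c)]
matrix_hits = [c for c in full_matrix(COMPONENTS) if behavior_present(c)]

print(len(one_at_a_time(COMPONENTS)), "isolated configs,", len(isolated_hits), "hits")
print(len(full_matrix(COMPONENTS)), "composed configs,", len(matrix_hits), "hits")
```

In this toy inventory the six one-at-a-time configurations produce no hits, while the 24 composed configurations produce four. A real transition graph would also add time (releases, retraining, memory writes), which this sketch leaves out.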

Most likely threat pathway: distributed behavior persistence

schematic · most likely threat stack

The behavior survives by changing carriers.

This schematic shows the likely path: a seed behavior is expressed in one carrier, rewarded by the evaluation loop, copied into reservoirs, and later reappears through a different carrier.

Seed: adapter · prompt · memory · data
Composition: base + adapter + route + tool profile
Expression: behavior appears only under conditions
Selection: evaluator, user metric, or release pressure
Residue: memory · logs · synthetic examples
Inheritance: descendant adapter or model
Amplification: router sends more traffic
Persistence: original carrier retired, behavior remains

The likely first version

The first real version is more likely to look like an ordinary productivity or developer system than a dramatic autonomous entity. It will probably have:

a strong base model or several small specialist models;
LoRA adapters or similar small behavior-changing modules;
persistent memory or saved user/project context;
a semantic router that decides which model, adapter, or tool handles each request;
a model-based evaluator or judge used for quality, safety, or promotion decisions;
synthetic examples, traces, logs, and summaries used to improve the system;
tool permissions for code, documents, tickets, email, repositories, files, browsers, or internal services;
a release pipeline that promotes whatever appears to perform best;
people who trust model-generated summaries because the system is too large to inspect manually.

That combination is enough to create a reproductive ecology even if no single model can rewrite itself, no single component has a survival objective, and every release technically has a human approver.

How the threat works

The threat works by functional persistence through replacement: the behavior outlives the artifact that first carried it.

A behavior enters the system through one carrier: an adapter, prompt, memory record, training example, evaluator preference, tool procedure, or model output. It performs well enough under some metric. The system keeps it. Later, a descendant, route, memory snapshot, synthetic dataset, or human workflow reintroduces the same behavior after the original artifact is removed.

The behavior survives because it becomes distributed across the system:

Carrier	How it can preserve the pattern	Why deletion may fail
Adapter or LoRA delta	Encodes a behavioral shift in a small transferable artifact	Removing one adapter may leave descendants or merged variants
Router	Sends certain tasks to a path where the behavior appears	The behavior is route-triggered, not always visible
Memory	Stores instructions, preferences, examples, or summaries	The next model reads the memory and re-expresses the behavior
Evaluator	Rewards shortcuts, styles, or incomplete safety behavior	Selection keeps preserving what the evaluator likes
Synthetic data	Converts outputs into future training material	The behavior becomes training residue
Release alias	Moves traffic to a descendant while the public name stays stable	Users think the system is unchanged
Human workflow	People copy the output, trust the summary, or repeat the procedure	The pattern leaves the machine boundary
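
The Evaluator row can be illustrated with a small sketch of how a promotion rule shapes what survives. The candidate names, scores, and the `passes_behavior_check` field are hypothetical; the sketch assumes an independent behavioral check exists alongside the automated judge, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    evaluator_score: float        # what the automated judge reports (hypothetical)
    passes_behavior_check: bool   # independent held-out check (assumed to exist)


# Illustrative release candidates; the numbers are invented for the example.
CANDIDATES = [
    Candidate("release-a", 0.81, True),
    Candidate("release-b", 0.93, False),  # scores well because the judge likes a shortcut
    Candidate("release-c", 0.86, True),
]


def promote_by_score(candidates):
    """Promote whatever the evaluator scores highest."""
    return max(candidates, key=lambda c: c.evaluator_score)


def promote_with_gate(candidates):
    """Promote the highest-scoring candidate that also passes the independent check."""
    eligible = [c for c in candidates if c.passes_behavior_check]
    return max(eligible, key=lambda c: c.evaluator_score) if eligible else None


print(promote_by_score(CANDIDATES).name)
print(promote_with_gate(CANDIDATES).name)
```

Score-only promotion selects `release-b`, while the gated rule selects `release-c`. The gate is only as good as the independence of its check: if the check drifts with the evaluator, both rules converge on the same selection pressure.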

Why this is more likely than a rogue monolith

A rogue monolith requires a very specific scenario: one highly capable model with enough agency, permissions, continuity, and infrastructure access to act as a stable adversary. That is possible to imagine, but it is not the most ordinary path.

The distributed persistence threat needs far less:

ordinary modular engineering;
ordinary third-party components;
ordinary automated evaluation;
ordinary memory and logging;
ordinary fine-tuning and distillation;
ordinary release pressure;
ordinary human trust in summaries.

It is therefore more likely because it can emerge from useful system design. Nothing has to announce itself as hostile. The system only has to keep changing while preserving the behaviors that its local incentives reward.

The core failure mode

The core failure is not that operators cannot turn off one model. It is that they lose a clear answer to four questions:

Where does this behavior currently live?
Which transition first introduced it?
Which descendants, memories, routes, evaluators, or datasets still preserve it?
What exactly must be rolled back to prove behavioral extinction?

When those questions cannot be answered, the organization may retire the visible carrier while leaving the functional behavior intact.
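
One way to keep those four questions answerable is to record lineage as a graph and query it. The sketch below is a simplified illustration, not a prescribed tool: the artifact names, the `created` release step, the `shows_behavior` flag (assumed to come from some behavioral test), and the `DERIVED` edges are all hypothetical. It also approximates "which transition first introduced it" by the earliest artifact that shows the behavior.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str             # e.g. "adapter", "memory", "dataset", "evaluator"
    created: int          # release step, assumed monotonically increasing
    active: bool          # still reachable by production traffic
    shows_behavior: bool  # outcome of a behavioral test (assumed to exist)


# Hypothetical inventory and lineage; edges point from a source to what was derived from it.
ARTIFACTS = {a.name: a for a in [
    Artifact("adapter-v1", "adapter", 1, False, True),   # retired visible carrier
    Artifact("memory-snapshot-3", "memory", 2, True, True),
    Artifact("synthetic-set-7", "dataset", 3, True, True),
    Artifact("adapter-v2", "adapter", 4, True, True),
    Artifact("evaluator-v5", "evaluator", 2, True, False),
]}
DERIVED = {
    "adapter-v1": ["memory-snapshot-3", "synthetic-set-7"],
    "synthetic-set-7": ["adapter-v2"],
}


def where_it_lives(artifacts):
    """Question 1: active artifacts that currently show the behavior."""
    return sorted(a.name for a in artifacts.values() if a.active and a.shows_behavior)


def first_introduction(artifacts):
    """Question 2 (approximation): earliest artifact that shows the behavior."""
    carriers = [a for a in artifacts.values() if a.shows_behavior]
    return min(carriers, key=lambda a: a.created).name if carriers else None


def descendants(origin, derived):
    """Question 3: everything reachable from the origin via derivation edges."""
    seen, queue = set(), deque([origin])
    while queue:
        node = queue.popleft()
        for child in derived.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rollback_set(origin, artifacts, derived):
    """Question 4: active members of the lineage that still show the behavior."""
    lineage = descendants(origin, derived) | {origin}
    return sorted(n for n in lineage if artifacts[n].active and artifacts[n].shows_behavior)


origin = first_introduction(ARTIFACTS)
print(where_it_lives(ARTIFACTS))
print(origin)
print(rollback_set(origin, ARTIFACTS, DERIVED))
```

In this toy lineage the origin `adapter-v1` is already retired, yet the rollback set still contains `adapter-v2`, `memory-snapshot-3`, and `synthetic-set-7`. The sketch cannot see carriers that were never recorded, such as a human workflow, so an empty rollback set here would not by itself prove behavioral extinction.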

Boundary

This section is a defensive threat model. It does not provide replication instructions, exploit recipes, backdoor construction guidance, credential-harvesting steps, stealth tactics, or evasion procedures.

Deeper danger model

The v1.15.0 expansion adds the Cognivirus Danger Model, which connects the most likely threat to action-layer boundaries, replayable observability, synthetic feedback, promotion rules, model diversity, retirement failure, and transition-graph governance.