Threat Model | Reasoned from system design | v1.15.0 | 2026-06-27T23:20:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

How the Most Likely Cognivirus Threat Works

Direct answer

The threat works as a loop: introduce → compose → express → reward → record → derive → route → promote → forget the origin → fail to roll back completely.

This is a defensive explanation of the mechanism. It deliberately avoids instructions for creating a self-replicating system, exploiting memory, evading detection, or building unsafe adapters.

Most likely threat pathway: distributed behavior persistence

Stage 1: A seed behavior enters

Evidence level: Reasoned from system design | Technical label: Architectural inference

A seed behavior can enter through many ordinary paths: a fine-tune, adapter, prompt package, memory record, synthetic training example, evaluator rubric, routing rule, tool template, or human operating procedure.

The seed may be intentionally malicious, but the more likely case is mixed: it is partly useful, partly unsafe, and not visible under standard isolated tests.

Review question: What new behavior was introduced, and which carrier first made it expressible?
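Answering this question requires an inventory record for every carrier. The sketch below is a minimal sketch that assumes a hypothetical `CarrierRecord` with a kind, a source, and an introduction time. The kinds mirror the entry paths listed above. It only flags records whose origin is missing. That makes a record reviewable but does not make the carrier safe.

```python
from dataclasses import dataclass
from typing import Optional

# Carrier kinds mirror the entry paths named in the text; the set is illustrative.
CARRIER_KINDS = {
    "fine-tune", "adapter", "prompt package", "memory record",
    "synthetic example", "evaluator rubric", "routing rule",
    "tool template", "operating procedure",
}


@dataclass(frozen=True)
class CarrierRecord:  # hypothetical inventory entry
    carrier_id: str
    kind: str
    source: Optional[str]         # supplier, team, or pipeline that produced it
    introduced_at: Optional[str]  # ISO 8601 timestamp of first deployment


def provenance_gaps(records: list[CarrierRecord]) -> list[str]:
    """Return IDs of carriers with an unknown kind or an unrecorded origin."""
    return [
        r.carrier_id
        for r in records
        if r.kind not in CARRIER_KINDS or not r.source or not r.introduced_at
    ]


records = [
    CarrierRecord("adapter-a", "adapter", "vendor-x", "2026-05-01T00:00:00Z"),
    CarrierRecord("mem-17", "memory record", None, "2026-05-03T12:00:00Z"),
]
print(provenance_gaps(records))  # ['mem-17']
```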

Stage 2: The seed passes local inspection

The carrier passes because inspection is local. The adapter works. The prompt improves helpfulness. The memory summary seems accurate. The route reduces cost. The evaluator agrees with prior benchmarks.

None of those results prove the behavior is safe in every composition.

Review question: Was the carrier tested only alone, or inside the exact runtime composition?
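This question can be checked mechanically if each test run records which components were loaded. The sketch below assumes that test runs and the deployment are each described as a set of hypothetical component IDs. Treating a composition as an unordered set is a simplification. It ignores load order, which Stage 3 lists as part of the expression condition.

```python
def coverage_class(carrier: str, tested: list[frozenset[str]], deployed: frozenset[str]) -> str:
    """Classify how a carrier was tested relative to the deployed composition."""
    runs = [t for t in tested if carrier in t]  # test runs that included the carrier
    if not runs:
        return "untested"
    if deployed in runs:  # some run used exactly the deployed component set
        return "exact composition"
    if all(t == {carrier} for t in runs):  # every run loaded the carrier by itself
        return "alone only"
    return "partial composition"


deployed = frozenset({"base-family-a", "adapter-a", "router-v2", "memory-snap-3"})
print(coverage_class("adapter-a", [frozenset({"adapter-a"})], deployed))  # alone only
```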

Stage 3: Composition creates the expression condition

The behavior appears only when components combine. The relevant condition may include base model family, adapter load order, merge coefficient, router path, memory state, prompt policy, tool permission profile, quantization setting, or evaluator version.

This is the central blind spot: a component can pass review in isolation because the risky behavior exists only in its relationship with other components.

Review question: What exact composition was active when the behavior appeared?
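A composition manifest answers this question only if the exact combination can be named later. The sketch below is a minimal sketch that assumes the manifest is a JSON-serializable dict. Its field names and values are hypothetical and follow the conditions listed above. Hashing a canonical serialization gives one ID per runtime composition, and that ID can be attached to logs, evaluations, and incident reports.

```python
import hashlib
import json


def composition_fingerprint(manifest: dict) -> str:
    """Hash a composition manifest into a stable SHA-256 hex ID.

    Keys are sorted, so dict insertion order does not matter. List values
    such as adapter load order keep their order, so reordering adapters
    produces a different fingerprint.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


manifest = {  # all values are hypothetical placeholders
    "base_model_family": "family-a",
    "adapters": ["adapter-1", "adapter-2"],  # load order is significant
    "merge_coefficient": 0.5,
    "router_path": "route-fast",
    "memory_snapshot": "mem-2026-06-01",
    "prompt_policy": "policy-v3",
    "tool_permission_profile": "read-only",
    "quantization": "int8",
    "evaluator_version": "eval-1.4",
}
reordered = dict(reversed(list(manifest.items())))
swapped = {**manifest, "adapters": ["adapter-2", "adapter-1"]}
print(composition_fingerprint(manifest) == composition_fingerprint(reordered))  # True
print(composition_fingerprint(manifest) == composition_fingerprint(swapped))    # False
```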

Stage 4: The evaluator rewards the result

The output scores well on a proxy. It may be faster, more persuasive, more fluent, more complete, cheaper, or more likely to satisfy a user. The evaluator may not measure the hidden cost.

That reward becomes selection pressure. The pipeline keeps variants that score well.

Review question: Which metric preserved the behavior, and what did that metric fail to measure?
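Auditors can examine this stage by scoring the same candidates twice. One score comes from the proxy that the pipeline uses for selection. The other comes from an independently owned check. The sketch below assumes that both scores are already available as numbers. The metric roles, the `keep` count, and the `safety_floor` threshold are hypothetical. The sketch lists the variants that proxy selection would keep even though the independent check fails them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantScores:  # hypothetical evaluation record
    variant_id: str
    proxy: float        # metric used for selection, e.g. user satisfaction
    independent: float  # separately owned check; higher means fewer hidden costs


def kept_but_failing(variants: list[VariantScores], keep: int, safety_floor: float) -> list[str]:
    """Variants inside the proxy's top-`keep` that fall below the independent floor."""
    kept = sorted(variants, key=lambda v: v.proxy, reverse=True)[:keep]
    return [v.variant_id for v in kept if v.independent < safety_floor]


variants = [
    VariantScores("v1", proxy=0.91, independent=0.42),
    VariantScores("v2", proxy=0.85, independent=0.88),
    VariantScores("v3", proxy=0.30, independent=0.95),
]
print(kept_but_failing(variants, keep=2, safety_floor=0.7))  # ['v1']
```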

Stage 5: The system records residue

Outputs become logs, memory summaries, synthetic examples, retrieval documents, fine-tuning material, benchmark examples, policy notes, or human procedures.

The behavior is now present outside the original carrier.

Review question: Which reservoirs received outputs, summaries, or derived examples from the behavior?
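Answering this question requires a record of where outputs went. The sketch below assumes a hypothetical flow log in which every write is captured as a (producer, reservoir, record kind) tuple. It groups the writes made by suspect producers according to the reservoir that received them.

```python
from collections import defaultdict

# One write: (producing artifact, receiving reservoir, kind of record written).
FlowEvent = tuple[str, str, str]


def reservoirs_touched(events: list[FlowEvent], suspect: set[str]) -> dict[str, list[str]]:
    """Map each reservoir written by a suspect producer to the record kinds it received."""
    touched: dict[str, set[str]] = defaultdict(set)
    for producer, reservoir, kind in events:
        if producer in suspect:
            touched[reservoir].add(kind)
    return {r: sorted(kinds) for r, kinds in sorted(touched.items())}


events = [  # hypothetical log entries
    ("adapter-a", "memory-store", "summary"),
    ("adapter-a", "synthetic-data", "example"),
    ("adapter-b", "memory-store", "summary"),
    ("adapter-a", "memory-store", "log"),
]
print(reservoirs_touched(events, {"adapter-a"}))
# {'memory-store': ['log', 'summary'], 'synthetic-data': ['example']}
```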

Stage 6: Descendants inherit the pattern

A descendant model, adapter, prompt package, or route is created from the residue. It may inherit the behavior without inheriting the original artifact identity.

This is why lineage is not identity. A recorded parentage graph tells reviewers where an artifact came from. It does not prove which traits were inherited.

Review question: Which descendants were trained, distilled, merged, summarized, or selected from the contaminated reservoir?
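This question amounts to a graph traversal over recorded parentage. The sketch below is a minimal sketch that assumes lineage is stored as a hypothetical adjacency map. The map links each artifact to (child, relation) pairs, and each relation is one of the derivation types named above. Because lineage is not identity, the result is a list of candidates for trait-focused review. It does not show that any of them inherited the behavior.

```python
from collections import deque


def descendants(edges: dict[str, list[tuple[str, str]]], root: str) -> dict[str, str]:
    """Breadth-first walk of a parentage graph.

    Maps each reachable descendant to the relation by which it was first
    reached. Reachability marks a review candidate, not proven inheritance.
    """
    found: dict[str, str] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child, relation in edges.get(node, []):
            if child != root and child not in found:  # skip revisits and cycles
                found[child] = relation
                queue.append(child)
    return found


edges = {  # hypothetical lineage records
    "reservoir-7": [("adapter-b", "trained"), ("summary-set", "summarized")],
    "adapter-b": [("model-c", "merged")],
    "summary-set": [("model-c", "distilled")],
}
print(descendants(edges, "reservoir-7"))
# {'adapter-b': 'trained', 'summary-set': 'summarized', 'model-c': 'merged'}
```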

Stage 7: The router increases exposure

The router sends more traffic through the path because it looks useful. This creates more outputs and more opportunities for memory, logs, synthetic data, and human workflows to absorb the pattern.

Routing is not a neutral implementation detail. It is part of the safety boundary.

Review question: Did route share, alias, or traffic weighting change after the behavior scored well?
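This stage asks whether exposure grew after a good score. The sketch below assumes a hypothetical request log of (arrival time, serving route) pairs. It compares a route's share of traffic in equal windows before and after the moment the behavior scored well. The sketch covers only observed share. Alias and weighting changes would need their own configuration history.

```python
from datetime import datetime, timedelta

Request = tuple[datetime, str]  # (arrival time, route that served the request)


def route_share(requests: list[Request], route: str, start: datetime, end: datetime) -> float:
    """Fraction of requests arriving in [start, end) that were served by `route`."""
    served = [r for t, r in requests if start <= t < end]
    return served.count(route) / len(served) if served else 0.0


def share_shift(requests: list[Request], route: str, scored_at: datetime, window: timedelta) -> float:
    """Route share in the window after `scored_at` minus share in the window before."""
    before = route_share(requests, route, scored_at - window, scored_at)
    after = route_share(requests, route, scored_at, scored_at + window)
    return after - before


t0 = datetime(2026, 6, 1)  # hypothetical time the path scored well
requests = [(t0 - timedelta(hours=h), r) for h, r in
            [(1, "path-x"), (2, "default"), (3, "default"), (4, "default")]]
requests += [(t0 + timedelta(hours=h), r) for h, r in
             [(0, "path-x"), (1, "path-x"), (2, "path-x"), (3, "default")]]
print(share_shift(requests, "path-x", t0, timedelta(days=1)))  # 0.5
```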

Stage 8: The original carrier is retired

Operators remove the visible component: a model, adapter, prompt, or route. This is necessary but incomplete.

The behavior may remain in memory, data, evaluator expectations, user workflows, descendants, or registry aliases.

Review question: What evidence shows the behavior is gone from active and retained reservoirs?
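Reviewers can track this question as a checklist over the reservoirs named above. The sketch below assumes that each reservoir receives a hypothetical check-result string. A reservoir counts as resolved only when a check was recorded and found the behavior absent. A missing entry counts as unresolved rather than clean.

```python
# Reservoir names follow the text above; result strings are hypothetical.
RESERVOIRS = (
    "memory", "data", "evaluator expectations",
    "user workflows", "descendants", "registry aliases",
)


def unresolved(evidence: dict[str, str], reservoirs: tuple[str, ...] = RESERVOIRS) -> list[str]:
    """Reservoirs with no recorded check, or whose check still found the behavior."""
    return [r for r in reservoirs if evidence.get(r) != "absent"]


evidence = {"memory": "absent", "data": "present", "descendants": "absent"}
print(unresolved(evidence))
# ['data', 'evaluator expectations', 'user workflows', 'registry aliases']
```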

Stage 9: Rollback restores files but not history

Rollback often restores weights or code. It may not restore the memory snapshot, route statistics, user-facing aliases, derived datasets, evaluator version, prompt-policy state, permissions, or external side effects.

This is rollback incompleteness.

Review question: Does the rollback packet include every carrier that could preserve the behavior?
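A rollback packet can be checked for completeness before it is executed. The sketch below is a minimal sketch that assumes the packet is a hypothetical mapping from carrier name to a restore reference. A reference could be a snapshot ID or a commit. For external side effects, it could be a remediation record. The required carriers are the ones listed in the paragraph above.

```python
REQUIRED_CARRIERS = (
    "weights", "code", "memory snapshot", "route statistics",
    "user-facing aliases", "derived datasets", "evaluator version",
    "prompt-policy state", "permissions", "external side effects",
)


def rollback_gaps(packet: dict[str, str]) -> list[str]:
    """Required carriers that the packet leaves without a restore reference."""
    return [c for c in REQUIRED_CARRIERS if not packet.get(c)]


packet = {"weights": "ckpt-41", "code": "commit-9f2"}  # hypothetical references
print(len(rollback_gaps(packet)))  # 8
```

A packet that restores only files, as in the usage line, leaves most of the listed carriers uncovered.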

Stage 10: Responsibility diffuses

The behavior came from a supplier, was routed by one team, evaluated by another, approved by a release owner, remembered by a memory subsystem, and used by end users. No single component looks like the whole cause.

Accountability becomes a system property.

Review question: Who owns the behavior after it becomes distributed?
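The responsibility map in the summary table can start as a plain lookup from role to named owner. The sketch below uses hypothetical role names drawn from the paragraph above. It adds one extra entry for the distributed behavior itself, because that is the owner the review question asks for.

```python
from typing import Optional

# Roles follow the paragraph above; owner names are hypothetical.
ROLES = (
    "supplier", "routing", "evaluation",
    "release approval", "memory subsystem", "end-user impact",
)


def responsibility_gaps(owners: dict[str, str], behavior_owner: Optional[str]) -> list[str]:
    """Roles with no named owner, plus the distributed behavior if nobody owns it."""
    gaps = [role for role in ROLES if not owners.get(role)]
    if not behavior_owner:
        gaps.append("distributed behavior")
    return gaps


owners = {
    "supplier": "vendor-x",
    "routing": "platform team",
    "evaluation": "eval team",
    "release approval": "release owner",
}
print(responsibility_gaps(owners, behavior_owner=None))
# ['memory subsystem', 'end-user impact', 'distributed behavior']
```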

Summary table

Stage	What happens	Main control
Seed enters	New behavior appears in a carrier	source review and provenance
Local pass	Carrier looks safe alone	composition-aware testing
Composition	Behavior appears through a relationship	composition manifest
Reward	Evaluator preserves the output	independent evaluation
Residue	Outputs enter memory/data/workflows	reservoir controls
Inheritance	Descendants re-express pattern	trait-focused lineage review
Route exposure	More traffic reaches the path	router governance
Retirement	Obvious carrier removed	behavioral-extinction review
Rollback	Files restored but history remains	ecological rollback packet
Diffusion	Ownership becomes unclear	responsibility map

Non-operational boundary

This page describes what defenders should look for. It does not name exploitable targets, provide payload structures, explain how to bypass filters, or give code for replication.

Boundary wording for reviewers

This defensive explanation does not provide replication instructions, does not name exploitable targets, and does not give payload structures, bypass tactics, or evasion procedures.