Threat Model | Reasoned from system design | v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

How the Most Likely Cognivirus Threat Works

Direct answer

The threat works as a loop: introduce → compose → express → reward → record → derive → route → promote → forget the origin → fail to roll back completely.

This is a defensive explanation of the mechanism. It deliberately avoids instructions for creating a self-replicating system, exploiting memory, evading detection, or building unsafe adapters.

Most likely threat pathway: distributed behavior persistence

Stage 1: A seed behavior enters

Evidence level: Reasoned from system design. Technical label: Architectural inference.

A seed behavior can enter through many ordinary paths: a fine-tune, adapter, prompt package, memory record, synthetic data (AI-generated or transformed data used for training or evaluation), evaluator rubric, routing rule, tool template, or human operating procedure.

The seed may be intentionally malicious, but the more likely case is mixed: it is partly useful, partly unsafe, and not visible under standard isolated tests.

Review question: What new behavior was introduced, and which carrier first made it expressible?

Stage 2: The seed passes local inspection

The carrier passes because inspection is local. The adapter works. The prompt improves helpfulness. The memory summary seems accurate. The route reduces cost. The evaluator agrees with prior benchmarks.

None of those results proves the behavior is safe in every composition.

Review question: Was the carrier tested only alone, or inside the exact runtime composition?

Stage 3: Composition creates the expression condition

The behavior appears only when components combine. The relevant condition may include base model family, adapter load order, merge coefficient, router path, memory state, prompt policy, tool permission profile, quantization setting, or evaluator version.

This is the central blind spot: a component can be acceptable in isolation because the risky behavior exists in the relationship between components, not in any one of them.

Review question: What exact composition was active when the behavior appeared?
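One way to make that review question answerable is to record the composition as data at evaluation and release time. The following is a minimal sketch, not a prescribed format: the class name `CompositionManifest`, its field names, and the example values are all hypothetical, chosen to mirror the conditions listed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class CompositionManifest:
    """Hypothetical record of one runtime composition (illustrative fields)."""
    base_model: str
    adapters: tuple            # ordered: load order is part of the composition
    merge_coefficients: tuple
    router_path: str
    memory_state_id: str
    prompt_policy: str
    tool_permission_profile: str
    quantization: str
    evaluator_version: str

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that identical
        # compositions always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Example values are placeholders, not real artifact names.
baseline = CompositionManifest(
    base_model="base-x",
    adapters=("adapter-a", "adapter-b"),
    merge_coefficients=(0.5, 0.5),
    router_path="route-1",
    memory_state_id="mem-001",
    prompt_policy="policy-3",
    tool_permission_profile="read-only",
    quantization="int8",
    evaluator_version="eval-7",
)

# Same components, different adapter load order: a different composition.
reordered = replace(baseline, adapters=("adapter-b", "adapter-a"))

print(baseline.fingerprint() == reordered.fingerprint())
```

Storing the fingerprint next to each evaluation result lets a reviewer check whether a test ran against the same composition that later served traffic. Two manifests that differ only in adapter load order hash differently, which is the point: the order is part of what was tested.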

Stage 4: The evaluator rewards the result

The output scores well on a proxy. It may be faster, more persuasive, more fluent, more complete, cheaper, or more likely to satisfy a user. The evaluator may not measure the hidden cost.

That reward becomes selection pressure. The pipeline keeps variants that score well.

Review question: Which metric preserved the behavior, and what did that metric fail to measure?
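As a rough illustration of that question, the sketch below flags variants that a proxy metric would keep but a separately owned check rejects. The names `VariantResult` and `flag_rewarded_but_unverified`, the threshold, and the sample scores are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    name: str
    proxy_score: float       # the metric the pipeline selects on
    independent_pass: bool   # outcome of a separate, independently owned check


def flag_rewarded_but_unverified(results, proxy_threshold=0.8):
    """Return names of variants the proxy would keep but the independent check rejects."""
    return [
        r.name
        for r in results
        if r.proxy_score >= proxy_threshold and not r.independent_pass
    ]


# Illustrative data only.
results = [
    VariantResult("variant-a", proxy_score=0.91, independent_pass=True),
    VariantResult("variant-b", proxy_score=0.95, independent_pass=False),
    VariantResult("variant-c", proxy_score=0.40, independent_pass=False),
]

flagged = flag_rewarded_but_unverified(results)
print(flagged)
```

A flagged variant is exactly the case described above: selection pressure would preserve it, and only the independent check sees the cost.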

Stage 5: The system records residue

Outputs become logs, memory summaries, synthetic examples, retrieval documents, fine-tuning material, benchmark examples, policy notes, or human procedures.

The behavior is now present outside the original carrier.

Review question: Which reservoirs received outputs, summaries, or derived examples from the behavior?

Stage 6: Descendants inherit the pattern

A descendant model, adapter, prompt package, or route is created from the residue. It may inherit the behavior without inheriting the original artifact identity.

This is why lineage is not identity. A recorded parentage graph tells reviewers where an artifact came from. It does not prove which traits were inherited.

Review question: Which descendants were trained, distilled, merged, summarized, or selected from the contaminated reservoir?
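If parentage is recorded as (parent, child) pairs, listing review candidates is a plain graph traversal. The sketch below assumes such an edge list exists; the function name `descendants` and the artifact names are hypothetical.

```python
from collections import deque


def descendants(edges, start):
    """Return every artifact reachable from `start` in a parent -> child graph.

    edges: iterable of (parent, child) pairs.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:      # also guards against accidental cycles
                seen.add(child)
                queue.append(child)
    return seen


# Illustrative lineage records, not real artifacts.
edges = [
    ("memory-summaries-q3", "synthetic-set-7"),
    ("synthetic-set-7", "adapter-v2"),
    ("adapter-v2", "merged-model-b"),
    ("clean-corpus", "adapter-v3"),
]

to_review = descendants(edges, "memory-summaries-q3")
print(sorted(to_review))
```

The result is a list of candidates for behavioral testing, not a finding. Reachability shows where residue could have flowed; it does not show which traits actually carried over.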

Stage 7: The router increases exposure

The router sends more traffic through the path because it looks useful. This creates more outputs and more opportunities for memory, logs, synthetic data, and human workflows to absorb the pattern.

Routing is not a neutral implementation detail. It is part of the safety boundary.

Review question: Did route share, alias, or traffic weighting change after the behavior scored well?
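A simple way to answer that is to compare normalized traffic share from before and after the scoring event. This sketch assumes route weights are available as plain dictionaries; `route_share_changes`, the 0.05 threshold, and the route names are illustrative choices, not values from any particular router.

```python
def route_share_changes(before, after, min_delta=0.05):
    """Return routes whose share of total traffic moved by at least `min_delta`.

    before / after: dict mapping route name -> traffic weight (any
    non-negative unit, for example request counts).
    """
    def shares(weights):
        total = sum(weights.values())
        if total <= 0:
            raise ValueError("total traffic weight must be positive")
        return {route: w / total for route, w in weights.items()}

    b, a = shares(before), shares(after)
    changes = {}
    for route in sorted(set(b) | set(a)):
        # A route missing from one side is treated as having zero share there.
        delta = a.get(route, 0.0) - b.get(route, 0.0)
        if abs(delta) >= min_delta:
            changes[route] = round(delta, 4)
    return changes


# Illustrative counts only.
before = {"path-a": 80, "path-b": 20}
after = {"path-a": 50, "path-b": 50}

changes = route_share_changes(before, after)
print(changes)
```

A shift on its own is not evidence of a problem. It tells the reviewer which paths gained exposure, and therefore which logs, memory writes, and derived data to examine first.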

Stage 8: The original carrier is retired

Operators remove the visible component: a model, adapter, prompt, or route. This is necessary but incomplete.

The behavior may remain in memory, data, evaluator expectations, user workflows, descendants, or registry aliases.

Review question: What evidence shows the behavior is gone from active and retained reservoirs?

Stage 9: Rollback restores files but not history

Rollback often restores weights or code. It may not restore the saved memory state, route statistics, user-facing aliases, derived datasets, evaluator version, prompt-policy state, permissions, or external side effects.

This is rollback incompleteness.

Review question: Does the rollback packet include every carrier that could preserve the behavior?
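The check can be expressed as a set difference between the carriers a rollback is expected to cover and the carriers the packet actually references. This is a sketch under stated assumptions: the carrier names come from the list in this stage, while `REQUIRED_CARRIERS`, `missing_carriers`, and the packet layout are hypothetical.

```python
# Carrier names mirror the list in this stage; the set itself is an assumption
# and should be adapted to the system under review.
REQUIRED_CARRIERS = frozenset({
    "weights",
    "code",
    "memory_state",
    "route_statistics",
    "user_facing_aliases",
    "derived_datasets",
    "evaluator_version",
    "prompt_policy_state",
    "permissions",
    "external_side_effects",
})


def missing_carriers(packet):
    """Return the required carriers the packet does not cover, sorted by name.

    packet: dict mapping carrier name -> reference to the restored state.
    A missing key or a None value counts as not covered.
    """
    covered = {name for name, ref in packet.items() if ref is not None}
    return sorted(REQUIRED_CARRIERS - covered)


# A packet that restores only files, as described above.
files_only_packet = {"weights": "weights-ref-41", "code": "release-ref-41"}

gaps = missing_carriers(files_only_packet)
print(gaps)
```

External side effects usually cannot be restored at all, so the reference for that entry would more plausibly point to a remediation record than to a snapshot.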

Stage 10: Responsibility diffuses

The behavior came from a supplier, was routed by one team, evaluated by another, approved by a release owner, remembered by a memory subsystem, and used by end users. No single component looks like the whole cause.

Accountability becomes a system property.

Review question: Who owns the behavior after it becomes distributed?

Summary table

Stage | What happens | Main control
Seed enters | New behavior appears in a carrier | source review and provenance
Local inspection | Carrier looks safe alone | composition-aware testing
Composition | Behavior appears through a relationship | machine-readable record of the exact runtime composition
Reward | Evaluator preserves the output | independent evaluation
Residue | Outputs enter memory/data/workflows | reservoir controls
Inheritance | Descendants re-express pattern | trait-focused lineage review
Route exposure | More traffic reaches the path | router governance
Retirement | Obvious carrier removed | behavioral-extinction review
Rollback | Files restored but history remains | rollback packet covering the model artifact, router, prompts, memory state, tool permissions, evaluator version, deployment alias, and data dependencies
Diffusion | Ownership becomes unclear | responsibility map

Non-operational boundary

This page describes what defenders should look for. It does not name exploitable targets, provide payload structures, explain how to bypass filters, or give code for replication.

Boundary wording for reviewers

This defensive explanation does not provide replication instructions, does not name exploitable targets, and does not give payload structures, bypass tactics, or evasion procedures.