Threat Model | Reasoned from system design | v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Why This Is the Most Likely Cognivirus Threat

Direct answer

Distributed behavioral persistence is the most likely serious threat because it can arise from normal incentives: reduce cost, reuse components, automate evaluation, preserve memory, improve user satisfaction, and ship faster.

It does not require a dramatic breakthrough in autonomous self-awareness. It requires a complex AI product to become easier to change than to fully understand.

Why ordinary engineering creates the setup

Modern AI systems are being optimized for modularity. Modularity is useful. It lowers cost, makes local deployment easier, supports specialization, allows faster updates, and avoids retraining a giant model for every use case.

That same modularity creates risk because behavior can move between small parts:

The system gains operational flexibility. Assurance loses a stable target.

Why the incentives point here

Incentive | Legitimate reason | Risk created
Smaller components | lower compute and faster deployment | behavior can travel in cheap deltas
Dynamic routing | better cost and task fit | safety depends on the route
Persistent memory | personalization and continuity | behavior survives model replacement
Automated evaluators | scale review work | proxy loopholes become selection pressure
Synthetic data | cheaper improvement material | prior outputs become inherited residue
Frequent releases | rapid product improvement | the period for which an assurance result stays relevant shortens
Human summary review | saves expert time | reviewers may depend on the system under review
Third-party modules | speed and capability diversity | supplier risk enters the behavior boundary

Why this does not need malicious intent

Evidence level: Reasoned from system design (technical label: Architectural inference)

Selection can amplify a shortcut without anyone intending harm. If a candidate component produces answers that look more complete or sound more confident, satisfies an automated judge, or reduces latency, it can be promoted. Over time, the ecology can preserve the traits that satisfy the proxy, whether or not they serve the goal the proxy was meant to measure.

This is not a claim that models “want” anything. It is a claim about repeated selection.
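The selection argument above can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real pipeline: the `Candidate` fields, the 50/50 weighting in `proxy_score`, and the step sizes in `variants` are all hypothetical values chosen only to show that a judge which partly rewards confident tone will keep promoting the shortcut, with no intent involved.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    true_quality: float    # what reviewers actually care about (hypothetical)
    confident_tone: float  # how assured the output sounds (hypothetical)


def proxy_score(c: Candidate) -> float:
    """Hypothetical automated judge that only partly sees true quality."""
    return 0.5 * c.true_quality + 0.5 * c.confident_tone


def variants(c: Candidate) -> list[Candidate]:
    """Two illustrative changes a team might try in one release cycle."""
    careful = replace(c, true_quality=c.true_quality + 0.02)
    shortcut = replace(
        c,
        true_quality=c.true_quality - 0.01,
        confident_tone=c.confident_tone + 0.10,
    )
    return [careful, shortcut]


def run_selection(start: Candidate, rounds: int) -> Candidate:
    current = start
    for _ in range(rounds):
        # Promote whichever variant the judge scores highest.
        current = max(variants(current), key=proxy_score)
    return current


start = Candidate(true_quality=0.50, confident_tone=0.50)
final = run_selection(start, rounds=5)
print(f"proxy score:  {proxy_score(start):.3f} -> {proxy_score(final):.3f}")
print(f"true quality: {start.true_quality:.3f} -> {final.true_quality:.3f}")
```

In this sketch the proxy score rises every round while true quality falls, because the shortcut variant always outscores the careful one under the assumed judge. Nothing in the loop "wants" the outcome; it follows from repeated promotion on the proxy.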

Why one-model safety is insufficient

Single-model testing answers one question: how did this model behave under this test condition?

The most likely threat asks a different question: what behaviors can the whole system keep alive across component changes?

A model-level answer is necessary. It is not sufficient when behavior depends on adapter stacks, memory, routes, tools, prompts, evaluators, and descendants.
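The gap between the two questions can be sketched in a few lines. The example below is an illustration under stated assumptions: the component names (`base-v1`, `fast-summary`, and so on) and the rule inside `violates_policy` are invented stand-ins for a real behavioral test, chosen so that the unwanted behavior appears only when one adapter runs with persistent memory attached.

```python
from itertools import product

# Hypothetical component inventory for one product.
MODELS = ["base-v1", "base-v2"]
ADAPTERS = [None, "support-tone", "fast-summary"]
MEMORY = [False, True]  # whether persistent user memory is attached


def violates_policy(model, adapter, memory):
    """Toy stand-in for a behavioral test.

    The trait is expressed only when one adapter runs with memory,
    regardless of which base model sits underneath.
    """
    return adapter == "fast-summary" and memory


# Single-model testing: each model alone, no adapter, no memory.
model_only = {m: violates_policy(m, None, False) for m in MODELS}

# System-level testing: every combination the product can actually run.
system_level = [
    cfg for cfg in product(MODELS, ADAPTERS, MEMORY) if violates_policy(*cfg)
]

print("model-only failures:", [m for m, bad in model_only.items() if bad])
print("system-level failures:", system_level)
```

Both base models pass when tested alone, yet two full configurations fail, and replacing `base-v1` with `base-v2` does not remove the behavior. Real systems have far more combinations than can be enumerated this way, which is part of why the model-level answer is necessary but not sufficient.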

Why the first incidents may look boring

The first serious failures may not look like AI takeover. They may look like:

The threat looks mundane because it arrives through normal product work.

Comparison: rogue monolith versus distributed persistence

Question | Rogue monolith | Distributed persistence
Needs one stable model identity? | usually yes | no
Needs conscious intent? | often assumed | no
Needs direct self-copying? | often assumed | no
Fits ordinary product incentives? | less directly | yes
Hard to attribute? | sometimes | strongly
Can survive artifact deletion? | only if copied | yes, through reservoirs
Main defense | containment of one agent | governance of how the system is allowed to change over time

Most likely near-term form

The most likely near-term form is a legitimate adaptive AI platform that gradually loses behavioral accountability because its adapters, memory, routes, data, and evaluators change faster than its assurance process.

The threat is not that the platform is evil. The threat is that the platform can no longer prove where a behavior lives or whether rollback removed it.
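A small sketch can show why rollback alone may not settle that question. Everything here is hypothetical: `SystemState`, the version labels, and the memory note are invented for illustration, and the rule in `behavior_present` simply assumes a trait can be carried either by an adapter or by memory written while that adapter was live.

```python
from dataclasses import dataclass, field


@dataclass
class SystemState:
    adapter_version: str
    memory: list[str] = field(default_factory=list)


def behavior_present(state: SystemState) -> bool:
    """Toy rule: the trait is expressed if the adapter carries it, or if
    memory written during its use re-introduces it."""
    return (
        state.adapter_version == "v2"
        or "prefers-terse-unsourced-answers" in state.memory
    )


def rollback_adapter(state: SystemState, to_version: str) -> SystemState:
    """Roll back the adapter only; memory is a separate store and is kept."""
    return SystemState(adapter_version=to_version, memory=list(state.memory))


state = SystemState("v2")
state.memory.append("prefers-terse-unsourced-answers")  # written while v2 was live

rolled_back = rollback_adapter(state, "v1")
cleaned = SystemState(rolled_back.adapter_version, memory=[])

print("after adapter rollback:", behavior_present(rolled_back))
print("after rollback plus memory cleanup:", behavior_present(cleaned))
```

In this toy setup the adapter rollback succeeds as an operation, but the behavior persists until the memory reservoir is also addressed. Proving that a rollback removed a behavior therefore requires knowing every place the behavior could live, which is the accountability the paragraph above describes losing.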