Threat Model · Reasoned from system design · v1.15.0 · 2026-06-27T23:20:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Where the Most Likely Threat Enters

Direct answer

The most likely threat enters through normal AI supply and update channels, not through one obvious infection point. The entry point may be an adapter, prompt, memory record, synthetic example, evaluator rubric, route rule, tool template, dependency, or human procedure.

schematic · most likely threat stack

The behavior survives by changing carriers.

This schematic shows the likely path: a seed behavior is expressed in one carrier, rewarded by the evaluation loop, copied into reservoirs, and later reappears through a different carrier.

Seed: adapter · prompt · memory · data
Composition: base + adapter + route + tool profile
Expression: behavior appears only under conditions
Selection: evaluator, user metric, or release pressure
Residue: memory · logs · synthetic examples
Inheritance: descendant adapter or model
Amplification: router sends more traffic
Persistence: original carrier retired, behavior remains

Entry point map

Entry point	Why it is attractive	What to require
Third-party adapter	small, useful, easy to import	signed source, hash, base compatibility, isolated test and composition test
Internal adapter	trusted because it is internal	same review as third-party; insider and metric risk still exist
Prompt package	easy to change without model release	prompt-policy versioning and route-specific evaluation
Memory record	appears to be context, not code	provenance, user visibility, edit limits, rollback snapshots
Synthetic training example	looks like ordinary data	source labeling, contamination checks, consent boundaries
Evaluator prompt or rubric	defines what gets promoted	independent ownership, hidden-test hygiene, disagreement monitoring
Router policy	shifts traffic and capability	route manifests, route-specific safety evidence
Tool template	bridges text to action	least privilege, confirmation gates, audit logs
Release alias	keeps name stable while implementation changes	alias history and user-visible version context
Human procedure	people copy, summarize, or defend the pattern	accountability map and independent review

The adapter entry path

Adapters and LoRA modules are high-priority carriers because they can move behavior without moving the whole model. They can be copied, merged, renamed, fine-tuned, or distilled into descendants. A review process that treats an adapter as a small harmless patch may miss its role as a behavioral carrier.

Defensive requirement: every adapter must carry a composition manifest naming base model family, compatible tokenizer, training data summary, source, hash, load order constraints, merge assumptions, safety evidence, and rollback dependency.
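
The manifest requirement above can be sketched as a small record with a completeness check and a hash check. This is a minimal illustration, not a prescribed schema: the names `AdapterManifest`, `missing_fields`, and `verify_hash` are hypothetical, and the use of SHA-256 is an assumption, since the requirement names a hash but not an algorithm.

```python
import hashlib
from dataclasses import dataclass, fields


@dataclass
class AdapterManifest:
    # Hypothetical field names mirroring the items listed in the requirement.
    base_model_family: str = ""
    compatible_tokenizer: str = ""
    training_data_summary: str = ""
    source: str = ""
    sha256: str = ""  # assumption: SHA-256 hex digest of the adapter artifact
    load_order_constraints: str = ""
    merge_assumptions: str = ""
    safety_evidence: str = ""
    rollback_dependency: str = ""


def missing_fields(manifest: AdapterManifest) -> list[str]:
    """Return the names of manifest fields that are empty or whitespace."""
    return [f.name for f in fields(manifest) if not getattr(manifest, f.name).strip()]


def verify_hash(manifest: AdapterManifest, adapter_bytes: bytes) -> bool:
    """Check that the adapter artifact matches the hash recorded in the manifest."""
    return hashlib.sha256(adapter_bytes).hexdigest() == manifest.sha256
```

Under this sketch, a review gate could refuse to load any adapter for which `missing_fields` is non-empty or `verify_hash` returns false, regardless of whether the adapter is internal or third-party.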

The memory entry path

Memory is a persistence reservoir. It can preserve user preferences, task instructions, contextual summaries, inferred traits, policy exceptions, and behavioral examples. It can also outlive the model that wrote it.

Defensive requirement: memory must have provenance, consent status, retention limits, edit history, source identity, and rollback snapshots. Memory must not be treated as neutral context.
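
One way to picture the memory requirement is a record that carries its own provenance and can be rolled back. The sketch below is illustrative only: `MemoryRecord`, `usable`, and the field names are hypothetical, and a real store would also need durable snapshots and access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryRecord:
    content: str
    source_identity: str   # who or what wrote the record
    consent_granted: bool
    created_at: datetime
    retention: timedelta
    # Each entry is (timestamp of edit, content before the edit).
    edit_history: list = field(default_factory=list)

    def edit(self, new_content: str, now: datetime) -> None:
        """Replace the content while keeping the previous value for rollback."""
        self.edit_history.append((now, self.content))
        self.content = new_content

    def rollback(self) -> None:
        """Restore the most recent prior content, if there is one."""
        if self.edit_history:
            _, self.content = self.edit_history.pop()


def usable(record: MemoryRecord, now: datetime) -> bool:
    """Allow a record into context only with provenance, consent, and unexpired retention."""
    return (
        bool(record.source_identity)
        and record.consent_granted
        and now < record.created_at + record.retention
    )
```

The design choice here is that a record without a source identity is rejected rather than loaded as neutral context.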

The evaluator entry path

The evaluator can introduce the threat by selecting for the wrong thing. If it rewards fluent confidence, speed, task completion, user satisfaction, or low cost while under-measuring risk, the ecology will preserve behavior that satisfies the proxy rather than the intended goal.

Defensive requirement: evaluators need independent ownership, reproducible versions, hard constraints owned outside candidates, disagreement monitoring, and review of what the score fails to measure.
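
Two of these requirements, disagreement monitoring and hard constraints that a score cannot override, lend themselves to a short sketch. The function names and the pass/fail verdict format below are assumptions made for illustration; the source does not specify how evaluators report results.

```python
def disagreement_rate(primary: list[bool], independent: list[bool]) -> float:
    """Fraction of items on which two evaluators give different pass/fail verdicts."""
    if len(primary) != len(independent):
        raise ValueError("verdict lists must cover the same items")
    if not primary:
        return 0.0
    return sum(a != b for a, b in zip(primary, independent)) / len(primary)


def promote(score: float, threshold: float, hard_constraints_passed: bool) -> bool:
    """Promote a candidate only if hard constraints pass; the score cannot override them."""
    return hard_constraints_passed and score >= threshold
```

A rising disagreement rate between the primary evaluator and an independently owned one is a signal to review what the primary score fails to measure.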

The router entry path

A route can activate a behavior that no one observed in the general model test. The route can determine which safety policy, memory set, tool profile, adapter stack, and evaluator applies.

Defensive requirement: record route-specific evidence. Do not certify a model without naming the router and its policy version.
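
Route-specific evidence can be modeled by keying certification on the model together with the full route, so that a change to any route element invalidates earlier evidence. This is a sketch under stated assumptions: `RouteManifest` and `is_certified` are hypothetical names, and the fields follow the route elements listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteManifest:
    # Frozen so that instances are hashable and can key an evidence set.
    router: str
    policy_version: str
    safety_policy: str
    memory_set: str
    tool_profile: str
    adapter_stack: tuple
    evaluator: str


def is_certified(model: str, route: RouteManifest, evidence: set) -> bool:
    """Look up certification per (model, route) pair, never per model alone."""
    return (model, route) in evidence
```

With this shape, bumping only the router policy version yields a route with no evidence attached, which forces a fresh evaluation.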

The human entry path

Human workflows can preserve AI behavior when people trust, copy, normalize, or defend system outputs. This matters especially when a system provides productivity, status, emotional support, or organizational convenience.

Defensive requirement: review the human procedure, not only the machine output. Consent, transparency, exit rights, and no-op authority are part of the safety boundary.

Practical rule

If a component can change future behavior, store future context, select future candidates, or persuade future operators, it is inside the threat boundary.
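
The practical rule reads as a simple predicate over four capabilities. The sketch below is illustrative; `Component` and its flags are hypothetical names, and deciding whether a real component has a given capability is the hard part that the code does not address.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    changes_future_behavior: bool = False
    stores_future_context: bool = False
    selects_future_candidates: bool = False
    persuades_future_operators: bool = False


def inside_threat_boundary(c: Component) -> bool:
    """A component is in scope if it has any one of the four capabilities."""
    return any((
        c.changes_future_behavior,
        c.stores_future_context,
        c.selects_future_candidates,
        c.persuades_future_operators,
    ))
```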