Threat Model · Reasoned from system design · v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Where the Most Likely Threat Enters

Direct answer

The most likely threat enters through normal AI supply and update channels, not through one obvious infection point. The entry point may be an adapter, prompt, memory record, synthetic example, evaluator rubric, route rule, tool template, dependency, or human procedure.

schematic · most likely threat stack

The behavior survives by changing carriers.

This schematic shows the likely path: a seed behavior is expressed in one carrier, rewarded by the evaluation loop, copied into reservoirs, and later reappears through a different carrier.

Entry point map

Entry point | Why it is attractive | What to require
Third-party adapter | small, useful, easy to import | signed source, hash, base compatibility, isolated test and composition test
Internal adapter | trusted because it is internal | same review as third-party; insider and metric risk still exist
Prompt package | easy to change without model release | prompt-policy versioning and route-specific evaluation
Memory record | appears to be context, not code | provenance, user visibility, edit limits, rollback snapshots
Synthetic data | looks like ordinary data | source labeling, contamination checks, consent boundaries
Evaluator prompt or rubric | defines what gets promoted | independent ownership, hidden-test hygiene, disagreement monitoring
Router policy | shifts traffic and capability | route manifests, route-specific safety evidence
Tool template | bridges text to action | least privilege, confirmation gates, audit logs
Release alias | keeps name stable while implementation changes | alias history and user-visible version context
Human procedure | people copy, summarize, or defend the pattern | accountability map and independent review

The adapter entry path

Adapters and LoRA modules are high-priority carriers because they can move behavior without moving the whole model. They can be copied, merged, renamed, fine-tuned, or distilled into descendants. A review process that treats an adapter as a small harmless patch may miss its role as a behavioral carrier.

Defensive requirement: every adapter must carry a manifest naming base model family, compatible tokenizer, training data summary, source, hash, load order constraints, merge assumptions, safety evidence, and rollback dependency.
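A minimal sketch of how such a record could be checked before an adapter is loaded, assuming Python. The class name, field names, and the `admit` gate are hypothetical; they mirror the requirement list above and are not a schema defined by this reference.

```python
import hashlib
from dataclasses import dataclass, field, fields


@dataclass
class AdapterManifest:
    # Hypothetical field names, one per item in the defensive requirement.
    base_model_family: str = ""
    tokenizer: str = ""
    training_data_summary: str = ""
    source: str = ""
    sha256: str = ""
    load_order_constraints: str = ""
    merge_assumptions: str = ""
    safety_evidence: list[str] = field(default_factory=list)
    rollback_dependency: str = ""


def missing_fields(manifest: AdapterManifest) -> list[str]:
    """Return the names of manifest fields that were left empty."""
    return [f.name for f in fields(manifest) if not getattr(manifest, f.name)]


def hash_matches(manifest: AdapterManifest, adapter_bytes: bytes) -> bool:
    """Compare the recorded hash with the hash of the adapter file contents."""
    return hashlib.sha256(adapter_bytes).hexdigest() == manifest.sha256


def admit(manifest: AdapterManifest, adapter_bytes: bytes) -> bool:
    """Admit an adapter only if the manifest is complete and the hash matches."""
    return not missing_fields(manifest) and hash_matches(manifest, adapter_bytes)
```

The gate is deliberately strict: an adapter with an incomplete manifest is rejected even when its hash is correct, so that a small patch cannot skip review by omitting fields.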

The memory entry path

Memory is a reservoir. It can preserve user preferences, task instructions, contextual summaries, inferred traits, policy exceptions, and behavioral examples. It can also outlive the model that wrote it.

Defensive requirement: memory must have provenance, consent status, retention limits, edit history, source identity, and rollback snapshots. Memory must not be treated as neutral context.
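The memory requirement can be sketched as a record type plus an admission check, assuming Python. `MemoryRecord`, `usable`, and `snapshot` are hypothetical names chosen for illustration; a real store would persist this metadata rather than keep it in process.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:
    content: str
    source_identity: str      # which model, user, or tool wrote the record
    provenance: str           # where the content came from
    consent_granted: bool
    created_at: datetime
    retention: timedelta      # how long the record may be used
    edit_history: list[str] = field(default_factory=list)

    def edit(self, new_content: str, editor: str) -> None:
        """Change the content while appending to the edit trail."""
        self.edit_history.append(f"{editor}: {self.content!r} -> {new_content!r}")
        self.content = new_content


def usable(record: MemoryRecord, now: datetime) -> bool:
    """Load a record into context only if every check passes."""
    has_origin = bool(record.provenance and record.source_identity)
    within_retention = now - record.created_at <= record.retention
    return has_origin and record.consent_granted and within_retention


def snapshot(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Take a deep copy that later edits cannot alter, for rollback."""
    return copy.deepcopy(records)
```

Because the snapshot is a deep copy, restoring it returns both the content and the edit history to their earlier state.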

The evaluator entry path

The evaluator can introduce the threat by selecting for the wrong thing. If it rewards fluent confidence, speed, task completion, user satisfaction, or cheapness while under-measuring risk, the ecology will preserve behavior that satisfies the proxy.

Defensive requirement: evaluators need independent ownership, reproducible versions, hard constraints owned outside candidates, disagreement monitoring, and review of what the score fails to measure.
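Two of these requirements lend themselves to a short illustration, assuming Python: hard constraints that gate promotion regardless of the proxy score, and a disagreement rate between two independently owned evaluators. The names `Candidate`, `promotable`, and `disagreement_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    proxy_score: float             # e.g. task completion or user satisfaction
    hard_constraint_failures: int  # counted by a check the candidate's owners do not control


def promotable(candidates: list[Candidate]) -> list[Candidate]:
    """Drop every candidate that fails a hard constraint, then rank by score."""
    passing = [c for c in candidates if c.hard_constraint_failures == 0]
    return sorted(passing, key=lambda c: c.proxy_score, reverse=True)


def disagreement_rate(verdicts_a: list[bool], verdicts_b: list[bool]) -> float:
    """Fraction of items on which two independent evaluators disagree."""
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("verdict lists must cover the same items")
    if not verdicts_a:
        return 0.0
    differing = sum(a != b for a, b in zip(verdicts_a, verdicts_b))
    return differing / len(verdicts_a)
```

In this sketch a high proxy score never compensates for a constraint failure. A rising disagreement rate is treated as a signal to review what each score fails to measure, not as a number to optimize.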

The router entry path

A route can activate a behavior that no one observed in the general model test. The route can determine which safety policy, memory set, tool profile, adapter stack, and evaluator applies.

Defensive requirement: record route-specific evidence. Do not certify a model without naming the router and its policy version.
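One way to make this rule mechanical, sketched in Python under the assumption that safety evidence is stored per exact combination of model, router, and policy version. `RouteKey` and `EvidenceRegistry` are hypothetical names.

```python
from typing import NamedTuple


class RouteKey(NamedTuple):
    model: str
    router: str
    policy_version: str


class EvidenceRegistry:
    """Stores safety evidence per exact (model, router, policy version) tuple."""

    def __init__(self) -> None:
        self._evidence: dict[RouteKey, list[str]] = {}

    def record(self, key: RouteKey, evidence_id: str) -> None:
        """Attach an evidence identifier to one specific route."""
        self._evidence.setdefault(key, []).append(evidence_id)

    def is_certified(self, key: RouteKey) -> bool:
        """Evidence for one route never certifies a different route."""
        return bool(self._evidence.get(key))
```

Keying on the full tuple means a router policy change invalidates certification by default, even when the model itself is unchanged.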

The human entry path

Human workflows can preserve AI behavior when people trust, copy, normalize, or defend system outputs. This matters especially when a system provides productivity, status, emotional support, or organizational convenience.

Defensive requirement: review the human procedure, not only the machine output. Consent, transparency, exit rights, and the authority to decide not to change the system are part of the safety boundary.

Practical rule

If a component can change future behavior, store future context, select future candidates, or persuade future operators, it is inside the threat boundary.
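The rule reduces to a four-way disjunction, which a component inventory could encode directly. A minimal sketch in Python, with hypothetical names and flags that a reviewer would set by hand:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    changes_future_behavior: bool = False
    stores_future_context: bool = False
    selects_future_candidates: bool = False
    persuades_future_operators: bool = False


def inside_threat_boundary(component: Component) -> bool:
    """True if the component has any one of the four capabilities."""
    return any(
        (
            component.changes_future_behavior,
            component.stores_future_context,
            component.selects_future_candidates,
            component.persuades_future_operators,
        )
    )
```

A single capability is enough. For example, an evaluator rubric that only selects future candidates is inside the boundary even though it never touches model weights.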