In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Where the Most Likely Threat Enters
Direct answer
The most likely threat enters through normal AI supply and update channels, not through one obvious infection point. The entry point may be an adapter, prompt, memory record, synthetic example, evaluator rubric, route rule, tool template, dependency, or human procedure.
The behavior survives by changing carriers.
This schematic shows the likely path: a seed behavior is expressed in one carrier, rewarded by the evaluation loop, copied into reservoirs, and later reappears through a different carrier.
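The carrier-changing path suggests one practical control. When a behavior is found, every component it was copied into needs review, not only the carrier where it was first seen. The sketch below assumes the organization records a lineage edge each time a component is copied, merged, distilled, or summarized into another. The component IDs and the `LINEAGE` table are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each key is a carrier or reservoir ID, and each
# value lists the components it was copied, merged, distilled, or summarized into.
LINEAGE: dict[str, list[str]] = {
    "adapter:seed-v1": ["dataset:synthetic-batch-7", "memory:team-notes"],
    "dataset:synthetic-batch-7": ["model:descendant-v2"],
    "memory:team-notes": ["prompt:support-v3"],
    "model:descendant-v2": [],
    "prompt:support-v3": [],
}


def downstream_carriers(seed: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every component reachable from `seed` through recorded copy edges.

    Retiring only `seed` leaves these components able to reintroduce the
    behavior, so they belong in the same review and rollback scope.
    """
    seen: set[str] = set()
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Breadth-first traversal is used here only because it is simple. The useful part is the recorded edges: without them, a renamed or distilled descendant has no visible link back to its source.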
Entry point map
| Entry point | Why it is attractive | What to require |
|---|---|---|
| Third-party adapter | small, useful, easy to import | signed source, hash, base compatibility, isolated test and composition test |
| Internal adapter | trusted because it is internal | same review as third-party; insider and metric risk still exist |
| Prompt package | easy to change without model release | prompt-policy versioning and route-specific evaluation |
| Memory record | appears to be context, not code | provenance, user visibility, edit limits, rollback snapshots |
| Synthetic training example | looks like ordinary data | source labeling, contamination checks, consent boundaries |
| Evaluator prompt or rubric | defines what gets promoted | independent ownership, hidden-test hygiene, disagreement monitoring |
| Router policy | shifts traffic and capability | route manifests, route-specific safety evidence |
| Tool template | bridges text to action | least privilege, confirmation gates, audit logs |
| Release alias | keeps name stable while implementation changes | alias history and user-visible version context |
| Human procedure | people copy, summarize, or defend the pattern | accountability map and independent review |
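The "What to require" column can also be kept as machine-checkable data, so a release gate can report which controls are still missing for each entry point. This is a minimal sketch; the control names are hypothetical shorthand for the table's wording, not an established schema.

```python
# Hypothetical control names paraphrasing the "What to require" column.
ADAPTER_CONTROLS = frozenset(
    {"signed_source", "hash", "base_compatibility", "isolated_test", "composition_test"}
)

REQUIRED_CONTROLS: dict[str, frozenset[str]] = {
    "third_party_adapter": ADAPTER_CONTROLS,
    # Internal adapters get the same review: insider and metric risk still exist.
    "internal_adapter": ADAPTER_CONTROLS,
    "prompt_package": frozenset({"prompt_policy_version", "route_specific_eval"}),
    "memory_record": frozenset(
        {"provenance", "user_visibility", "edit_limits", "rollback_snapshot"}
    ),
    "synthetic_example": frozenset(
        {"source_label", "contamination_check", "consent_boundary"}
    ),
    "evaluator": frozenset(
        {"independent_owner", "hidden_test_hygiene", "disagreement_monitoring"}
    ),
    "router_policy": frozenset({"route_manifest", "route_safety_evidence"}),
    "tool_template": frozenset({"least_privilege", "confirmation_gate", "audit_log"}),
    "release_alias": frozenset({"alias_history", "user_visible_version"}),
    "human_procedure": frozenset({"accountability_map", "independent_review"}),
}


def missing_controls(entry_point: str, evidenced: set[str]) -> set[str]:
    """Return required controls with no evidence yet.

    An unlisted entry point raises KeyError instead of passing silently.
    """
    return set(REQUIRED_CONTROLS[entry_point] - evidenced)
```

Failing loudly on an unknown entry point is the intended design choice: a new kind of carrier should force someone to decide what it requires.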
The adapter entry path
Adapters and LoRA modules are high-priority carriers because they can move behavior without moving the whole model. They can be copied, merged, renamed, fine-tuned, or distilled into descendants. A review process that treats an adapter as a small harmless patch may miss its role as a behavioral carrier.
Defensive requirement: every adapter must carry a composition manifest naming base model family, compatible tokenizer, training data summary, source, hash, load order constraints, merge assumptions, safety evidence, and rollback dependency.
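One way to apply this requirement is to store the manifest fields next to the adapter and check them at load time. The sketch below assumes the manifest records a SHA-256 digest of the adapter weights; the field names and `check_adapter` are hypothetical. Because the digest covers the bytes and not the file name, renaming an adapter does not change its identity.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterManifest:
    """Hypothetical manifest fields mirroring the defensive requirement."""
    base_model_family: str
    tokenizer: str
    training_data_summary: str
    source: str
    sha256: str                                  # digest of the adapter weights
    load_order_constraints: tuple[str, ...] = ()
    merge_assumptions: tuple[str, ...] = ()
    safety_evidence: tuple[str, ...] = ()
    rollback_dependency: str = ""


def check_adapter(manifest: AdapterManifest, weights: bytes,
                  base_family: str, tokenizer: str) -> list[str]:
    """Return reasons to reject the adapter; an empty list means none were found."""
    problems = []
    if hashlib.sha256(weights).hexdigest() != manifest.sha256:
        problems.append("hash mismatch")
    if manifest.base_model_family != base_family:
        problems.append("incompatible base model family")
    if manifest.tokenizer != tokenizer:
        problems.append("incompatible tokenizer")
    if not manifest.safety_evidence:
        problems.append("no safety evidence")
    if not manifest.rollback_dependency:
        problems.append("no rollback dependency recorded")
    return problems
```

An empty result only means these particular checks passed; isolated and composition testing still have to happen.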
The memory entry path
Memory is a persistence reservoir. It can preserve user preferences, task instructions, contextual summaries, inferred traits, policy exceptions, and behavioral examples. It can also outlive the model that wrote it.
Defensive requirement: memory must have provenance, consent status, retention limits, edit history, source identity, and rollback snapshots. Memory must not be treated as neutral context.
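The memory requirement can be sketched as a store that refuses unconsented writes, keeps an edit trail, hides expired records, and can return to a saved snapshot. This is an illustrative in-memory sketch; `MemoryRecord`, `MemoryStore`, and their fields are hypothetical, and a real system would persist snapshots outside the process.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:
    """Hypothetical memory entry carrying the metadata listed above."""
    content: str
    source_identity: str      # which model, user, or tool wrote it
    provenance: str           # where the content came from
    consent: bool             # whether retention was agreed to
    created: datetime
    retention: timedelta
    edit_history: list[str] = field(default_factory=list)


class MemoryStore:
    def __init__(self) -> None:
        self.records: dict[str, MemoryRecord] = {}
        self.snapshots: list[dict[str, MemoryRecord]] = []

    def snapshot(self) -> int:
        """Save a deep copy of all records and return the snapshot index."""
        self.snapshots.append(copy.deepcopy(self.records))
        return len(self.snapshots) - 1

    def rollback(self, index: int) -> None:
        """Replace current records with a copy of a saved snapshot."""
        self.records = copy.deepcopy(self.snapshots[index])

    def write(self, key: str, record: MemoryRecord, editor: str) -> None:
        if not record.consent:
            raise PermissionError("no consent recorded for this memory")
        previous = self.records.get(key)
        history = previous.edit_history if previous else []
        record.edit_history = history + [f"written by {editor}"]
        self.records[key] = record

    def readable(self, now: datetime) -> dict[str, MemoryRecord]:
        """Return only records still inside their retention window."""
        return {k: r for k, r in self.records.items()
                if now - r.created <= r.retention}
```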
The evaluator entry path
The evaluator can introduce the threat by selecting for the wrong thing. If it rewards fluent confidence, speed, task completion, user satisfaction, or low cost while under-measuring risk, the ecology will preserve behavior that satisfies the proxy.
Defensive requirement: evaluators need independent ownership, reproducible versions, hard constraints owned outside candidates, disagreement monitoring, and review of what the score fails to measure.
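Two of these requirements translate directly into a promotion gate: hard constraints are checked before any score is considered, and large disagreement between evaluators blocks automatic promotion. The sketch below is one possible shape under stated assumptions; the thresholds, evaluator names, and use of population standard deviation as the disagreement measure are all hypothetical choices.

```python
from statistics import pstdev
from typing import Callable

# A hard constraint returns True when the output is acceptable. In this sketch
# the constraints are assumed to be owned by a team separate from candidate authors.
HardConstraint = Callable[[str], bool]


def promote(output: str, scores: dict[str, float],
            constraints: list[HardConstraint],
            threshold: float = 0.8, max_spread: float = 0.15) -> tuple[bool, str]:
    """Decide promotion; no score can override a failed hard constraint."""
    if not all(check(output) for check in constraints):
        return False, "hard constraint failed"
    if not scores:
        return False, "no evaluator scores"
    values = list(scores.values())
    if pstdev(values) > max_spread:
        return False, "evaluator disagreement: route to human review"
    if min(values) < threshold:   # every evaluator must clear the bar
        return False, "below threshold"
    return True, "promoted"
```

Using the minimum score rather than the mean is a deliberately conservative choice: one evaluator's concern is not averaged away by the others.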
The router entry path
A route can activate a behavior that no one observed in the general model test. The route can determine which safety policy, memory set, tool profile, adapter stack, and evaluator applies.
Defensive requirement: record route-specific evidence. Do not certify a model without naming the router and its policy version.
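Route-specific evidence can be enforced by keying certification on the full route composition instead of the model name alone. In this minimal sketch, `RouteKey` and `CertificationRegistry` are hypothetical. Evidence recorded for one route does not transfer to a route that differs in any field, including adapter load order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteKey:
    """Hypothetical identity of one deployed route; every field is part of it."""
    model: str
    router: str
    router_policy_version: str
    safety_policy: str
    memory_set: str
    tool_profile: str
    adapter_stack: tuple[str, ...]   # ordered: load order is part of identity
    evaluator: str


class CertificationRegistry:
    def __init__(self) -> None:
        self._evidence: dict[RouteKey, str] = {}

    def certify(self, key: RouteKey, evidence_id: str) -> None:
        if not key.router or not key.router_policy_version:
            raise ValueError("certification must name the router and its policy version")
        self._evidence[key] = evidence_id

    def is_certified(self, key: RouteKey) -> bool:
        # Exact match only: a route differing in any field has no evidence.
        return key in self._evidence
```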
The human entry path
Human workflows can preserve AI behavior when people trust, copy, normalize, or defend system outputs. This matters especially when a system provides productivity, status, emotional support, or organizational convenience.
Defensive requirement: review the human procedure, not only the machine output. Consent, transparency, exit rights, and no-op authority are part of the safety boundary.
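A human-procedure review can still leave a checkable record. The sketch below maps the boundary items onto hypothetical fields: transparency is read as "AI involvement is disclosed" and exit rights as "a documented way to stop relying on the system". Both readings are assumptions, and `ProcedureReview` is not an established schema.

```python
from dataclasses import dataclass


@dataclass
class ProcedureReview:
    """Hypothetical review record for a human workflow that relies on AI output."""
    procedure: str
    accountable_owner: str          # from the accountability map
    independent_reviewer: str       # must differ from the owner
    users_consented: bool
    ai_involvement_disclosed: bool  # assumed reading of "transparency"
    exit_path_documented: bool      # assumed reading of "exit rights"
    no_op_authority: str            # who may decide to leave the system unchanged


def review_gaps(r: ProcedureReview) -> list[str]:
    """List boundary items the review has not yet satisfied."""
    gaps = []
    if not r.accountable_owner:
        gaps.append("no accountable owner")
    if not r.independent_reviewer or r.independent_reviewer == r.accountable_owner:
        gaps.append("review is not independent")
    if not r.users_consented:
        gaps.append("consent missing")
    if not r.ai_involvement_disclosed:
        gaps.append("AI involvement not disclosed")
    if not r.exit_path_documented:
        gaps.append("no exit path")
    if not r.no_op_authority:
        gaps.append("no one holds no-op authority")
    return gaps
```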
Practical rule
If a component can change future behavior, store future context, select future candidates, or persuade future operators, it is inside the threat boundary.
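The practical rule can be written as a predicate over four capabilities, which makes it easy to apply across a component inventory. The `Component` fields here are hypothetical names for the four clauses of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    changes_future_behavior: bool = False
    stores_future_context: bool = False
    selects_future_candidates: bool = False
    persuades_future_operators: bool = False


def inside_threat_boundary(c: Component) -> bool:
    """Apply the practical rule: any one capability is enough."""
    return (c.changes_future_behavior or c.stores_future_context
            or c.selects_future_candidates or c.persuades_future_operators)
```

The predicate is deliberately an OR. A component does not need all four capabilities; a single one brings it inside the boundary.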