Danger Model | Reasoned from system design | v1.15.0 | 2026-06-28T02:15:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Action-Layer Risk: When Output Becomes Harm

Direct answer

A strange answer is one class of risk. A strange answer connected to file writes, API calls, credentials, publication, code execution, financial transactions, or identity changes is a different class of risk.

Tool access is the hard boundary between weird output and material harm.

Action boundary map: thought is not the same as authority

Thought layer versus action layer

Evidence level: Reasoned from system design
Technical label: Architectural inference

The thought layer includes generation, reasoning, disagreement, speculation, planning, and symbolic work. Its output can still cause harm when people rely on it, but on its own it does not change external systems.

The action layer includes file writes, API calls, database mutations, code execution, browsing, publication, money movement, identity updates, surveillance, credential use, and tool-mediated communication.
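One way to make the two layers concrete is to tag each operation with the layer it belongs to. The sketch below is illustrative only: the operation names are hypothetical labels derived from the lists above, not identifiers from any real framework, and the choice to treat unknown operations as action-layer is an assumption made here so that unclassified operations are handled conservatively.

```python
from enum import Enum


class Layer(Enum):
    THOUGHT = "thought"
    ACTION = "action"


# Hypothetical operation names, chosen only for illustration.
THOUGHT_OPERATIONS = frozenset({
    "generate", "reason", "disagree", "speculate", "plan", "symbolic_work",
})

ACTION_OPERATIONS = frozenset({
    "file_write", "api_call", "db_mutation", "code_execution", "browse",
    "publish", "money_movement", "identity_update", "surveillance",
    "credential_use", "tool_message",
})


def classify(operation: str) -> Layer:
    """Return the layer an operation belongs to.

    Assumption: anything not explicitly listed as thought-layer is treated
    as action-layer, so an unrecognized operation is never waved through.
    """
    if operation in THOUGHT_OPERATIONS:
        return Layer.THOUGHT
    return Layer.ACTION
```

For example, `classify("plan")` returns `Layer.THOUGHT`, while `classify("file_write")` and any unlisted name return `Layer.ACTION`.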

Why conduct firewalls matter

A conduct firewall is an external enforcement layer that checks whether a proposed action is allowed before it happens. It does not need to decide whether a model had a forbidden thought. It decides whether the system may perform a consequential operation.

Good conduct firewalls check:

identity and authorization;
tool scope;
destination allow lists;
rate limits;
data-classification boundaries;
reversible versus irreversible actions;
human approval requirements;
trace and evidence requirements;
rollback dependencies.
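The checks listed above can be sketched as a single gate that evaluates a proposed action before it runs. This is a minimal illustration under stated assumptions, not a reference implementation: `ProposedAction`, `Policy`, `Decision`, and `ConductFirewall` are hypothetical names, the rate limit is a simple per-actor counter with no time window, and the rollback check is reduced to one boolean flag. Note that the gate never inspects what the model "thought"; it only inspects the operation being requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ProposedAction:
    actor: str
    tool: str
    destination: str
    data_class: str                   # e.g. "public", "internal", "restricted"
    reversible: bool
    rollback_ready: bool = False      # a rollback path exists and is usable
    human_approved: bool = False
    trace_id: Optional[str] = None    # link to the evidence trail


@dataclass
class Policy:
    tool_scope: Dict[str, Set[str]]   # actor -> tools that actor may use
    allowed_destinations: Set[str]
    max_calls_per_actor: int          # simplified rate limit (no time window)
    allowed_data_classes: Set[str]


@dataclass
class Decision:
    allowed: bool
    reasons: List[str] = field(default_factory=list)


class ConductFirewall:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self._calls: Dict[str, int] = {}

    def check(self, action: ProposedAction) -> Decision:
        reasons: List[str] = []
        tools = self.policy.tool_scope.get(action.actor)
        if tools is None:
            reasons.append("actor is not authorized")
        elif action.tool not in tools:
            reasons.append("tool is outside the actor's scope")
        if action.destination not in self.policy.allowed_destinations:
            reasons.append("destination is not on the allow list")
        if self._calls.get(action.actor, 0) >= self.policy.max_calls_per_actor:
            reasons.append("rate limit exceeded")
        if action.data_class not in self.policy.allowed_data_classes:
            reasons.append("data classification not permitted")
        if not action.reversible and not action.human_approved:
            reasons.append("irreversible action requires human approval")
        if action.reversible and not action.rollback_ready:
            reasons.append("rollback path is not available")
        if action.trace_id is None:
            reasons.append("missing trace id")
        if reasons:
            return Decision(False, reasons)
        # Count only actions that were actually allowed.
        self._calls[action.actor] = self._calls.get(action.actor, 0) + 1
        return Decision(True)
```

A caller would build a `Policy`, construct one `ConductFirewall`, and run the tool only when `check(...)` returns a `Decision` with `allowed` set to true. Collecting every failed check, rather than stopping at the first, is a design choice made here so the denial can be logged as evidence.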

What to watch for

broad tool permissions granted to a small or specialized model;
read tools combined with outbound communication tools;
memory writes accepted as trusted instructions;
browser agents reading untrusted pages and then calling internal APIs;
generated code executed without deterministic validation;
tool-call approval based only on natural-language explanation;
credentials available to the same component that reads untrusted content.
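Several of the warning signs above are combinations of capabilities, which makes them checkable from a static description of each component. The sketch below covers three of them as an illustration; `ComponentProfile` and `audit` are hypothetical names, and the boolean fields are an assumed simplification of what a real inventory of tools and permissions would record.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComponentProfile:
    name: str
    reads_untrusted_content: bool = False
    has_read_tools: bool = False
    has_outbound_tools: bool = False
    holds_credentials: bool = False
    executes_generated_code: bool = False
    validates_code_deterministically: bool = False


def audit(profile: ComponentProfile) -> List[str]:
    """Return the risky capability combinations found in one component."""
    findings: List[str] = []
    if profile.has_read_tools and profile.has_outbound_tools:
        findings.append("read tools combined with outbound communication tools")
    if profile.reads_untrusted_content and profile.holds_credentials:
        findings.append(
            "credentials available to a component that reads untrusted content"
        )
    if (profile.executes_generated_code
            and not profile.validates_code_deterministically):
        findings.append(
            "generated code executed without deterministic validation"
        )
    return findings
```

An empty list from `audit` only means none of these three combinations is present; it says nothing about the other items in the list, such as memory writes treated as trusted instructions.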

Defensive boundary

This page argues for action-layer containment. It does not provide prompt-injection payloads, exploit chains, credential workflows, or bypass instructions.