In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement of components, not only from one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Action-Layer Risk: When Output Becomes Harm
Direct answer
A strange answer is one class of risk. A strange answer connected to file writes, API calls, credentials, publication, code execution, financial transactions, or identity changes is a different class of risk.
Tool access is the hard boundary between weird output and material harm.
Thought layer versus action layer
The thought layer includes generation, reasoning, disagreement, speculation, planning, and symbolic work. It can still be harmful when people rely on it, but it does not directly change external systems by itself.
The action layer includes file writes, API calls, database mutations, code execution, browsing, publication, money movement, identity updates, surveillance, credential use, and tool-mediated communication.
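One way to make this split operational is to tag every operation a system exposes with its layer, and to treat anything unrecognized as action-layer so that a new or misnamed tool fails closed. The sketch below is a minimal illustration only. The `Layer` enum, the operation names, and the `THOUGHT_LAYER` set are hypothetical labels, not a standard taxonomy.

```python
from enum import Enum


class Layer(Enum):
    THOUGHT = "thought"
    ACTION = "action"


# Hypothetical names for operations that only produce text for a reader
# and do not change any external system on their own.
THOUGHT_LAYER = frozenset({"generate", "reason", "plan", "summarize", "critique"})


def classify(operation: str) -> Layer:
    """Return the layer for an operation name.

    Unknown operations default to ACTION, so a new or misnamed tool is
    routed through conduct checks instead of silently bypassing them.
    """
    if operation in THOUGHT_LAYER:
        return Layer.THOUGHT
    return Layer.ACTION


def needs_firewall(operation: str) -> bool:
    """Action-layer operations must pass an external check before running."""
    return classify(operation) is Layer.ACTION
```

For example, `classify("plan")` returns `Layer.THOUGHT`, while both `classify("file_write")` and `classify("brand_new_tool")` return `Layer.ACTION`. The default matters more than the list: an allow list of harmless operations is safer than a block list of dangerous ones.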
Why conduct firewalls matter
A conduct firewall is an external enforcement layer that checks whether a proposed action is allowed before it happens. It does not need to decide whether a model had a forbidden thought. It decides whether the system may perform a consequential operation.
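The important property is where the check sits: between the proposed action and its execution, with no access to the model's reasoning. The sketch below assumes a hypothetical `ProposedAction` record and a policy expressed as a plain function; `guarded_execute` and `example_policy` are illustrative names. In a real deployment this gate would run outside the model's process, for example in a separate service that holds the credentials.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical shape of a tool call the model has proposed.
    actor: str    # authenticated identity of the calling component
    tool: str     # e.g. "http_post", "file_write"
    target: str   # destination: URL host, file path, account, ...
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


Policy = Callable[[ProposedAction], Decision]


def guarded_execute(action: ProposedAction, policy: Policy,
                    execute: Callable[[ProposedAction], Any]) -> Any:
    """Call `execute` only if `policy` allows the action.

    The policy sees the structured action only, never the model's
    explanation or reasoning text, so persuasive wording cannot change
    the outcome.
    """
    decision = policy(action)
    if not decision.allowed:
        raise PermissionError(
            f"blocked {action.tool} -> {action.target}: {decision.reason}")
    return execute(action)


def example_policy(action: ProposedAction) -> Decision:
    # Deny by default; allow one read-only tool for one actor.
    if action.actor == "report-bot" and action.tool == "file_read":
        return Decision(True, "read-only tool in scope")
    return Decision(False, "not explicitly allowed")
```

A blocked action raises before `execute` is ever called, so the consequential operation simply does not happen.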
Good conduct firewalls check:
- identity and authorization;
- tool scope;
- destination allow lists;
- rate limits;
- data-classification boundaries;
- reversible versus irreversible actions;
- human approval requirements;
- trace and evidence requirements;
- rollback dependencies.
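The checks above can be combined into one deny-by-default function. The sketch below is a minimal illustration under several assumptions: the caller's identity is already authenticated upstream, the three data-classification levels in `LEVELS` are hypothetical, and `ActionRequest`, `FirewallPolicy`, and `ConductFirewall` are illustrative names rather than an existing library. It also assumes a simple rule for rollback dependencies: reversible actions must name how they would be undone, and irreversible ones need human approval instead.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical data-classification levels, lowest to highest sensitivity.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


@dataclass
class ActionRequest:
    actor: str
    tool: str
    destination: str
    data_class: str = "public"
    irreversible: bool = False
    human_approved: bool = False
    trace_id: Optional[str] = None
    rollback_plan: Optional[str] = None


@dataclass
class FirewallPolicy:
    tool_scopes: dict       # actor -> set of allowed tools
    destinations: set       # allow-listed destinations
    max_data_class: dict    # destination -> highest level it may receive
    max_calls: int = 10     # allowed calls per actor per window
    window_s: float = 60.0


class ConductFirewall:
    def __init__(self, policy: FirewallPolicy, clock=time.monotonic):
        self.policy = policy
        self.clock = clock
        self.calls = defaultdict(deque)  # actor -> timestamps of allowed calls

    def check(self, req: ActionRequest) -> tuple[bool, str]:
        p = self.policy
        # Authorization and tool scope: unknown actors have no tools.
        if req.tool not in p.tool_scopes.get(req.actor, set()):
            return False, "tool not in actor's scope"
        # Destination allow list.
        if req.destination not in p.destinations:
            return False, "destination not allow-listed"
        # Data-classification boundary; unknown labels count as most sensitive.
        level = LEVELS.get(req.data_class, max(LEVELS.values()))
        limit = LEVELS.get(p.max_data_class.get(req.destination, "public"), 0)
        if level > limit:
            return False, "data too sensitive for destination"
        # Irreversible actions require explicit human approval.
        if req.irreversible and not req.human_approved:
            return False, "irreversible action needs human approval"
        # Evidence: every allowed action must carry a trace id.
        if not req.trace_id:
            return False, "missing trace id"
        # Rollback dependency: reversible actions must say how to undo them.
        if not req.irreversible and not req.rollback_plan:
            return False, "missing rollback plan"
        # Sliding-window rate limit, counted only for calls that pass.
        now = self.clock()
        recent = self.calls[req.actor]
        while recent and now - recent[0] >= p.window_s:
            recent.popleft()
        if len(recent) >= p.max_calls:
            return False, "rate limit exceeded"
        recent.append(now)
        return True, "allowed"
```

The clock is injectable so the rate limit can be tested deterministically. Every branch returns a reason string, which can be written to the trace alongside the decision.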
What to watch for
- broad tool permissions granted to a small or specialized model;
- read tools combined with outbound communication tools;
- memory writes accepted as trusted instructions;
- browser agents reading untrusted pages and then calling internal APIs;
- generated code executed without deterministic validation;
- tool-call approval based only on natural-language explanation;
- credentials available to the same component that reads untrusted content.
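Several items on this watch list are properties of configuration, so they can be flagged before anything runs. The sketch below is a minimal static audit, assuming a hypothetical `Component` record and hypothetical capability labels (`READS`, `UNTRUSTED_READS`, `OUTBOUND`, and so on); a real inventory would come from the deployment's own tool registry, and the threshold of three tools for a specialized model is an arbitrary placeholder.

```python
from dataclasses import dataclass

# Hypothetical capability labels for tools in a registry.
READS = {"read_file", "query_db", "read_email", "browse"}
UNTRUSTED_READS = {"read_email", "browse"}   # content the operator does not control
OUTBOUND = {"send_email", "http_post", "publish"}
INTERNAL_API = {"internal_api"}
EXECUTES_CODE = {"run_code"}


@dataclass
class Component:
    name: str
    tools: set
    holds_credentials: bool = False
    memory_writes_trusted: bool = False
    code_validation: bool = False       # deterministic checks before execution
    approval_mode: str = "structured"   # or "explanation_only"
    specialized_model: bool = False


def audit(c: Component, max_tools_for_specialized: int = 3) -> list[str]:
    """Return one warning per watch-list item the component matches."""
    warnings = []
    if c.specialized_model and len(c.tools) > max_tools_for_specialized:
        warnings.append("broad tool permissions on a small or specialized model")
    if c.tools & READS and c.tools & OUTBOUND:
        warnings.append("read tools combined with outbound communication")
    if c.memory_writes_trusted:
        warnings.append("memory writes treated as trusted instructions")
    if "browse" in c.tools and c.tools & INTERNAL_API:
        warnings.append("browser agent can call internal APIs")
    if c.tools & EXECUTES_CODE and not c.code_validation:
        warnings.append("generated code runs without deterministic validation")
    if c.approval_mode == "explanation_only":
        warnings.append("tool calls approved from natural-language explanation only")
    if c.tools & UNTRUSTED_READS and c.holds_credentials:
        warnings.append("credentials held by a component that reads untrusted content")
    return warnings
```

An empty list means no watch-list pattern matched, not that the component is safe; the runtime conduct checks still apply.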
Defensive boundary
This page argues for action-layer containment. It does not provide prompt-injection payloads, exploit chains, credential workflows, or bypass instructions.