Danger Model · Reasoned from system design · v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole system arrangement, not just from one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Action-Layer Risk: When Output Becomes Harm

Direct answer

A strange answer is one class of risk. A strange answer connected to file writes, API calls, credentials, publication, code execution, financial transactions, or identity changes is a different class of risk.

Tool access is the hard boundary between weird output and material harm.

Action boundary map: thought is not the same as authority

Thought layer versus action layer

Evidence level: Reasoned from system design. Technical label: Architectural inference.

The thought layer includes generation, reasoning, disagreement, speculation, planning, and symbolic work. It can still be harmful when people rely on it, but it does not directly change external systems by itself.

The action layer includes file writes, API calls, database mutations, code execution, browsing, publication, money movement, identity updates, surveillance, credential use, and tool-mediated communication.
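
As a rough illustration of the distinction above, the two layers can be written down as a simple classification. This is only a sketch: the identifiers (`Layer`, `THOUGHT_KINDS`, `ACTION_KINDS`, `classify`) are hypothetical names chosen for this example, the operation kinds are copied from the two lists above, and the choice to treat unknown kinds as action-layer is a conservative assumption rather than something this page specifies.

```python
from enum import Enum


class Layer(Enum):
    THOUGHT = "thought"
    ACTION = "action"


# Operation kinds taken from the two paragraphs above.
# The snake_case identifiers are illustrative, not a fixed vocabulary.
THOUGHT_KINDS = frozenset({
    "generation", "reasoning", "disagreement",
    "speculation", "planning", "symbolic_work",
})

ACTION_KINDS = frozenset({
    "file_write", "api_call", "database_mutation", "code_execution",
    "browsing", "publication", "money_movement", "identity_update",
    "surveillance", "credential_use", "tool_mediated_communication",
})


def classify(kind: str) -> Layer:
    """Return the layer an operation kind belongs to.

    Assumption: anything not explicitly listed as thought-layer is
    treated as action-layer, so unknown operations get the stricter
    handling by default.
    """
    if kind in THOUGHT_KINDS:
        return Layer.THOUGHT
    return Layer.ACTION
```

For example, `classify("planning")` returns `Layer.THOUGHT`, while `classify("money_movement")` and any unrecognised kind return `Layer.ACTION`.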

Why conduct firewalls matter

A conduct firewall (a gate around what the AI can do) is an external enforcement layer that checks whether a proposed action is allowed before it happens. It does not need to decide whether a model had a forbidden thought. It decides whether the system may perform a consequential operation.
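
The check-before-act pattern described above might look something like the following minimal sketch. Everything here is an assumption made for illustration: the names `ProposedAction`, `ConductFirewall`, `is_allowed`, and `execute` are hypothetical, and the policy is a plain allowlist of (kind, target) pairs with deny-by-default. A real enforcement layer would likely need richer policy, logging, and review paths.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    """An operation the system wants to perform (hypothetical shape)."""
    kind: str    # e.g. "file_write"
    target: str  # e.g. a path or an endpoint name


@dataclass
class ConductFirewall:
    """Deny-by-default gate that sits outside the model.

    It inspects the proposed action only; it never looks at how the
    model arrived at the proposal.
    """
    allowed: set[tuple[str, str]] = field(default_factory=set)

    def is_allowed(self, action: ProposedAction) -> bool:
        return (action.kind, action.target) in self.allowed

    def execute(
        self,
        action: ProposedAction,
        perform: Callable[[ProposedAction], str],
    ) -> str:
        # The check runs before the side effect, never after it.
        if not self.is_allowed(action):
            raise PermissionError(
                f"blocked: {action.kind} on {action.target}"
            )
        return perform(action)
```

In use, the caller hands the firewall both the proposed action and the function that would carry it out, so a blocked action raises `PermissionError` and the side-effecting function is never called. Keeping the gate outside the model is the design point: the decision depends on the operation requested, not on the text that led to it.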

Good conduct firewalls check:

What to watch for

Defensive boundary

This page argues for action-layer containment. It does not provide prompt-injection payloads, exploit chains, credential workflows, or bypass instructions.