ControlReasoned from system designv1.15.0

In plain English

This page explains the governance layer: rules, logs, approvals, signatures, audits, permissions, and rollback tools. These controls are necessary, but they also become important failure points.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Controls That Actually Change the Risk

Evidence levelReasoned from system designTechnical label: Architectural inference

Controls matter when they prevent, detect, or recover from concrete failure modes, not when they merely express preferences.

Control requirement

The control must live outside the candidate’s ordinary write boundary. It should be versioned, auditable, recoverable, and testable under failure. A policy expressed only as a prompt is not a hard control.

Failure mode

The governance layer becomes part of the attack surface when it controls identity, success definitions, release permissions, hidden evidence, memory retention, aliases, and Returning a system to an earlier known state. Open glossary definition.

Practical review

Ask who owns the control, who can change it, which evidence would reveal failure, how it is rolled back, and what organizational pressure could bypass it.

<!-- expanded-release-content -->

Architecture, not slogans

Evidence levelReasoned from system designTechnical label: Architectural inference

A control changes risk only when it changes what the system can do, what it can hide, what it can promote, or how quickly humans can recover. “Human in the loop” is not a control unless the human has understandable evidence, real authority, time to intervene, and organizational permission to stop the release.

Controls with material effect

Material controls include immutable artifacts, signed composition manifests, external evaluators, hard permission boundaries, protected hidden tests, append-only evidence, independent evaluator disagreement, Releasing a change gradually instead of all at once. Open glossary definition, canaries, no-op outcomes, memory snapshots, complete rollback packets, and release aliases that resolve to exact compositions.

Limits

Each control has a failure mode. Immutable artifacts do not prevent unsafe combinations. Lineage does not prove behavioral inheritance. Hidden tests can leak. Evaluators can share blind spots. Rollback cannot undo history. Memory snapshots can miss external side effects. The control catalog records these limits because overstated controls become another source of stale Confidence, backed by evidence, that a system meets safety or governance requirements. Open glossary definition.