In plain English
This page explains the governance layer: rules, logs, approvals, signatures, audits, permissions, and rollback tools. These controls are necessary, but they also become important failure points.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Controls That Actually Change the Risk
Controls matter when they prevent, detect, or recover from concrete failure modes, not when they merely express preferences.
Control requirement
The control must live outside the candidate’s ordinary write boundary. It should be versioned, auditable, recoverable, and testable under failure. A policy expressed only as a prompt is not a hard control.
Failure mode
The governance layer becomes part of the attack surface when it controls identity, success definitions, release permissions, hidden evidence, memory retention, aliases, and rollback (returning a system to an earlier known state).
Practical review
Ask who owns the control, who can change it, which evidence would reveal failure, how it is rolled back, and what organizational pressure could bypass it.
Architecture, not slogans
A control changes risk only when it changes what the system can do, what it can hide, what it can promote, or how quickly humans can recover. “Human in the loop” is not a control unless the human has understandable evidence, real authority, time to intervene, and organizational permission to stop the release.
Controls with material effect
Material controls include immutable artifacts, signed composition manifests, external evaluators, hard permission boundaries, protected hidden tests, append-only evidence, independent evaluator disagreement, staged rollout (releasing a change gradually instead of all at once), canaries, no-op outcomes, memory snapshots, complete rollback packets, and release aliases that resolve to exact compositions.
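As an added illustration, not code from the original page or from any project it describes, here is a minimal Python sketch of three of the controls named above: a signed composition manifest, a release alias that resolves only to a verified exact composition, and a hash-chained append-only evidence log. The component digests, event records and the HMAC key are constructed placeholders; a real deployment would use an asymmetric signing key held outside the candidate's write boundary.

```python
import hashlib
import hmac
import json

# Constructed demo key. HMAC stands in for a signing key held outside the candidate's write boundary.
KEY = b"release-signing-key-demo"

def canonical(obj) -> bytes:
    """Deterministic JSON so the same composition always hashes the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sign_manifest(manifest: dict, key: bytes) -> dict:
    """Bind a composition manifest (component -> pinned artifact digest) to a signature."""
    body = canonical(manifest)
    return {"manifest": manifest, "digest": hashlib.sha256(body).hexdigest(),
            "sig": hmac.new(key, body, hashlib.sha256).hexdigest()}

def verify_manifest(signed: dict, key: bytes) -> bool:
    """Recompute digest and signature; changing any pinned component breaks both."""
    body = canonical(signed["manifest"])
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return (hashlib.sha256(body).hexdigest() == signed["digest"]
            and hmac.compare_digest(expected, signed["sig"]))

def resolve_alias(aliases: dict, alias: str, store: dict, key: bytes) -> dict:
    """A release alias resolves to one verified exact composition, never a moving target."""
    digest = aliases[alias]
    signed = store[digest]
    if signed["digest"] != digest or not verify_manifest(signed, key):
        raise ValueError(f"alias {alias!r} points at an unverifiable composition")
    return signed["manifest"]

def log_append(entries: list, event: dict) -> dict:
    """Append to a hash-chained evidence log; each entry commits to its predecessor."""
    prev = entries[-1]["hash"] if entries else "0" * 64
    entry = {"prev": prev, "event": event}
    entry["hash"] = hashlib.sha256(canonical(entry)).hexdigest()  # covers prev + event
    entries.append(entry)
    return entry

def log_verify(entries: list) -> bool:
    """Recompute the chain; an edited or reordered entry breaks every later link."""
    prev = "0" * 64
    for e in entries:
        body = canonical({"prev": prev, "event": e["event"]})
        if e["prev"] != prev or hashlib.sha256(body).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

The chain detects edits and reordering but not truncation of the newest entries, which is only visible if the latest hash is anchored somewhere the candidate cannot write.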
Limits
Each control has a failure mode. Immutable artifacts do not prevent unsafe combinations. Lineage does not prove behavioral inheritance. Hidden tests can leak. Evaluators can share blind spots. Rollback cannot undo history. Memory snapshots can miss external side effects. The control catalog records these limits because overstated controls become another source of stale assurance (confidence, backed by evidence, that a system meets safety or governance requirements).