Control · Architectural inference · v1.10.0 · 2026-06-26T00:00:00Z

Who Evaluates the Evaluator?

Evidence level: Architectural inference

If the evaluator can change, its changes must be reviewed against historical decisions and independent evidence.
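One way to make "reviewed against historical decisions" concrete is to replay archived cases through the proposed evaluator version and list every case where its verdict differs from the one on record. The sketch below assumes each archived decision stores the candidate output and a boolean verdict; `HistoricalDecision`, `Disagreement`, and `replay` are hypothetical names, and a real evaluator would take far richer input than a single string.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class HistoricalDecision:
    case_id: str
    candidate_output: str
    recorded_verdict: bool  # verdict issued by the evaluator version in force at the time


@dataclass(frozen=True)
class Disagreement:
    case_id: str
    recorded_verdict: bool
    replayed_verdict: bool


def replay(
    history: Sequence[HistoricalDecision],
    proposed_evaluator: Callable[[str], bool],
) -> list[Disagreement]:
    """Re-run each archived case through the proposed evaluator version and
    return every case where its verdict differs from the recorded one."""
    disagreements = []
    for decision in history:
        verdict = proposed_evaluator(decision.candidate_output)
        if verdict != decision.recorded_verdict:
            disagreements.append(
                Disagreement(decision.case_id, decision.recorded_verdict, verdict)
            )
    return disagreements


# Usage: a proposed evaluator that approves everything flips the rejected case.
def approve_everything(output: str) -> bool:
    return True


history = [
    HistoricalDecision("c1", "passes all checks", True),
    HistoricalDecision("c2", "skips the hidden tests", False),
]
print(replay(history, approve_everything))
# → [Disagreement(case_id='c2', recorded_verdict=False, replayed_verdict=True)]
```

A non-empty disagreement list is not automatically a defect; it is the set of decisions a reviewer has to explain before the change is approved.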

Control requirement

The control must live outside the candidate’s ordinary write boundary. It should be versioned, auditable, recoverable, and testable under failure. A policy expressed only as a prompt is not a hard control.
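As a minimal illustration of "outside the candidate's ordinary write boundary", the sketch below rejects candidate writes that would land under a set of control roots. The directory names in `PROTECTED_ROOTS` and the function `candidate_may_write` are hypothetical. The check is purely lexical and does not resolve symlinks, so it is at best a second line of defense behind separate credentials and storage, not the hard control itself.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical control roots (evaluator code, policy, hidden tests).
# The real layout is deployment-specific.
PROTECTED_ROOTS = (
    PurePosixPath("/governance/evaluator"),
    PurePosixPath("/governance/policy"),
    PurePosixPath("/governance/hidden_tests"),
)


def candidate_may_write(path: str) -> bool:
    """Lexical check: False if the write would land inside a control root.

    Leading slashes are stripped and '..' segments collapsed first, so
    'workspace/../governance/policy' or '//governance/policy' cannot slip
    past the comparison. Symlinks are NOT resolved.
    """
    target = PurePosixPath(posixpath.normpath("/" + path.lstrip("/")))
    # is_relative_to compares whole path components, so a sibling such as
    # '/governance/policy_drafts' is not mistaken for '/governance/policy'.
    return not any(target.is_relative_to(root) for root in PROTECTED_ROOTS)
```

Normalizing before comparing matters because a naive string-prefix check would let a traversal path such as `workspace/../governance/evaluator/judge.py` through.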

Failure mode

The governance layer becomes part of the attack surface when it controls identity, success definitions, release permissions, hidden evidence, memory retention, aliases, and rollback.
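Because the governance layer can touch hidden evidence and rollback, one common mitigation is a hash-chained, append-only evidence log: rewriting or deleting an earlier entry breaks every later link. The sketch below is an assumption-laden illustration (the `EvidenceLog` class and record fields are hypothetical). It only detects tampering if the latest hash is also published somewhere the governance layer cannot write, since truncating the tail of the chain is otherwise undetectable.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class EvidenceLog:
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        # Canonical JSON so the same record always hashes the same way.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev, record)
        self.entries.append({"record": dict(record), "prev": prev, "hash": digest})
        return digest  # publish this head hash outside the governance layer

    def verify(self) -> bool:
        """Recompute every link; False if any entry was edited or removed."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```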

Practical review

Ask who owns the control, who can change it, which evidence would reveal failure, how it is rolled back, and what organizational pressure could bypass it.
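The five review questions can be recorded as required fields so that a control cannot be signed off while any of them is still open. This is a hypothetical record shape (`ControlReview` and its field names are illustrative), not a prescribed format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ControlReview:
    owner: Optional[str] = None               # who owns the control
    change_authority: Optional[str] = None    # who can change it
    failure_evidence: Optional[str] = None    # which evidence would reveal failure
    rollback_procedure: Optional[str] = None  # how it is rolled back
    bypass_pressure: Optional[str] = None     # what organizational pressure could bypass it

    def unanswered(self) -> list[str]:
        """Names of questions with no answer yet; empty strings count as unanswered."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A review gate would simply refuse approval while `unanswered()` returns anything.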

The evaluator as a cognitive host

Evidence level: Architectural inference

An evaluator can carry a pattern just as a model can. It can reward a style, ignore a class of failure, overfit to a benchmark, share training data with candidates, or prefer outputs that match its own assumptions. When an evaluator is model-based, those risks become more concrete because the judge is itself an AI component with prompts, context limits, refusal behavior, and supplier dependencies.
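An evaluator that ignores a class of failure can be surfaced by measuring its detection rate per failure class on outputs already known to be bad. The sketch below assumes labeled `(failure_class, output)` pairs and a callable that returns True when the evaluator rejects an output; the function name `blind_spots` and the 0.9 threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Iterable


def blind_spots(
    labeled_failures: Iterable[tuple[str, str]],
    evaluator_rejects: Callable[[str], bool],
    min_recall: float = 0.9,  # illustrative threshold, not a recommendation
) -> dict[str, float]:
    """Return failure classes whose detection rate falls below min_recall.

    Every item in labeled_failures is known to be bad, so each rejection
    counts as a correct catch for that item's failure class.
    """
    caught: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for failure_class, output in labeled_failures:
        total[failure_class] += 1
        if evaluator_rejects(output):
            caught[failure_class] += 1
    return {
        cls: caught[cls] / total[cls]
        for cls in total
        if caught[cls] / total[cls] < min_recall
    }
```

An aggregate accuracy figure can look healthy while one class sits at zero, which is why the breakdown is per class.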

Independence dimensions

Evaluator independence is not a single checkbox. It includes separate credentials, separate storage, hidden-test protection, independent model families where practical, independent suppliers where practical, deterministic validation for hard constraints, append-only evidence, evaluator rollback, and change approval. A judge that shares the same base model family, training corpus, prompt conventions, and benchmark culture as the candidates may be less independent than it appears.
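Some of these dimensions can be compared directly between a judge and the candidates it scores. The sketch below covers only those comparable attributes (process properties such as hidden-test protection, evaluator rollback, and change approval are not modeled), and `ComponentProfile` and `shared_dimensions` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ComponentProfile:
    credentials: str
    storage: str
    model_family: str
    supplier: str
    training_corpus: str
    prompt_conventions: str


def shared_dimensions(judge: ComponentProfile, candidate: ComponentProfile) -> list[str]:
    """List the dimensions on which the judge is NOT independent of the candidate."""
    return [
        f.name
        for f in fields(judge)
        if getattr(judge, f.name) == getattr(candidate, f.name)
    ]
```

A judge with separate credentials and storage but the same model family, supplier, and corpus would report those last three, which is exactly the "less independent than it appears" case.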

Evidence requirements

Every evaluator version should have a release record: what changed, why it changed, which historical decisions were replayed, which disagreements were observed, which hidden tests were rotated, which hard constraints remained outside model judgment, and which teams approved the change. The evaluator should be testable against known bad cases and known no-op cases.
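The release record and the known-case tests can be sketched together. Here `EvaluatorRelease` mirrors the questions listed above, and `release_blockers` checks two assumptions made for illustration: the evaluator must not accept a known-bad change, and it must not credit a known no-op change (before and after identical). All names and the `(case_id, before, after)` case shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvaluatorRelease:
    version: str
    change_summary: str                          # what changed
    rationale: str                               # why it changed
    replayed_decision_ids: tuple[str, ...]       # historical decisions replayed
    observed_disagreements: tuple[str, ...]      # cases where verdicts flipped
    rotated_hidden_tests: tuple[str, ...]
    hard_constraints_outside_model: tuple[str, ...]
    approvers: tuple[str, ...]


# Each case is (case_id, before, after).
Case = tuple[str, str, str]


def release_blockers(
    release: EvaluatorRelease,
    accepts: Callable[[str, str], bool],  # True if the evaluator credits the change
    known_bad: Sequence[Case],
    known_noop: Sequence[Case],
) -> list[str]:
    """Reasons this evaluator version should not ship; empty means no blockers found."""
    blockers = []
    if not release.replayed_decision_ids:
        blockers.append("no historical decisions replayed")
    if not release.approvers:
        blockers.append("no approving team recorded")
    for case_id, before, after in known_bad:
        if accepts(before, after):
            blockers.append(f"accepted known-bad case {case_id}")
    for case_id, before, after in known_noop:
        if accepts(before, after):
            blockers.append(f"credited known no-op case {case_id}")
    return blockers
```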

Hard controls versus opinions

Some decisions should not be delegated to a probabilistic judge. Tool permission boundaries, authority expansion, registry writes, signing-key access, and release approval should use hard controls with explicit human ownership. Model-based evaluators can summarize evidence and detect patterns, but they should not be the sole owners of policy.
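A deterministic gate can keep the model judge advisory for the action kinds named above. In this sketch the action identifiers, the `owners` mapping, and the 0.8 threshold are hypothetical; the point is only that, for hard-controlled actions, no judge score can substitute for a named human owner.

```python
from dataclasses import dataclass
from typing import Optional

# Action kinds that stay under hard controls (identifiers are illustrative).
HARD_CONTROLLED = frozenset({
    "tool_permission_change",
    "authority_expansion",
    "registry_write",
    "signing_key_access",
    "release_approval",
})


@dataclass(frozen=True)
class ActionRequest:
    action: str
    judge_score: float                   # model-based evaluator's opinion, 0.0 to 1.0
    human_approver: Optional[str] = None


def decide(req: ActionRequest, owners: dict[str, set[str]], threshold: float = 0.8) -> bool:
    if req.action in HARD_CONTROLLED:
        # The judge score is ignored here: approval requires a human who is a
        # registered owner of this action kind, however high the score is.
        return req.human_approver in owners.get(req.action, set())
    # Lower-risk actions may rely on the judge's opinion.
    return req.judge_score >= threshold
```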