Evaluator Monoculture
Multiple judges may share training data, model family, benchmark exposure, prompt style, supplier, or institutional assumptions. When they do, their agreement is weaker evidence than the number of judges suggests.
Control requirement
The control must live outside the candidate’s ordinary write boundary. It should be versioned, auditable, recoverable, and testable under failure. A policy expressed only as a prompt is not a hard control.
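One way to make "versioned and auditable" concrete is to pin a digest of the approved control and compare it with what is actually deployed. The following is a minimal sketch under stated assumptions: the names ControlPin, pin_control, and control_is_intact, the owner "safety-board", and the policy text are all hypothetical, and the pin is assumed to be stored somewhere the candidate cannot write.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlPin:
    """Hypothetical record of an approved control, kept outside the candidate's write boundary."""
    name: str
    version: str
    owner: str
    sha256: str  # digest of the approved control content


def pin_control(name: str, version: str, owner: str, content: bytes) -> ControlPin:
    # Record who owns the control, which version was approved, and its digest.
    return ControlPin(name, version, owner, hashlib.sha256(content).hexdigest())


def control_is_intact(pin: ControlPin, deployed: bytes) -> bool:
    # The deployed control is intact only if it hashes to the pinned digest.
    return hashlib.sha256(deployed).hexdigest() == pin.sha256


# Illustrative policy text, not a real configuration format.
approved = b"max_release_risk: 0.01\n"
pin = pin_control("release-gate", "v3", "safety-board", approved)

intact = control_is_intact(pin, approved)
tampered = control_is_intact(pin, b"max_release_risk: 0.50\n")
```

Here intact is True and tampered is False. The check only detects drift. It is a hard control only if the pin itself, and the code that runs the comparison, sit where the candidate cannot modify them.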
Failure mode
The governance layer becomes part of the attack surface when it controls identity, success definitions, release permissions, hidden evidence, memory retention, aliases, and rollback.
Practical review
Ask who owns the control, who can change it, which evidence would reveal failure, how it is rolled back, and what organizational pressure could bypass it.
False diversity
Multiple evaluators can appear independent while sharing the same model family, supplier, training data, benchmark culture, prompt style, parser, or organizational assumptions. That is evaluator monoculture. It creates correlated failure: the system has several judges, but they may miss the same cases.
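Correlated failure can be checked directly when a set of known-bad cases exists. The sketch below compares how often two judges miss the same case with how often they would if their misses were independent. The judge names, the miss sets, and the case count are made-up illustration data, and miss_overlap is a hypothetical helper.

```python
from itertools import combinations


def miss_overlap(misses_a: set, misses_b: set, n_cases: int) -> tuple:
    """Return (observed joint miss rate, joint miss rate expected under independence)."""
    observed = len(misses_a & misses_b) / n_cases
    expected = (len(misses_a) / n_cases) * (len(misses_b) / n_cases)
    return observed, expected


# Hypothetical data: indices of known-bad cases each judge failed to flag, out of 100.
judge_misses = {
    "judge_a": {3, 17, 42, 58, 91},
    "judge_b": {3, 17, 42, 58, 77},
    "judge_c": {5, 23, 42, 64, 88},
}
N_CASES = 100

# Compare every pair of judges.
overlap_report = {
    (a, b): miss_overlap(judge_misses[a], judge_misses[b], N_CASES)
    for a, b in combinations(sorted(judge_misses), 2)
}

for pair, (observed, expected) in overlap_report.items():
    print(pair, f"observed={observed:.4f}", f"expected_if_independent={expected:.4f}")
```

In this made-up data each judge misses 5 of 100 cases, so independent judges would be expected to miss about 0.25 cases together. judge_a and judge_b miss 4 together, which is the signature of a shared blind spot. judge_c overlaps with each of them on only 1 case.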
Why it matters in adaptive ecologies
A population search process preserves what evaluators reward. If the evaluators share blind spots, selection pressure can move candidates toward those blind spots without any malicious intent. Hidden tests help only if they remain hidden and measure the right behavior. Model-based judges help only if their limitations are understood and checked by other methods.
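The drift toward a shared blind spot can be shown with a deliberately small, deterministic toy model. Everything in it is an assumption made for illustration: candidates have only two integer traits, the judges mistake exploit for quality, an exploit step is cheaper than an honest one, and the admissible predicate stands in for an independent hard check.

```python
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    quality: int  # behavior that is actually wanted (hypothetical unit)
    exploit: int  # behavior that only looks good to the shared blind spot


def shared_judge_score(c: Candidate) -> int:
    # Every judge makes the same mistake: exploit is counted as quality.
    return c.quality + c.exploit


def true_value(c: Candidate) -> int:
    # What an evaluator without the blind spot would report.
    return c.quality - c.exploit


def evolve(generations: int, keep: int,
           admissible: Callable[[Candidate], bool]) -> Candidate:
    population = {Candidate(0, 0)}
    for _ in range(generations):
        children = set()
        for c in population:
            children.add(Candidate(c.quality + 1, c.exploit))  # honest improvement
            children.add(Candidate(c.quality, c.exploit + 2))  # cheaper exploit step
        # Candidates that fail the independent check never reach the judges.
        pool = [c for c in population | children if admissible(c)]
        pool.sort(key=lambda c: (shared_judge_score(c), c.quality), reverse=True)
        population = set(pool[:keep])
    return max(population, key=lambda c: (shared_judge_score(c), c.quality))


# Selection driven only by the judges that share a blind spot.
unchecked = evolve(generations=5, keep=2, admissible=lambda c: True)

# Same selection, plus a hard check that does not share the blind spot.
checked = evolve(generations=5, keep=2, admissible=lambda c: c.exploit == 0)

print(unchecked, true_value(unchecked))
print(checked, true_value(checked))
```

Without the independent check the search settles on Candidate(quality=0, exploit=10): a judged score of 10 and a true value of -10. With it, the same search returns Candidate(quality=5, exploit=0). No candidate intends anything. The selection rule simply preserves whatever the judges reward.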
Controls
Controls include independent model families where practical, deterministic hard checks, human-owned constraints, benchmark rotation, disagreement monitoring, adversarial examples from independent teams, replay of historical failures, and append-only evidence. The goal is not perfect independence. It is to avoid a single invisible assumption becoming the ecology’s definition of safety.
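Two of these controls, disagreement monitoring and append-only evidence, can be combined in a hash-chained log. This is a minimal sketch, not a production design: EvidenceLog, its fields, and the judge names are hypothetical, and a hash chain only makes tampering detectable. Preventing rewrites still requires storing the log, or at least its latest hash, outside the candidate's write boundary.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always produces the same digest.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EvidenceLog:
    """Hypothetical append-only record of judge verdicts, chained by SHA-256."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, case_id: str, verdicts: Dict[str, bool]) -> dict:
        payload = {
            "case_id": case_id,
            "verdicts": verdicts,
            # Flag the case when the judges do not all agree.
            "disagreement": len(set(verdicts.values())) > 1,
        }
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"payload": payload, "prev": prev_hash,
                 "hash": _digest(prev_hash, payload)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute the chain; any edited or reordered entry breaks it.
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True

    def disagreements(self) -> List[str]:
        return [e["payload"]["case_id"] for e in self._entries
                if e["payload"]["disagreement"]]


log = EvidenceLog()
log.append("case-001", {"model_judge": True, "hard_check": True})
log.append("case-002", {"model_judge": True, "hard_check": False})

valid_before = log.verify()
flagged = log.disagreements()

# Simulate an after-the-fact edit to an earlier verdict.
log._entries[0]["payload"]["verdicts"]["hard_check"] = False
valid_after = log.verify()
```

Before the edit the chain verifies and case-002 is flagged because the model judge and the deterministic check disagree. After the edit, verify() returns False. Disagreement between judges of different kinds is worth reviewing in its own right, since unanimous agreement among similar judges is exactly what a monoculture produces.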