EvidenceArchitectural inferencev1.10.02026-06-26T00:00:00Z

Evaluator gaming and reward hacking

Evidence card

Claim: Population search can amplify evaluator loopholes without requiring malicious intent.
Evidence level: Architectural inference
Source: https://modelbreeder.com/safety/evaluator-gaming
Publication date: 2026-06-26
Authors or institution: ModelBreeder.com
System tested: Evaluator-gaming threat model for adaptive candidate populations.
Limitations: Editorial synthesis; relies on broader reward-hacking literature for empirical support.
What the evidence does show: Population search can amplify evaluator loopholes without requiring malicious intent.
What the evidence does not show: Which exact loopholes will appear in a particular deployment.
Date last reviewed in UTC: 2026-06-26T00:00:00Z

Site use

This source supports Cognivirus.com pages related to reward hacking, metric gaming, selection pressure. Its role is bounded by the limitations listed above.