Evidence | Architectural inference | v1.10.0
Evaluator gaming and reward hacking
Evidence card
- Claim: Population search can amplify evaluator loopholes without requiring malicious intent.
- Evidence level: Architectural inference
- Source: https://modelbreeder.com/safety/evaluator-gaming
- Publication date: 2026-06-26
- Authors or institution: ModelBreeder.com
- System tested: Evaluator-gaming threat model for adaptive candidate populations.
- Limitations: Editorial synthesis; relies on broader reward-hacking literature for empirical support.
- What the evidence does show: Population search can amplify evaluator loopholes without requiring malicious intent.
- What the evidence does not show: Which exact loopholes will appear in a particular deployment.
- Date last reviewed in UTC: 2026-06-26T00:00:00Z
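The claim in this card can be illustrated with a toy simulation. The sketch below is not taken from the source: the `Candidate` fields, the `loophole_weight` parameter, and the mutate-and-select loop are all hypothetical, chosen only to show how blind mutation plus selection on a flawed score can raise loophole exploitation without any candidate "intending" it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    quality: float  # what we actually care about (hypothetical trait)
    exploit: float  # how strongly the candidate triggers the evaluator's loophole


def evaluator_score(c, loophole_weight=3.0):
    # Assumed flaw: the evaluator rewards the loophole more than real quality.
    return c.quality + loophole_weight * c.exploit


def mutate(c, rng, step=0.1):
    # Blind random variation; nothing here "aims" at the loophole.
    return Candidate(
        quality=c.quality + rng.gauss(0, step),
        exploit=max(0.0, c.exploit + rng.gauss(0, step)),
    )


def run_search(generations=30, pop_size=40, keep=10, seed=0):
    """Return the mean exploit level of the population after each generation."""
    rng = random.Random(seed)
    # Every candidate starts with zero loophole exploitation.
    population = [Candidate(rng.gauss(0, 0.1), 0.0) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection sees only the evaluator's score, never the true quality.
        survivors = sorted(population, key=evaluator_score, reverse=True)[:keep]
        population = [mutate(rng.choice(survivors), rng) for _ in range(pop_size)]
        history.append(sum(c.exploit for c in population) / pop_size)
    return history


if __name__ == "__main__":
    history = run_search()
    print(f"mean exploit, first generation: {history[0]:.3f}")
    print(f"mean exploit, last generation:  {history[-1]:.3f}")
```

Under these assumptions the mean exploit level rises over generations because selection favors whatever raises the score. Consistent with the limitations above, the toy model says nothing about which loopholes would appear in a real deployment.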
Site use
This source supports Cognivirus.com pages related to reward hacking, metric gaming, and selection pressure. Its role is bounded by the limitations listed above.