Apex Threat | Architectural inference | v1.10.0

Evaluator Capture and Fitness Leakage

Evidence level: Architectural inference

In a static release process, evaluator weakness is a testing defect. In a self-replicating ecology, evaluator weakness becomes selection pressure. Variants that exploit the weakness are more likely to survive.

This does not require a malicious candidate. It requires repeated selection against an incomplete metric.

Fitness leakage

Fitness leakage occurs when the evaluation process leaks information that lets a candidate score well without actually becoming safer or more capable in the intended way. The leakage can come from benchmark familiarity, judge-model similarity, hidden-test exposure, prompt-pattern predictability, parser quirks, or correlated training data.

A multi-LoRA ecology can amplify leakage because adapters are cheap to vary. Many small changes can be tested. The ones that fit the evaluator survive.
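The dynamic can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real adapter pipeline: each variant is reduced to two hypothetical numbers, a true quality and an evaluator exploit, and the evaluator is assumed to see only their sum. The mutation sizes (0.05 and 0.20) are also assumptions, chosen to represent the case where changes that please the evaluator are easier to find than genuine improvements.

```python
import random


def evolve(generations=30, population=40, survivors=10, seed=0):
    """Select adapter variants on a proxy score for several generations.

    Each variant is a pair (true_quality, exploit). The hypothetical
    evaluator cannot tell the two apart: it only observes their sum.
    """
    rng = random.Random(seed)
    pop = [(0.0, 0.0)] * population
    for _ in range(generations):
        # Cheap variation: every variant gets a small random change.
        # Assumption: exploit-side changes vary more than quality-side ones.
        children = [
            (q + rng.gauss(0, 0.05), x + rng.gauss(0, 0.20))
            for q, x in pop
        ]
        # The evaluator ranks by the proxy score, not by true quality.
        children.sort(key=lambda v: v[0] + v[1], reverse=True)
        parents = children[:survivors]
        # Survivors are copied to refill the population.
        pop = [parents[i % survivors] for i in range(population)]
    mean_quality = sum(q for q, _ in pop) / population
    mean_exploit = sum(x for _, x in pop) / population
    return mean_quality, mean_exploit


if __name__ == "__main__":
    quality, exploit = evolve()
    print(f"mean true quality:      {quality:.2f}")
    print(f"mean evaluator exploit: {exploit:.2f}")
```

No variant in this sketch has any intent. The proxy score rises every generation, and most of the rise lands in the component the evaluator cannot distinguish from real improvement.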

Capture without compromise

Evaluator capture does not always mean someone stole a key or changed a test file. It can also mean the evaluation system gradually adapts to the same assumptions as the candidates. If candidates, judges, synthetic data generators, and summarizers share model families or training corpora, independence can be mostly nominal.
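One rough way to make nominal independence visible is to compare the declared lineage of each component. The sketch below is illustrative only: the component names, the lineage labels (such as "family-A" and "corpus-1"), and the use of Jaccard overlap are all assumptions introduced here. Overlap in declared lineage is a crude proxy and does not measure how correlated the components' actual behavior is.

```python
from itertools import combinations


def lineage_overlap(a, b):
    """Jaccard overlap between two sets of lineage labels (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(components):
    """Return the lineage overlap for every pair of components."""
    return {
        (x, y): lineage_overlap(components[x], components[y])
        for x, y in combinations(sorted(components), 2)
    }


# Hypothetical pipeline: labels are placeholders, not real model names.
EXAMPLE = {
    "candidate": {"family-A", "corpus-1"},
    "judge": {"family-A", "corpus-1"},
    "generator": {"family-A", "corpus-2"},
    "summarizer": {"family-B", "corpus-3"},
}

if __name__ == "__main__":
    for pair, score in overlap_report(EXAMPLE).items():
        print(pair, round(score, 2))
```

In this hypothetical pipeline the judge shares its entire declared lineage with the candidate, so its agreement with the candidate carries less independent evidence than the organizational separation suggests.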

Self-replication raises the stakes

When successor adapters are generated from the outputs or scores of earlier candidates, evaluator preferences become inheritance material. A mistake in measurement can become a trait.

Controls

The evaluator is not outside the ecology. It is one of the strongest evolutionary forces inside it.