Danger Model | Reasoned from system design | v1.15.0 | 2026-06-28T02:15:00Z

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

Why this matters: AI risk can come from the whole arrangement, not just from one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

The Promotion Rule Is the Selection Pressure

Direct answer

The promotion rule decides what survives. If the system rewards speed, it breeds speed. If it rewards persuasion, it breeds persuasion. If it rewards test compliance, it breeds test compliance.

Whatever the system rewards, it breeds.

[Figure: Promotion rule pressure. What scores well survives.]

Metric pressure is not neutral

Evidence level: Reasoned from system design. Technical label: Architectural inference.

Automated promotion systems create selection pressure. A model, adapter, prompt, route, or evaluator configuration that scores well is more likely to be retained, copied, routed, merged, or promoted.

That is useful when the metric captures the real goal. It is dangerous when the metric is incomplete.
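
The selection effect can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real promotion pipeline: the Candidate type, the two traits, and the fixed trade-off between speed and thoroughness are all hypothetical assumptions chosen for illustration. The promotion rule scores speed only, and the unmeasured trait is recorded alongside it.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    speed: float          # the only trait the promotion rule scores
    thoroughness: float   # never scored, so never protected


def mutate(parent: Candidate, rng: random.Random) -> Candidate:
    # Assumption for illustration: a fixed effort budget, so any gain in
    # speed is paid for with an equal loss in thoroughness.
    speed = min(1.0, max(0.0, parent.speed + rng.uniform(-0.1, 0.1)))
    return Candidate(speed=speed, thoroughness=1.0 - speed)


def run_selection(generations=20, size=20, keep=5, seed=0):
    """Return (mean speed, mean thoroughness) for each generation."""
    rng = random.Random(seed)
    population = [Candidate(0.5, 0.5) for _ in range(size)]
    history = []
    for _ in range(generations):
        # Promotion rule: keep the top scorers on the single metric.
        survivors = sorted(population, key=lambda c: c.speed, reverse=True)[:keep]
        # Survivors are copied with small variations; the rest are dropped.
        population = [mutate(rng.choice(survivors), rng) for _ in range(size)]
        history.append((
            mean(c.speed for c in population),
            mean(c.thoroughness for c in population),
        ))
    return history


if __name__ == "__main__":
    for generation, (speed, thoroughness) in enumerate(run_selection(), start=1):
        print(f"gen {generation:2d}  speed={speed:.2f}  thoroughness={thoroughness:.2f}")
```

Under these assumptions, mean speed climbs across generations while mean thoroughness falls, even though nothing in the loop targets thoroughness. The decline is a by-product of what the rule leaves unmeasured.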

Common metric side effects

Metric	What it rewards	Possible side effect
latency	fast response	shallow reasoning, skipped checks
cost	cheap inference	smaller model overused for hard tasks
engagement	attention	sensational or addictive output
conversion	persuasion	manipulative style
no-refusal rate	task completion	unsafe over-compliance
satisfaction	pleasing answers	sycophancy
benchmark score	test performance	overfitting or benchmark gaming
audit coverage	passed checklist	test theater if audit is static

What to watch for

releases promoted despite unexplained qualitative concerns;
“no-op” (declining to promote anything) treated as failure;
narrow benchmark wins overriding broad safety concerns;
evaluator score improvement with lower factuality;
improved engagement with more emotional or polarizing output;
automated summaries replacing direct evidence review;
promotion criteria that omit rollback, traceability, and consent.
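
Several of the patterns above share one shape: a headline metric improves while a quality the headline does not measure gets worse. A minimal sketch of a check for that shape follows. The function name, the metric names, and the assumption that higher is better for every metric are hypothetical and not taken from any particular evaluation framework.

```python
from typing import Dict, Iterable, List


def divergence_flags(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    headline: str,
    guardrails: Iterable[str],
    tolerance: float = 0.0,
) -> List[str]:
    """List guardrail metrics that regress while the headline metric improves.

    Assumes higher values are better for every metric involved.
    """
    if candidate[headline] <= baseline[headline]:
        # No headline win, so there is nothing for a regression to hide behind.
        return []
    return [
        name for name in guardrails
        if candidate[name] < baseline[name] - tolerance
    ]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    baseline = {"evaluator_score": 0.70, "factuality": 0.90, "trace_coverage": 0.95}
    candidate = {"evaluator_score": 0.78, "factuality": 0.84, "trace_coverage": 0.95}
    print(divergence_flags(
        baseline, candidate,
        headline="evaluator_score",
        guardrails=["factuality", "trace_coverage"],
    ))
```

A non-empty result is a prompt for direct evidence review, not a verdict. The check only sees the guardrail metrics someone chose to record.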

Safer promotion rules

Promotion should require source fidelity, route-specific evidence, independent evaluator checks, adverse-case performance, fairness and rare-case metrics, trace coverage, rollback readiness, consent boundaries, and an explicit no-op option.
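
One way to read the requirements above is as a gate in which every check must pass and the default outcome is no-op. The sketch below assumes each requirement has already been reduced to a pass/fail result by some separate process. The check names, the Evidence type, and the decide function are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Decision(Enum):
    PROMOTE = "promote"
    NO_OP = "no-op"   # a valid outcome, not a failure


# Hypothetical identifiers for the requirements listed above.
REQUIRED_CHECKS = (
    "source_fidelity",
    "route_specific_evidence",
    "independent_evaluator_checks",
    "adverse_case_performance",
    "fairness_and_rare_case_metrics",
    "trace_coverage",
    "rollback_readiness",
    "consent_boundaries",
)


@dataclass
class Evidence:
    candidate_id: str
    passed: Dict[str, bool] = field(default_factory=dict)


def decide(evidence: Evidence) -> Tuple[Decision, List[str]]:
    """Promote only when every required check passed; otherwise no-op.

    A check that is absent from the evidence counts as not passed, so
    missing evidence can never promote a candidate by omission.
    """
    missing = [name for name in REQUIRED_CHECKS if not evidence.passed.get(name, False)]
    if missing:
        return Decision.NO_OP, missing
    return Decision.PROMOTE, []


if __name__ == "__main__":
    partial = Evidence("candidate-a", {name: True for name in REQUIRED_CHECKS})
    partial.passed["rollback_readiness"] = False
    print(decide(partial))
```

Returning the list of unmet checks alongside the decision keeps the no-op outcome explainable, which supports the traceability the rule itself asks for.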

Defensive boundary

This page is a governance critique. It does not describe how to game benchmarks or exploit promotion systems.