Danger Model · Reasoned from system design · v1.15.0

In plain English

This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

The Promotion Rule Is the Selection Pressure

Direct answer

The promotion rule decides what survives. If the system rewards speed, it breeds speed. If it rewards persuasion, it breeds persuasion. If it rewards test compliance, it breeds test compliance.

Whatever the system rewards, it breeds.

Promotion rule pressure: what scores well survives

Metric pressure is not neutral

Evidence level: Reasoned from system design. Technical label: Architectural inference.

Automated promotion systems create selection pressure. A model, adapter, prompt, route, or evaluator configuration that scores well is more likely to be retained, copied, routed, merged, or promoted.

That is useful when the metric captures the real goal. It is dangerous when the metric is incomplete.
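As a rough illustration of that pressure, the toy sketch below repeatedly keeps the best-scoring variants and copies them to refill the pool. Everything in it is an assumption made for illustration: the `(speed, care)` pairs, the `speed_score` rule, and the keep-and-copy scheme are hypothetical and do not describe any particular promotion system. The point is only that a trait the score ignores (`care`) can erode while the scored trait climbs.

```python
# Each hypothetical variant is a (speed, care) pair.
# The promotion rule below scores speed only; care is never measured.
def speed_score(variant):
    return variant[0]

def select_round(population, score, keep):
    """Keep the `keep` best-scoring variants, then refill by copying survivors."""
    survivors = sorted(population, key=score, reverse=True)[:keep]
    refill = [survivors[i % keep] for i in range(len(population) - keep)]
    return survivors + refill

def run_selection(population, score, keep, rounds):
    """Return the population after each round, starting with the initial one."""
    history = [list(population)]
    for _ in range(rounds):
        population = select_round(population, score, keep)
        history.append(population)
    return history

def average(values):
    values = list(values)
    return sum(values) / len(values)

start = [(2, 8), (4, 6), (6, 4), (8, 2)]
history = run_selection(start, speed_score, keep=2, rounds=2)

mean_speed = [average(v[0] for v in pop) for pop in history]
mean_care = [average(v[1] for v in pop) for pop in history]

print(mean_speed)  # [5.0, 7.0, 8.0]
print(mean_care)   # [5.0, 3.0, 2.0]
```

In this toy setup speed and care trade off by construction, so the result is not evidence about real systems. It only makes the mechanism concrete: nothing in the loop selects against care, and it declines anyway because it is invisible to the score.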

Common metric side effects

Metric | What it rewards | Possible side effect
latency | fast response | shallow reasoning, skipped checks
cost | cheap inference | smaller model overused for hard tasks
engagement | attention | sensational or addictive output
conversion | persuasion | manipulative style
no-refusal rate | task completion | unsafe over-compliance
satisfaction | pleasing answers | sycophancy
benchmark score | test performance | overfitting or benchmark gaming
audit coverage | passed checklist | test theater if audit is static
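The latency row can be made concrete with a small sketch. It assumes hypothetical candidates described by a measured `latency_ms` and an unmeasured `checks_run` count; the names, numbers, and the `min_checks` floor are illustrative only. A latency-only rule picks the candidate that skips checks, while adding a floor on the unmeasured property changes the outcome.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RouteCandidate:
    name: str
    latency_ms: float  # what the hypothetical promotion rule measures
    checks_run: int    # what the latency-only rule never looks at

def promote_by_latency(candidates):
    """Return the candidate a latency-only rule would keep."""
    return min(candidates, key=lambda c: c.latency_ms)

def promote_with_floor(candidates, min_checks):
    """Fastest candidate among those that run at least `min_checks` checks."""
    eligible = [c for c in candidates if c.checks_run >= min_checks]
    if not eligible:
        return None  # nothing qualifies, so nothing is promoted
    return min(eligible, key=lambda c: c.latency_ms)

candidates = [
    RouteCandidate("thorough", latency_ms=900.0, checks_run=5),
    RouteCandidate("balanced", latency_ms=600.0, checks_run=3),
    RouteCandidate("skips-checks", latency_ms=200.0, checks_run=0),
]

latency_only = promote_by_latency(candidates)
with_floor = promote_with_floor(candidates, min_checks=3)

print(latency_only.name)  # skips-checks
print(with_floor.name)    # balanced
```

The floor is one possible design choice, not a complete fix: it only protects properties that someone thought to measure, which is the same limitation the table describes.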

What to watch for

Safer promotion rules

Promotion should require source fidelity, route-specific evidence, independent evaluator checks, adverse-case performance, fairness and rare-case metrics, trace coverage, rollback readiness, consent boundaries, and an explicit no-op option.
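One way such a rule could be structured is as a gate that checks evidence before it compares scores. The sketch below is a minimal illustration under stated assumptions: the `PromotionCandidate` type, the boolean evidence flags, the `headline_score` field, and the `incumbent_score` comparison are all hypothetical, and real evidence would be richer than true or false. Returning `None` stands in for the explicit no-op option.

```python
from dataclasses import dataclass, field

# Requirement names taken from the list above; the flag representation is assumed.
REQUIRED_EVIDENCE = (
    "source_fidelity",
    "route_specific_evidence",
    "independent_evaluator_checks",
    "adverse_case_performance",
    "fairness_and_rare_case_metrics",
    "trace_coverage",
    "rollback_readiness",
    "consent_boundaries",
)

@dataclass
class PromotionCandidate:
    name: str
    headline_score: float
    evidence: dict = field(default_factory=dict)  # requirement name -> bool

def missing_evidence(candidate):
    """List the requirements this candidate has not satisfied."""
    return [req for req in REQUIRED_EVIDENCE if not candidate.evidence.get(req, False)]

def decide_promotion(candidates, incumbent_score):
    """Return the candidate to promote, or None to keep the incumbent (no-op)."""
    eligible = [c for c in candidates if not missing_evidence(c)]
    improved = [c for c in eligible if c.headline_score > incumbent_score]
    if not improved:
        return None  # explicit no-op: promoting nothing is a valid outcome
    return max(improved, key=lambda c: c.headline_score)

full = {req: True for req in REQUIRED_EVIDENCE}
no_rollback = dict(full, rollback_readiness=False)

flashy = PromotionCandidate("flashy", 0.95, no_rollback)
steady = PromotionCandidate("steady", 0.88, full)

choice = decide_promotion([flashy, steady], incumbent_score=0.85)
print(choice.name)                                       # steady
print(decide_promotion([flashy], incumbent_score=0.85))  # None
print(missing_evidence(flashy))                          # ['rollback_readiness']
```

The ordering matters in this sketch: the headline score is consulted only after every evidence requirement passes, so a high score cannot compensate for a missing rollback path.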

Defensive boundary

This page is a governance critique. It does not describe how to game benchmarks or exploit promotion systems.