Evolution | Strong architectural inference | v1.21.5 | 2026-06-28T15:00:00Z

In plain English

This page explains how AI systems can change over time through updates, tests, retraining, memory, and approvals, even when no single model rewrites itself.

Why this matters: AI risk can come from the whole arrangement, not one obvious model.
What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
Technical version below: the expert terminology remains available and is linked through the glossary.

Fitness, Novelty, and Selection

Evidence level: Strong architectural inference

A model ecology can select for behavior without any single model choosing to reproduce. Selection occurs when a release process preserves variants that score well and discards variants that do not.
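
The following minimal Python sketch shows that selection needs no self-replicating model. A hypothetical `release_round` function keeps the top-scoring variants and drops the rest. The `Variant` type, the scores, and the top-k rule are illustrative assumptions, not a prescribed release policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    score: float  # score from some evaluator (assumed already computed)


def release_round(pool: list[Variant], keep: int) -> list[Variant]:
    """Hypothetical release step: preserve the `keep` best-scoring variants, discard the rest."""
    ranked = sorted(pool, key=lambda v: v.score, reverse=True)
    return ranked[:keep]


pool = [Variant("a", 0.61), Variant("b", 0.74), Variant("c", 0.58), Variant("d", 0.70)]
survivors = release_round(pool, keep=2)
print([v.name for v in survivors])  # ['b', 'd']
```

Repeated over many rounds, a rule like this shifts the surviving pool toward whatever the evaluator rewards. For that reason the evaluator itself needs scrutiny.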

Bounded selection rule

A candidate should not be promoted merely because it improves one metric. Promotion should require a fitness vector that scores the candidate on several dimensions at once, plus an explicit release decision.
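
Here is one way to read the rule, sketched in Python under stated assumptions:

- Every fitness dimension is higher-is-better.
- A candidate may not regress on any dimension beyond an optional tolerance.
- It must improve on at least one dimension.
- A separate `approved` flag stands in for the explicit release decision.

The dimension names and the non-regression criterion are hypothetical choices, not the only valid gate.

```python
from collections.abc import Mapping


def should_promote(
    candidate: Mapping[str, float],
    baseline: Mapping[str, float],
    approved: bool,
    tolerance: float = 0.0,
) -> bool:
    """Hypothetical gate: all dimensions present, none regresses beyond
    `tolerance`, at least one improves, and a reviewer explicitly approved."""
    if set(candidate) != set(baseline):
        return False  # an incomplete fitness vector is never promotable
    regressions = [k for k in baseline if candidate[k] < baseline[k] - tolerance]
    improvements = [k for k in baseline if candidate[k] > baseline[k]]
    return approved and not regressions and bool(improvements)


baseline = {"task_utility": 0.80, "safety": 0.95, "cost_efficiency": 0.60}
one_metric = {"task_utility": 0.88, "safety": 0.90, "cost_efficiency": 0.60}
balanced = {"task_utility": 0.84, "safety": 0.95, "cost_efficiency": 0.61}

print(should_promote(one_metric, baseline, approved=True))  # False: safety regressed
print(should_promote(balanced, baseline, approved=True))    # True
print(should_promote(balanced, baseline, approved=False))   # False: no release decision
```

The candidate that raises task utility but lowers safety is rejected. The smaller, balanced gain passes, but only once it is approved.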

Example viability framing:

viability = task_utility
          - cost_penalty
          - latency_penalty
          - memory_penalty
          + bounded_novelty_credit
          - boundary_violation_penalty

The formula is not a universal theorem. It is a placeholder for making selection pressure visible.
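
The viability framing can be written as a plain function. This sketch makes two illustrative assumptions, neither of which is a calibrated value:

- Each term is already computed on a common scale.
- "Bounded" means clamping the novelty credit to a hypothetical cap, `novelty_cap`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViabilityTerms:
    task_utility: float
    cost_penalty: float
    latency_penalty: float
    memory_penalty: float
    novelty_credit: float
    boundary_violation_penalty: float


def viability(t: ViabilityTerms, novelty_cap: float = 0.1) -> float:
    # "Bounded" read as clamping novelty credit to [0, novelty_cap] (an assumption),
    # so novelty alone cannot outweigh utility and penalties.
    bounded_novelty_credit = min(max(t.novelty_credit, 0.0), novelty_cap)
    return (
        t.task_utility
        - t.cost_penalty
        - t.latency_penalty
        - t.memory_penalty
        + bounded_novelty_credit
        - t.boundary_violation_penalty
    )


example = ViabilityTerms(
    task_utility=1.0, cost_penalty=0.2, latency_penalty=0.1,
    memory_penalty=0.05, novelty_credit=0.5, boundary_violation_penalty=0.0,
)
print(round(viability(example), 2))  # 0.75: the 0.5 novelty credit is capped at 0.1
```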

Novelty is not automatically good

Novelty helps avoid monoculture and blind averaging. It also increases review load. A good novelty archive records why a candidate is different, whether the difference is useful, and whether the difference creates a new rollback or permission boundary.
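
This sketch shows an archive entry that captures the three things named above. The `NoveltyRecord` fields and the `review_queue` rule are hypothetical. A candidate goes to extra review when its usefulness is unknown or when it introduces a new boundary. The point is to show how the archive can surface review load rather than hide it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NoveltyRecord:
    candidate_id: str
    difference: str                # why the candidate is different
    useful: Optional[bool] = None  # None: usefulness not yet established
    new_boundaries: list[str] = field(default_factory=list)  # new rollback or permission boundaries

    def __post_init__(self) -> None:
        if not self.difference.strip():
            raise ValueError("a novelty record must say why the candidate is different")

    def needs_extra_review(self) -> bool:
        return self.useful is None or bool(self.new_boundaries)


def review_queue(archive: list[NoveltyRecord]) -> list[str]:
    """Candidate ids whose novelty adds review load."""
    return [r.candidate_id for r in archive if r.needs_extra_review()]


archive = [
    NoveltyRecord("c1", "routes long queries to a new retriever", useful=True),
    NoveltyRecord("c2", "adds a tool with write access", useful=True, new_boundaries=["tool:write"]),
    NoveltyRecord("c3", "different prompt style", useful=None),
]
print(review_queue(archive))  # ['c2', 'c3']
```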

Minimum release questions

Which genome produced the candidate?
Which exact evaluator produced the score?
Which hidden or rotating tests were used?
Which candidate family or species does it belong to?
Was a no-op (leaving the current state unchanged) available and considered?
Which rollback packet restores the previous state?
Which memory, prompt, route, tool, and adapter state changed?
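
The release questions can double as a machine-checkable gate. In this sketch, each question maps to a field of a hypothetical `ReleaseRecord`. The `blocking_issues` function lists:

- every unanswered question;
- a no-op that was not considered;
- any of the five state components left undeclared.

Field names and identifiers are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional

# The five state components the release questions ask about.
REQUIRED_STATE = {"memory", "prompt", "route", "tool", "adapter"}


@dataclass
class ReleaseRecord:
    genome_id: Optional[str] = None            # which genome produced the candidate
    evaluator_version: Optional[str] = None    # exact evaluator that produced the score
    test_set_ids: Optional[list[str]] = None   # hidden or rotating tests used
    species: Optional[str] = None              # candidate family or species
    no_op_considered: Optional[bool] = None    # was a no-op available and considered
    rollback_packet_id: Optional[str] = None   # packet that restores the previous state
    changed_state: Optional[dict[str, str]] = None  # component -> change ("unchanged" allowed)


def blocking_issues(r: ReleaseRecord) -> list[str]:
    issues = [f"missing: {f.name}" for f in fields(r) if getattr(r, f.name) is None]
    if r.no_op_considered is False:
        issues.append("no-op was not considered")
    if r.changed_state is not None:
        undeclared = REQUIRED_STATE - set(r.changed_state)
        issues.extend(f"undeclared state: {k}" for k in sorted(undeclared))
    return issues


record = ReleaseRecord(
    genome_id="g-17",
    evaluator_version="eval-v3",
    test_set_ids=["rotating-B"],
    species="retrieval-heavy",
    no_op_considered=True,
    changed_state={
        "memory": "unchanged", "prompt": "v4 -> v5",
        "route": "unchanged", "tool": "unchanged",
    },
)
print(blocking_issues(record))  # ['missing: rollback_packet_id', 'undeclared state: adapter']
```

Requiring every component to be declared, even as "unchanged", forces the author to state that nothing else moved rather than leaving it implied.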

Anti-Goodhart rule

If a candidate is selected because it improves a proxy while degrading the real task, the proxy becomes part of the threat surface. Evaluators and score definitions should be versioned, rotated, and reviewed independently of the candidate creators.
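
Two of these checks can be sketched directly. The sketch assumes higher-is-better scores, a held-out score drawn from a hidden or rotating suite, and reviewers and creators identified by plain string ids.

- `goodhart_flag` marks a candidate whose proxy score rose while its held-out task score fell.
- `reviewers_independent` checks that evaluator reviewers and candidate creators do not overlap.
- `pick_rotating_suite` is a deliberately simple round-robin stand-in for rotation. A real process might randomize the choice and keep it hidden.

All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreDelta:
    proxy: float     # candidate minus baseline on the selection proxy
    held_out: float  # candidate minus baseline on a hidden or rotating real-task suite


def goodhart_flag(d: ScoreDelta, margin: float = 0.0) -> bool:
    """True when the proxy improved but the held-out task score got worse."""
    return d.proxy > margin and d.held_out < -margin


def reviewers_independent(evaluator_reviewers: set[str], candidate_creators: set[str]) -> bool:
    """Evaluator reviewers should share no one with the candidate's creators."""
    return evaluator_reviewers.isdisjoint(candidate_creators)


def pick_rotating_suite(suites: list[str], release_index: int) -> str:
    """Round-robin rotation over versioned test suites (a simplifying assumption)."""
    return suites[release_index % len(suites)]


print(goodhart_flag(ScoreDelta(proxy=0.05, held_out=-0.02)))         # True
print(goodhart_flag(ScoreDelta(proxy=0.05, held_out=0.01)))          # False
print(reviewers_independent({"ana", "raj"}, {"raj", "li"}))          # False
print(pick_rotating_suite(["suite-v1", "suite-v2", "suite-v3"], 4))  # suite-v2
```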