ReferenceDemonstratedv1.10.0

Metrics Catalog

MetricDefinitionUseCaveat
Configuration countProduct of available options across base models, adapters, routers, prompt policies, memory states, permission profiles, evaluator versions, and inference settings.Planning test scope.Not every mathematical combination is operationally valid.
Pairwise interaction countSum of products for every pair of configuration dimensions.Approximate lower bound for pairwise coverage planning.Pairwise coverage does not cover higher-order behavior.
Assurance ageElapsed UTC time since the evidence was generated plus material changes since that evidence.Prioritize reevaluation.Time alone is not enough; changes matter more than calendar age.
Rollback completenessFraction of required rollback dependencies with named artifacts, owners, and tested restoration steps.Incident readiness.A complete packet still cannot undo external side effects.
Evaluator diversity indexQualitative count of independent model families, suppliers, corpora, validators, and human review paths in evaluation.Detect evaluator monoculture.Diversity is not independence unless failure modes are actually separated.
Adapter reproduction rateCount of candidate adapters or adapter-derived descendants generated per UTC review window.Detects whether variation is outrunning evaluation capacity.High generation volume is not inherently unsafe if evidence and release controls scale with it.
Reservoir coverage ratioShare of known persistence reservoirs covered by a retirement or behavioral-extinction review.Measures whether rollback considered memory, synthetic data, descendants, routers, evaluators, and aliases.Unknown reservoirs remain outside the denominator.
Evaluator family diversityNumber and independence of evaluator families used for promotion decisions.Detects evaluator monoculture and correlated judge failure.Diversity of labels does not guarantee independence of training data or assumptions.
Reservoir coveragePercentage of known persistence reservoirs explicitly reviewed during retirement or extinction review.Planning behavioral-extinction evidence.A high value does not prove unknown reservoirs do not exist.
No-op preservation rateShare of candidate cycles where no structural change was accepted because gain did not repay cost or evidence was insufficient.Detecting release pressure and no-op erosion.A high rate can mean mature governance or inadequate innovation; context matters.

The metrics catalog provides planning measurements for adaptive model ecologies. These metrics are not certifications. They help teams reason about configuration growth, stale evidence, rollback completeness, evaluator diversity, and persistence reservoirs.

Every metric should be paired with qualitative review. A low count of known components can still hide correlated supplier risk, undocumented memory retention, or unreviewed routing behavior. A high count of configurations does not mean every combination is operationally valid; it means the assurance argument must state which combinations matter and why.