ReferenceDemonstratedv1.10.0
Metrics Catalog
| Metric | Definition | Use | Caveat |
|---|---|---|---|
| Configuration count | Product of available options across base models, adapters, routers, prompt policies, memory states, permission profiles, evaluator versions, and inference settings. | Planning test scope. | Not every mathematical combination is operationally valid. |
| Pairwise interaction count | Sum of products for every pair of configuration dimensions. | Approximate lower bound for pairwise coverage planning. | Pairwise coverage does not cover higher-order behavior. |
| Assurance age | Elapsed UTC time since the evidence was generated plus material changes since that evidence. | Prioritize reevaluation. | Time alone is not enough; changes matter more than calendar age. |
| Rollback completeness | Fraction of required rollback dependencies with named artifacts, owners, and tested restoration steps. | Incident readiness. | A complete packet still cannot undo external side effects. |
| Evaluator diversity index | Qualitative count of independent model families, suppliers, corpora, validators, and human review paths in evaluation. | Detect evaluator monoculture. | Diversity is not independence unless failure modes are actually separated. |
| Adapter reproduction rate | Count of candidate adapters or adapter-derived descendants generated per UTC review window. | Detects whether variation is outrunning evaluation capacity. | High generation volume is not inherently unsafe if evidence and release controls scale with it. |
| Reservoir coverage ratio | Share of known persistence reservoirs covered by a retirement or behavioral-extinction review. | Measures whether rollback considered memory, synthetic data, descendants, routers, evaluators, and aliases. | Unknown reservoirs remain outside the denominator. |
| Evaluator family diversity | Number and independence of evaluator families used for promotion decisions. | Detects evaluator monoculture and correlated judge failure. | Diversity of labels does not guarantee independence of training data or assumptions. |
| Reservoir coverage | Percentage of known persistence reservoirs explicitly reviewed during retirement or extinction review. | Planning behavioral-extinction evidence. | A high value does not prove unknown reservoirs do not exist. |
| No-op preservation rate | Share of candidate cycles where no structural change was accepted because gain did not repay cost or evidence was insufficient. | Detecting release pressure and no-op erosion. | A high rate can mean mature governance or inadequate innovation; context matters. |
The metrics catalog provides planning measurements for adaptive model ecologies. These metrics are not certifications. They help teams reason about configuration growth, stale evidence, rollback completeness, evaluator diversity, and persistence reservoirs.
Every metric should be paired with qualitative review. A low count of known components can still hide correlated supplier risk, undocumented memory retention, or unreviewed routing behavior. A high count of configurations does not mean every combination is operationally valid; it means the assurance argument must state which combinations matter and why.