In plain English
This page is part of the technical reference. It keeps the expert detail but starts with a plain-language summary for first-time readers.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Synthetic Feedback Loops and Model Collapse Risk
Direct answer
A synthetic feedback loop occurs when AI outputs become future training data, memory, examples, or evaluator material. The risk is not synthetic data itself. The risk is unmanaged recursion.
What can go wrong
When models repeatedly learn from model-generated data without provenance (a record of where a component or behavior came from), fresh human data, diversity checks, and quality filtering, the system can lose rare examples, amplify mistakes, smooth out minority cases, and produce more generic outputs over time.
This matters to Cognivirus.com because synthetic feedback can act as a persistence reservoir: any memory, dataset, descendant, route statistic, evaluator preference, log, or human procedure that can retain or reintroduce a behavior after its first carrier is retired. A behavior can survive because it becomes an example, a summary, a benchmark case, or a training record.
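The loss of rare examples can be illustrated with a toy simulation. The sketch below is not a model of any real training pipeline: it assumes each "generation" is simply a random sample drawn from the previous generation's outputs, with no fresh data mixed in. The function name `resample_generations` and the starting population are hypothetical.

```python
import random

def resample_generations(population, generations, sample_size, seed=0):
    """Repeatedly 'retrain' on samples drawn only from the previous generation.

    Returns the number of distinct categories present after each generation.
    """
    rng = random.Random(seed)
    current = list(population)
    support_sizes = [len(set(current))]
    for _ in range(generations):
        # Each generation sees only the previous generation's outputs.
        # No fresh data enters, so a category that disappears never returns.
        current = rng.choices(current, k=sample_size)
        support_sizes.append(len(set(current)))
    return support_sizes

# Hypothetical starting data: one common category and 100 rare ones.
population = ["common"] * 900 + [f"rare_{i}" for i in range(100)]
sizes = resample_generations(population, generations=20, sample_size=1000)
print("distinct categories at start:", sizes[0])
print("distinct categories at end:  ", sizes[-1])
```

Under these assumptions the count of distinct categories can only stay level or fall, because nothing reintroduces a lost category. Mixing fresh human data into each generation is the simplest way to break that one-way ratchet in this toy setting.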
Plain-English analogy
If every new map is copied from the previous map, small mistakes can become permanent geography. If no one checks the land again, the map becomes a self-confirming world.
What to watch for
- training data with unknown human-vs-AI provenance;
- rising fraction of synthetic examples;
- output repetition or narrowing style;
- reduced performance on rare or edge cases;
- new fairness gaps;
- “successful” outputs automatically saved for fine-tuning;
- evaluator examples generated by the same model family being evaluated (an evaluator is a system that judges whether an AI output or candidate is acceptable);
- synthetic examples derived from incident-era outputs.
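Two of the signals above, unknown provenance and a rising synthetic fraction, can be monitored with a simple per-dataset summary. The following is a minimal sketch, assuming each record is a dictionary with an optional `source` field; the field name, the label set, and the thresholds are illustrative placeholders, not recommendations.

```python
from collections import Counter

# Hypothetical label set; anything else (or a missing label) counts as "unknown".
KNOWN_SOURCES = {"human", "synthetic", "mixed"}

def provenance_summary(records):
    """Return the fraction of records per source label."""
    counts = Counter(
        r.get("source") if r.get("source") in KNOWN_SOURCES else "unknown"
        for r in records
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}

def flag_dataset(summary, max_synthetic=0.5, max_unknown=0.05):
    """Return warnings when a fraction exceeds its (illustrative) threshold."""
    warnings = []
    if summary.get("synthetic", 0.0) > max_synthetic:
        warnings.append("synthetic fraction above threshold")
    if summary.get("unknown", 0.0) > max_unknown:
        warnings.append("unknown provenance above threshold")
    return warnings

records = [
    {"text": "a", "source": "human"},
    {"text": "b", "source": "synthetic"},
    {"text": "c", "source": "synthetic"},
    {"text": "d"},  # no provenance label
]
summary = provenance_summary(records)
print(summary)
print(flag_dataset(summary))
```

Running the same summary on each dataset version and comparing the results over time is one way to notice a rising synthetic fraction before it shows up as degraded outputs.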
Controls
- label human, synthetic, and mixed sources;
- keep fresh human-reviewed data in the loop;
- quarantine incident-era outputs;
- track rare-case performance separately;
- measure output diversity;
- prevent evaluator self-training without independent review;
- require dataset retirement and cleanup as part of behavioral-extinction review.
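Two of these controls lend themselves to a short sketch: measuring output diversity and quarantining incident-era outputs. The example below assumes a distinct-n style ratio (unique n-grams divided by total n-grams) as one possible diversity measure, and assumes each record carries a `created` timestamp. Both function names and the record format are hypothetical.

```python
from datetime import datetime

def distinct_n(texts, n=2):
    """Ratio of unique n-grams to total n-grams across texts.

    Values near 1.0 suggest varied output; values near 0.0 suggest repetition.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0

def quarantine(records, incident_start, incident_end):
    """Split records into (kept, held) by whether they fall in the incident window."""
    kept, held = [], []
    for r in records:
        if incident_start <= r["created"] <= incident_end:
            held.append(r)  # excluded from training until reviewed
        else:
            kept.append(r)
    return kept, held

repetitive = distinct_n(["the cat sat", "the cat sat"])
varied = distinct_n(["the cat sat", "a dog ran"])
print(repetitive, varied)

records = [
    {"id": 1, "created": datetime(2024, 1, 5)},
    {"id": 2, "created": datetime(2024, 2, 10)},  # inside the incident window
    {"id": 3, "created": datetime(2024, 3, 1)},
]
kept, held = quarantine(records, datetime(2024, 2, 1), datetime(2024, 2, 15))
print([r["id"] for r in kept], [r["id"] for r in held])
```

A whitespace-token n-gram ratio is deliberately crude. It catches verbatim repetition but not paraphrased sameness, so it is best read as a trend indicator alongside rare-case performance rather than as a pass/fail test.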
Evidence boundary
Model collapse (a model losing diversity by learning from its own outputs) is a documented research concern under particular training regimes. This page does not claim that every use of synthetic data causes collapse, or that all current systems are collapsing. It treats unmanaged recursion as a risk that needs governance.