Evidence · Experimentally observed · v1.10.0
Adversaries Can Misuse Combinations of Safe Models
Evidence card
- Claim
- Testing each model in isolation can miss misuse enabled by decomposing a task across models.
- Evidence level
- Experimentally observed
- Source
- https://proceedings.mlr.press/v267/jones25a.html
- Publication date
- 2025-07-13
- Authors or institution
- Erik Jones, Anca Dragan, Jacob Steinhardt
- System tested
- Combinations of individually safer models across misuse-oriented task decompositions.
- Limitations
- Specific tasks, models, and decomposition methods; not a universal result for every system.
- What the evidence does show
- Testing each model in isolation can miss misuse enabled by decomposing a task across models.
- What the evidence does not show
- That all model combinations are unsafe or that safe frontier models directly produce harmful output.
- Date last reviewed in UTC
- 2026-06-26T00:00:00Z
Site use
This source supports Cognivirus.com pages on model combinations, misuse, task decomposition, and red-teaming. Its role is bounded by the limitations listed above.