Evidence · Experimentally observed · v1.10.0

Adversaries Can Misuse Combinations of Safe Models

Evidence card

Claim
Testing each model in isolation can miss misuse enabled by decomposing a task across models.
Evidence level
Experimentally observed
Source
https://proceedings.mlr.press/v267/jones25a.html
Publication date
2025-07-13
Authors or institution
Erik Jones, Anca Dragan, Jacob Steinhardt
System tested
Combinations of models that each appear safe when tested individually, evaluated on misuse-oriented task decompositions.
Limitations
The results cover specific tasks, models, and decomposition methods; they are not a universal result for every system.
What the evidence does show
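In the tested settings, an adversary who decomposed a task across models could achieve misuse at higher rates than with either model alone, so testing each model in isolation can miss this risk.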
What the evidence does not show
That all model combinations are unsafe or that safe frontier models directly produce harmful output.
Date last reviewed in UTC
2026-06-26T00:00:00Z

Site use

This source supports Cognivirus.com pages related to model combinations, misuse, task decomposition, and red-teaming. Its role is bounded by the limitations listed above.