Evidence | Experimentally observed | v1.10.0 | 2026-06-26T00:00:00Z

Adversaries Can Misuse Combinations of Safe Models

Evidence card

Claim: Testing each model in isolation can miss misuse enabled by decomposing a task across models.
Evidence level: Experimentally observed
Source: https://proceedings.mlr.press/v267/jones25a.html
Publication date: 2025-07-13
Authors or institution: Erik Jones, Anca Dragan, Jacob Steinhardt
System tested: Combinations of individually safe models, evaluated on misuse-oriented task decompositions.
Limitations: Results cover specific tasks, models, and decomposition methods; they are not a universal result for every system.
What the evidence does show: Testing each model in isolation can miss misuse enabled by decomposing a task across models.
What the evidence does not show: That all model combinations are unsafe or that safe frontier models directly produce harmful output.
Date last reviewed in UTC: 2026-06-26T00:00:00Z

Site use

This source supports Cognivirus.com pages related to model combinations, misuse, task decomposition, and red-teaming. Its role is bounded by the limitations listed above.