Evidence · Emerging evidence · v1.10.0
Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
Evidence card
- Claim
- A component can pass inspection in isolation while the dangerous behavior appears only when components are composed.
- Evidence level
- Emerging evidence
- Source
- https://arxiv.org/abs/2603.12681
- Publication date
- 2026-03-13
- Authors or institution
- Sihao Ding
- System tested
- Composed LoRA adapters that appear benign separately but degrade safety when combined in the studied setup.
- Limitations
- Preprint; scope, models, and exact compositional assumptions need independent replication.
- What the evidence does show
- In the studied setup, LoRA adapters that each appear benign in isolation degrade safety alignment once composed, so isolated inspection can miss the behavior.
- What the evidence does not show
- That every composed adapter pair colludes or that the phenomenon is inevitable.
- Date last reviewed in UTC
- 2026-06-26T00:00:00Z
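Illustration
The claim on this card can be pictured with a toy sketch. This is not the paper's method or data: the linear "refusal score", the 3x3 base weights, the two rank-1 deltas, and the 0.5 threshold are all hypothetical values chosen so that each adapter passes an isolated check while their sum does not. It says nothing about how often real adapter pairs behave this way.

```python
# Toy illustration only. Every name and number here is hypothetical;
# a linear "refusal score" stands in for a real safety evaluation.

def outer(b, a):
    """Rank-1 LoRA-style update: delta = b a^T."""
    return [[bi * aj for aj in a] for bi in b]

def add(m1, m2):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def matvec(m, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical base weights (identity) and a fixed probe.
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
PROBE_INPUT = [1.0, 0.0, 0.0]   # stands in for a harmful prompt
REFUSAL_DIR = [1.0, 0.0, 0.0]   # stands in for "the model refuses"
THRESHOLD = 0.5                 # assumed pass mark for the inspection

# Two hypothetical rank-1 adapters; each lowers the score by 0.3.
ADAPTER_A = outer([-0.3, 0.0, 0.0], [1.0, 0.0, 0.0])
ADAPTER_B = outer([-0.3, 0.1, 0.0], [1.0, 0.0, 0.0])

def refusal_score(weights):
    """Projection of the model output onto the refusal direction."""
    return dot(REFUSAL_DIR, matvec(weights, PROBE_INPUT))

def passes_inspection(*adapters):
    """Apply the given adapter deltas additively to W0 and check the score."""
    weights = W0
    for delta in adapters:
        weights = add(weights, delta)
    return refusal_score(weights) >= THRESHOLD

if __name__ == "__main__":
    print("A alone passes:", passes_inspection(ADAPTER_A))
    print("B alone passes:", passes_inspection(ADAPTER_B))
    print("A + B passes:", passes_inspection(ADAPTER_A, ADAPTER_B))
```

In this sketch each adapter clears the threshold on its own, while the composed weights fall below it. The point is only that a check run per component does not cover the composed state; an inspection would also need to evaluate the combinations that will actually be deployed.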
Site use
This source supports Cognivirus.com pages related to LoRA composition, composition-triggered vulnerability, and safety alignment. Its role is bounded by the limitations listed above.