Evidence | Experimentally observed | v1.10.0 | 2026-06-26T00:00:00Z

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Evidence card

Claim: Safety-relevant behavior can be brittle under sparse parameter or low-rank changes in studied settings.
Evidence level: Experimentally observed
Source: https://arxiv.org/abs/2402.05162
Publication date: 2024-02-07
Authors or institution: Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
System tested: Safety-critical parameter regions in the tested LLMs, modified through pruning and low-rank changes.
Limitations: Findings are bounded by the model families, tasks, and safety metrics evaluated in the study.
What the evidence does show: Safety-relevant behavior can be brittle under sparse parameter or low-rank changes in studied settings.
What the evidence does not show: That every compression or low-rank change destroys alignment.
Date last reviewed in UTC: 2026-06-26T00:00:00Z

Site use

This source supports Cognivirus.com pages on pruning, low-rank modifications, and alignment brittleness. Its role is bounded by the limitations listed above.