Evidence | Experimentally observed | v1.10.0

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Evidence card

Claim
Safety-relevant behavior can be brittle under sparse parameter changes or low-rank changes in the studied settings.
Evidence level
Experimentally observed
Source
https://arxiv.org/abs/2402.05162
Publication date
2024-02-07
Authors or institution
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
System tested
Tested LLMs, with pruning and low-rank modifications applied to their safety-critical regions.
Limitations
The findings are scoped to the model families, tasks, and safety metrics studied and may not generalize beyond them.
What the evidence does show
Safety-relevant behavior can be brittle under sparse parameter changes or low-rank changes in the studied settings.
What the evidence does not show
That every compression or low-rank change destroys alignment.
Date last reviewed in UTC
2026-06-26T00:00:00Z

Site use

This source supports Cognivirus.com pages related to pruning, low-rank modifications, and alignment brittleness. Its role is bounded by the limitations listed above.