
# Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

**Source:** https://arxiv.org/abs/2402.05162  
**Authors or institution:** Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson  
**Publication date:** 2024-02-07  
**Publication status:** ICML 2024 / PMLR; arXiv metadata available  
**Evidence level:** Experimentally observed  
**Date last reviewed in UTC:** 2026-06-26T00:00:00Z

## Direct findings or source content

In the settings studied, safety-relevant behavior can be brittle under sparse parameter changes or low-rank weight modifications.
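To make the two kinds of change concrete, the following is a minimal, hypothetical Python sketch on a toy weight matrix. `sparse_prune` zeroes the highest-scoring fraction of entries, and `rank_one_update` subtracts a rank-1 term. Both function names are illustrative, the magnitude-based score in the usage note is a stand-in assumption, and neither function reproduces the paper's own attribution or modification procedure.

```python
# Hypothetical illustration of two kinds of small weight edits on a toy matrix.
# Neither function reproduces the paper's attribution or modification methods.


def sparse_prune(weights, scores, fraction):
    """Return a copy of `weights` with the top-scoring `fraction` of entries zeroed.

    `scores` has the same shape as `weights`; higher means "more important".
    """
    rows, cols = len(weights), len(weights[0])
    # Flatten (score, row, col) triples so entries can be ranked by score.
    flat = [(scores[i][j], i, j) for i in range(rows) for j in range(cols)]
    k = int(round(fraction * len(flat)))
    flat.sort(reverse=True)
    pruned = [row[:] for row in weights]  # copy so the input stays unchanged
    for _, i, j in flat[:k]:
        pruned[i][j] = 0.0
    return pruned


def rank_one_update(weights, u, v, alpha=1.0):
    """Return W - alpha * u v^T, i.e. a rank-1 modification of W."""
    return [
        [w - alpha * u[i] * v[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
```

For example, with `W = [[1.0, -2.0], [0.5, 4.0]]` and absolute value as the assumed score, `sparse_prune(W, [[abs(x) for x in r] for r in W], 0.25)` zeroes only the largest-magnitude entry, while `rank_one_update(W, [1.0, 0.0], [0.0, 2.0])` changes only the first row. Each edit touches a small part of the matrix, which is the sense in which such changes are "sparse" or "low-rank".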

## Cognivirus interpretation

For Cognivirus.com, this source is used to examine risk at the level of adaptive systems, component compositions, evaluator boundaries, and behavioral persistence. The site interpretation is narrower than the source when the source is experimental, and more explicitly qualified when the source is architectural or programmatic.

## Limits

The scope is defined by the model families, tasks, and safety metrics studied. The source does not show that every compression or low-rank change destroys alignment.

## Source handling

This local file is an original summary and metadata record. It is not a copy of the source paper, report, or website. Copyrighted source material is not reproduced in full.
