
# SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

**Source:** https://arxiv.org/abs/2503.17239  
**Authors or institution:** Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche  
**Publication date:** 2025-03-21  
**Publication status:** arXiv preprint; ICLR 2025 workshop metadata available  
**Evidence level:** Emerging evidence  
**Date last reviewed in UTC:** 2026-06-26T00:00:00Z

## Direct findings or source content

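The paper proposes SafeMERGE, a post-fine-tuning method that selectively merges layers of a fine-tuned model with layers of a safety-aligned model in order to preserve safety alignment. The work is motivated by the observation that even benign fine-tuning can erode safety.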
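
The idea named in the source title, selective layer-wise merging, can be illustrated with a minimal sketch. This is a generic illustration under stated assumptions, not the authors' implementation: the function names, the per-layer cosine check against a safety-aligned reference, the `threshold`, and the merge weight `alpha` are all hypothetical choices made here for clarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def selective_merge(finetuned, safe, threshold=0.5, alpha=0.5):
    """Merge only the layers that deviate from the safety-aligned reference.

    finetuned, safe: dicts mapping layer name -> flattened weight (or update) vector.
    threshold: hypothetical similarity cutoff; layers scoring below it are merged.
    alpha: hypothetical weight on the fine-tuned layer in the linear merge.
    Returns (merged_layers, names_of_merged_layers).
    """
    merged = {}
    flagged = []
    for name, ft in finetuned.items():
        ref = safe[name]
        if cosine(ft, ref) < threshold:
            # Layer deviates from the reference: interpolate toward the safe layer.
            merged[name] = [alpha * a + (1.0 - alpha) * b for a, b in zip(ft, ref)]
            flagged.append(name)
        else:
            # Layer is close enough to the reference: keep the fine-tuned weights.
            merged[name] = list(ft)
    return merged, flagged
```

Calling `selective_merge` with two dicts keyed by the same layer names returns the merged layers together with the list of layers that were actually changed. Leaving unflagged layers untouched is what makes the merge selective, and the outcome depends directly on the assumed `threshold` and `alpha`.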

## Cognivirus interpretation

For Cognivirus.com, this source is used to examine risk at the level of adaptive systems, component compositions, evaluator boundaries, and behavioral persistence. The site interpretation is narrower than the source when the source is experimental, and more explicitly qualified when the source is architectural or programmatic.

## Limits

SafeMERGE is a mitigation proposal; its effectiveness depends on models, tasks, metrics, and implementation. The source does not establish that SafeMERGE or any single method solves descendant safety inheritance.

## Source handling

This local file is an original summary and metadata record. It is not a copy of the source paper, report, or website. Copyrighted source material is not reproduced in full.
