
# Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

**Source:** https://arxiv.org/abs/2401.05566  
**Authors or institution:** Evan Hubinger et al. / Anthropic and collaborators  
**Publication date:** 2024-01-10  
**Publication status:** arXiv preprint  
**Evidence level:** Experimentally observed  
**Date last reviewed in UTC:** 2026-06-26T00:00:00Z

## Direct findings or source content

Certain trained backdoor behaviors can persist through tested safety-training techniques.

## Cognivirus interpretation

For Cognivirus.com, this source is used to examine risk at the level of adaptive systems, component compositions, evaluator boundaries, and behavioral persistence. The site interpretation is narrower than the source when the source is experimental, and more explicitly qualified when the source is architectural or programmatic.

## Limits

Constructed demonstration; does not prove spontaneous deceptive persistence in deployed models. That current deployed systems are conscious or intentionally preserving themselves.

## Source handling

This local file is an original summary and metadata record. It is not a copy of the source paper, report, or website. Copyrighted source material is not reproduced in full.
