Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails Paper • 2603.18280 • Published 28 days ago • 1
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models Paper • 2604.04385 • Published 3 days ago