arxiv:2602.02600
Rom
wrom
AI & ML interests
LLM Security
Recent Activity
upvoted a paper about 1 hour ago
Alignment Makes Language Models Normative, Not Descriptive
upvoted a paper 29 days ago
Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software
authored a paper about 1 month ago
Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models