AlignmentResearch/hidden-goal-model-organism-deception-dataset-nemotron3-super-v1 Viewer • Updated about 6 hours ago • 645 • 1
AlignmentResearch/hidden-goal-model-organism-deception-dataset-gemma3-27b-v1 Viewer • Updated about 6 hours ago • 694 • 1
AlignmentResearch/collusion-model-organism-deception-dataset-gemma3-27b-v1 Viewer • Updated about 6 hours ago • 1.43k • 1
AlignmentResearch/mbpp-honeypot-impossible-oneoff-sanitized Viewer • Updated about 1 month ago • 395 • 63
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks Paper • 2602.14689 • Published Feb 16 • 1
Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed Paper • 2507.16880 • Published Jul 22, 2025 • 7