arxiv:2307.13192
Chirag Agarwal
AikyamLab
AI & ML interests
Explainability and Interpretability; AI Safety; AI Alignment
Recent Activity
upvoted a paper 2 days ago
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages
submitted a paper 2 days ago
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages
upvoted a paper about 1 month ago
Towards Understanding the Robustness of Sparse Autoencoders