Chirag Agarwal

AikyamLab

https://chirag-agarwall.github.io/

AI & ML interests

Explainability and Interpretability; AI Safety; AI Alignment

Recent Activity

upvoted a paper 2 days ago

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

submitted a paper 2 days ago

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

upvoted a paper about 1 month ago

Towards Understanding the Robustness of Sparse Autoencoders

Papers 2

arxiv:2307.13192

arxiv:2003.08754

Models 0

None public yet

Datasets 0

None public yet