ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models Paper • 2510.16928 • Published Oct 19 • 4
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains Paper • 2507.07229 • Published Jul 9 • 11
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9 • 76
The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure Paper • 2506.22724 • Published Jun 28 • 10
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models Paper • 2506.06485 • Published Jun 6 • 5