Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs Paper • 2509.01790 • Published Sep 1 • 4 • 1