CoLLiE: Collaborative Training of Large Language Models in an Efficient Way Paper • 2312.00407 • Published Dec 1, 2023 • 3
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models Paper • 2406.07594 • Published Jun 11, 2024 • 1
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models Paper • 2406.14952 • Published Jun 21, 2024
A Cognitive Writing Perspective for Constrained Long-Form Text Generation Paper • 2502.12568 • Published Feb 18, 2025
From Rankings to Insights: Evaluation Should Shift Focus from Leaderboard to Feedback Paper • 2505.06698 • Published May 10, 2025 • 1
MorphMark: Flexible Adaptive Watermarking for Large Language Models Paper • 2505.11541 • Published May 14, 2025 • 1
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law Paper • 2507.18576 • Published Jul 24, 2025 • 8
A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published Oct 2, 2025 • 18
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3, 2025 • 3
S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models Paper • 2509.22099 • Published Sep 26, 2025
OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs Paper • 2601.01592 • Published 6 days ago • 11
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia Paper • 2503.01714 • Published Mar 3, 2025 • 5