OpenCompass

community

https://opencompass.org.cn/

Recent Activity

yuhangzang authored a paper about 5 hours ago

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

GMFTBY authored a paper 20 days ago

MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-Talker Dialogue-Centric Audio-Video Generation

mzr1996 authored a paper about 1 month ago

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

Papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

yuhangzang authored a paper about 5 hours ago

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

Paper • 2605.10912 • Published 8 days ago • 44

GMFTBY authored a paper 20 days ago

MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-Talker Dialogue-Centric Audio-Video Generation

Paper • 2602.00607 • Published Jan 31

mzr1996 authored 5 papers about 1 month ago

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

Paper • 2407.10499 • Published Jul 15, 2024

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

Paper • 2504.13835 • Published Apr 18, 2025 • 38

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Paper • 2602.11089 • Published Feb 11 • 18

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 132

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

Paper • 2604.14116 • Published Apr 15 • 13

authored a paper about 2 months ago

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Paper • 2603.26664 • Published Mar 27 • 9

authored a paper about 2 months ago

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Paper • 2603.28376 • Published Mar 30 • 24

submitted a paper to Daily Papers about 2 months ago

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Paper • 2603.26664 • Published Mar 27 • 9

authored a paper about 2 months ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 132

authored 2 papers 2 months ago

From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

Paper • 2603.12648 • Published Mar 13 • 14

Visual-ERM: Reward Modeling for Visual Equivalence

Paper • 2603.13224 • Published Mar 13 • 21

updated a dataset 2 months ago

opencompass/TextEdit

Viewer • Updated Mar 15 • 2.15k • 655 • 9

authored 4 papers 2 months ago

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Paper • 2512.01248 • Published Dec 1, 2025 • 12

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

Paper • 2603.12252 • Published Mar 12 • 12

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Paper • 2603.12247 • Published Mar 12 • 24

submitted a paper to Daily Papers 2 months ago

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

Paper • 2603.12252 • Published Mar 12 • 12