AI & ML interests
Data-centric AI, LLM, MLLM
Organization Card
🌐 About OpenDataArena
OpenDataArena (ODA) is an open research initiative devoted to evaluating, benchmarking, and creating high-value datasets for the post-training era of large language models (LLMs).
We believe data quality defines model capability — and that open, reproducible evaluation is key to accelerating progress in AI.
🚀 Our Mission
To make data evaluation scientific, transparent, and community-driven, while continuously producing high-value, openly available datasets that enhance model alignment and reasoning ability.
🔑 Key Features
- 🏆 Dataset Leaderboard: The leaderboard ranks the most valuable datasets across multiple domains, based on diverse benchmarks.
- 📊 Comprehensive Scoring System: The scoring tool measures dataset quality, diversity, and learning value using reproducible pipelines.
- 🧰 Open-Source Toolkit: OpenDataArena-Tool enables dataset evaluation and scoring through a standardized, community-driven workflow.
- 🌱 High-Value Data Generation: Beyond evaluation, ODA continuously produces and shares new, top-quality datasets for fine-tuning and alignment research.
If you find our work helpful, please consider ⭐ starring and subscribing to support open, data-driven AI research. Learn more at opendataarena.github.io.
(OpenDataArena is part of OpenDataLab.)
Models
None public yet
Datasets
- OpenDataArena/OpenDataArena-scored-data
- OpenDataArena/MathLake