Benchmark Datasets - a Lumia101 Collection

updated Dec 9, 2025

Benchmarks for LLMs

openai/gsm8k
Benchmark • Updated Dec 20, 2025 • 17.6k • 483k • 1.16k
Note Lv 2.9

allenai/winogrande
Viewer • Updated Jul 11, 2025 • 81.4k • 191k • 74
Note Lv 3.1

madrylab/gsm8k-platinum
Viewer • Updated Mar 11, 2025 • 1.21k • 2.6k • 45
Note Lv 3.5

maveriq/bigbenchhard
Viewer • Updated Sep 29, 2023 • 6.51k • 1.23k • 38
Note Lv 4.3

openai/openai_humaneval
Viewer • Updated Jan 4, 2024 • 164 • 161k • 365
Note Lv 4.8

google/simpleqa-verified
Viewer • Updated Sep 22, 2025 • 1k • 3.76k • 32
Note Lv 4.9

google-research-datasets/mbpp
Viewer • Updated Jan 4, 2024 • 1.4k • 1.22M • 215
Note Lv 5.1

cais/mmlu
Viewer • Updated Mar 8, 2024 • 231k • 306k • 664
Note Lv 6.0

allenai/ai2_arc
Viewer • Updated Dec 21, 2023 • 7.79k • 279k • 313
Note Lv 6.2

edinburgh-dawg/mmlu-redux-2.0
Viewer • Updated Feb 25, 2025 • 5.7k • 10.7k • 35
Note Lv 6.3

evalplus/humanevalplus
Viewer • Updated May 1, 2024 • 164 • 16k • 18
Note Lv 6.3

HuggingFaceH4/MATH
Viewer • Updated Jan 28, 2025 • 13.8k • 463 • 8
Note Lv 6.5

evalplus/mbppplus
Viewer • Updated Apr 17, 2024 • 378 • 11.6k • 15
Note Lv 6.8

google/IFEval
Viewer • Updated Aug 14, 2024 • 541 • 57.7k • 130
Note Lv 7.1

KbsdJames/Omni-MATH
Viewer • Updated Oct 12, 2024 • 4.43k • 2.26k • 125
Note Lv 7.5

TIGER-Lab/MMLU-Pro
Benchmark • Updated 20 days ago • 12.1k • 84.8k • 416
Note Lv 7.9

livecodebench/code_generation
Viewer • Updated Jun 13, 2024 • 121 • 3.84k • 28
Note Lv 8.3

pxferna/ARC-AGI-v1
Viewer • Updated Apr 14, 2025 • 800 • 1 • 1
Note Lv 8.6

princeton-nlp/SWE-bench_Verified
Viewer • Updated Feb 18, 2025 • 500 • 626k • 262
Note Lv 9.0

math-ai/aime24
Viewer • Updated 20 days ago • 30 • 6.51k • 13
Note Lv 9.2

math-ai/aime25
Viewer • Updated 20 days ago • 30 • 31.5k • 24
Note Lv 9.3

MathArena/hmmt_feb_2025
Viewer • Updated May 14, 2025 • 30 • 3.87k • 7
Note Lv 9.5

Idavidrein/gpqa
Benchmark • Updated 16 days ago • 1.25k • 86.6k • 352
Note Lv 9.6

cais/hle
Benchmark • Updated 18 days ago • 2.5k • 22.9k • 692
Note Lv 10.0