metadata
title: SearchAgent_Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: A standardized leaderboard for search agents
sdk_version: 4.23.0
tags:
- leaderboard
Overview
SearchAgent Leaderboard provides a simple, standardized way to compare search-augmented QA agents across:
- General QA: NQ, TriviaQA, PopQA
- Multi-hop QA: HotpotQA, 2wiki, Musique, Bamboogle
- Novel closed-world: FictionalHot
We display a minimal set of columns for clarity:
- Rank, Model, Average, per-dataset scores, Model Size (3B/7B)
Data format (results)
Place model result files in eval-results/ as JSON. Scores are decimals in [0,1] (the UI multiplies by 100).
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "YourMethod-Qwen2.5-7b-Instruct",
    "model_sha": "main"
  },
  "results": {
    "nq": { "exact_match": 0.469 },
    "triviaqa": { "exact_match": 0.640 },
    "popqa": { "exact_match": 0.501 },
    "hotpotqa": { "exact_match": 0.389 },
    "2wiki": { "exact_match": 0.382 },
    "musique": { "exact_match": 0.185 },
    "bamboogle": { "exact_match": 0.392 },
    "fictionalhot": { "exact_match": 0.061 }
  }
}
Notes:
- model_name uses the format Method-Qwen2.5-{3b|7b}-Instruct (no org prefix required)
- Tasks: nq, triviaqa, popqa, hotpotqa, 2wiki, musique, bamboogle, fictionalhot
- Metric key: exact_match
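Before submitting, it can help to check a result file against the notes above. The following is a minimal sketch, not the leaderboard's own validation: the helper name validate_result is hypothetical, and the case-sensitive name pattern is an assumption derived from the format described in the notes.

```python
import re

# Task names and metric key as listed in the notes above.
TASKS = ["nq", "triviaqa", "popqa", "hotpotqa",
         "2wiki", "musique", "bamboogle", "fictionalhot"]
METRIC = "exact_match"

# Assumption: names follow Method-Qwen2.5-{3b|7b}-Instruct, case-sensitive.
NAME_PATTERN = re.compile(r".+-Qwen2\.5-(3b|7b)-Instruct")


def validate_result(data):
    """Return a list of problems found in a parsed result dict (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []

    config = data.get("config")
    name = config.get("model_name") if isinstance(config, dict) else None
    if not isinstance(name, str) or not name:
        problems.append("config.model_name is missing")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append(f"config.model_name {name!r} does not match the expected format")

    results = data.get("results")
    if not isinstance(results, dict):
        results = {}
    for task in TASKS:
        entry = results.get(task)
        score = entry.get(METRIC) if isinstance(entry, dict) else None
        if score is None:
            problems.append(f"results.{task}.{METRIC} is missing")
        elif isinstance(score, bool) or not isinstance(score, (int, float)):
            problems.append(f"results.{task}.{METRIC} is not a number")
        elif not 0.0 <= score <= 1.0:
            # Scores must be decimals in [0, 1]; the UI does the x100 scaling.
            problems.append(f"results.{task}.{METRIC}={score} is outside [0, 1]")
    return problems
```

To use it, load a result file with json.load and pass the resulting dict to validate_result; an empty list means no problems were found by these checks.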
Submission (via Community)
We accept submissions via the Space Community (Discussions):
- Open the Space page and go to Community: https://huggingface.co/spaces/TencentBAC/SearchAgent_Leaderboard
- Create a discussion with title Submission: <YourMethod>-<model_name>-<model_size>
- Include:
  - Model weights link (HF or GitHub)
  - Short method description
  - Evaluation JSON (inline or attached)
Local development
Run locally (example):
python app.py
The app reads local data only (no remote download) from:
- Results: ./eval-results
- (Optional) Requests: ./eval-queue (not required for the simplified table)
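To preview how local result files might turn into table rows, here is a rough sketch. It is not the code in src/leaderboard/read_evals.py or src/populate.py. It assumes that Average is the unweighted mean over the eight tasks, that rows are ranked by Average in descending order, and that result files may sit in subfolders; the function name load_rows is hypothetical.

```python
import json
from pathlib import Path

TASKS = ["nq", "triviaqa", "popqa", "hotpotqa",
         "2wiki", "musique", "bamboogle", "fictionalhot"]


def load_rows(results_dir="./eval-results"):
    """Read every JSON result file and return ranked rows (illustrative only)."""
    rows = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Stored scores are decimals in [0, 1]; scale to percentages for display.
        scores = {t: data["results"][t]["exact_match"] * 100 for t in TASKS}
        row = {"Model": data["config"]["model_name"]}
        # Assumption: Average is the plain mean over all eight tasks.
        row["Average"] = sum(scores.values()) / len(TASKS)
        row.update(scores)
        rows.append(row)

    # Assumption: higher Average ranks first.
    rows.sort(key=lambda r: r["Average"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["Rank"] = rank
    return rows
```

Calling load_rows() from the repository root returns a list of dicts that can be passed to pandas.DataFrame for inspection.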
If dependencies are missing, install the minimal set:
pip install gradio gradio_leaderboard pandas huggingface_hub apscheduler
Customize
- Tasks and page texts: src/about.py
- Displayed columns: src/display/utils.py (we keep Rank, Model, Average, per-dataset, Model Size)
- Custom model links (name→URL mapping): src/display/formatting.py (custom_links dict)
- Data loading and ranking: src/leaderboard/read_evals.py, src/populate.py
Restart the app after changes.