---
title: SearchAgent_Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: A standardized leaderboard for search agents
sdk_version: 4.23.0
tags:
- leaderboard
---

# Overview

SearchAgent Leaderboard provides a simple, standardized way to compare search-augmented QA agents across:

- General QA: NQ, TriviaQA, PopQA
- Multi-hop QA: HotpotQA, 2wiki, Musique, Bamboogle
- Novel closed-world: FictionalHot

We display a minimal set of columns for clarity:

- Rank, Model, Average, per-dataset scores, Model Size (3B/7B)

# Data format (results)

Place model result files in `eval-results/` as JSON. Scores are decimals in [0,1] (the UI multiplies by 100).

```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "YourMethod-Qwen2.5-7b-Instruct",
    "model_sha": "main"
  },
  "results": {
    "nq": { "exact_match": 0.469 },
    "triviaqa": { "exact_match": 0.640 },
    "popqa": { "exact_match": 0.501 },
    "hotpotqa": { "exact_match": 0.389 },
    "2wiki": { "exact_match": 0.382 },
    "musique": { "exact_match": 0.185 },
    "bamboogle": { "exact_match": 0.392 },
    "fictionalhot": { "exact_match": 0.061 }
  }
}
```

Notes:

- `model_name` uses the format `Method-Qwen2.5-{3b|7b}-Instruct` (no org prefix required)
- Tasks: `nq`, `triviaqa`, `popqa`, `hotpotqa`, `2wiki`, `musique`, `bamboogle`, `fictionalhot`
- Metric key: `exact_match`

# Submission (via Community)

We accept submissions via the Space Community (Discussions):

1) Open the Space page and go to Community: `https://huggingface.co/spaces/TencentBAC/SearchAgent_Leaderboard`
2) Create a discussion with title `Submission: --`
3) Include:
   - Model weights link (HF or GitHub)
   - Short method description
   - Evaluation JSON (inline or attached)

# Local development

Run locally (example):

```bash
python app.py
```

The app reads local data only (no remote download) from:

- Results: `./eval-results`
- (Optional) Requests: `./eval-queue` (not required for the simplified table)

If you see missing dependencies, install minimally:

```bash
pip install gradio gradio_leaderboard pandas huggingface_hub apscheduler
```

# Customize

- Tasks and page texts: `src/about.py`
- Displayed columns: `src/display/utils.py` (we keep Rank, Model, Average, per-dataset, Model Size)
- Custom model links (name→URL mapping): `src/display/formatting.py` (`custom_links` dict)
- Data loading and ranking: `src/leaderboard/read_evals.py`, `src/populate.py`

Restart the app after changes.
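
As a companion to the "Data format (results)" section, it can help to sanity-check a result file before placing it in `eval-results/` or attaching it to a submission. The sketch below is not part of this repository: `validate_result`, `average_percent`, and the `MODEL_NAME_RE` pattern are hypothetical helpers, and the unweighted mean is only an assumption about how the Average column is computed (the actual logic lives in `src/leaderboard/read_evals.py`).

```python
import re

# Task keys and metric key as listed in the "Data format (results)" notes.
TASKS = ["nq", "triviaqa", "popqa", "hotpotqa",
         "2wiki", "musique", "bamboogle", "fictionalhot"]
METRIC = "exact_match"

# Assumed pattern for `Method-Qwen2.5-{3b|7b}-Instruct`; the method part is free-form.
MODEL_NAME_RE = re.compile(r"^.+-Qwen2\.5-(3b|7b)-Instruct$")


def validate_result(data):
    """Return a list of problems found in a parsed result file (empty if none)."""
    problems = []
    name = data.get("config", {}).get("model_name", "")
    if not MODEL_NAME_RE.match(name):
        problems.append(f"unexpected model_name: {name!r}")
    results = data.get("results", {})
    for task in TASKS:
        score = results.get(task, {}).get(METRIC)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            problems.append(f"{task}: missing or non-numeric {METRIC}")
        elif not 0.0 <= score <= 1.0:
            problems.append(f"{task}: {score} is outside [0, 1]")
    return problems


def average_percent(data):
    """Unweighted mean over all tasks, scaled by 100 (assumed, not confirmed)."""
    scores = [data["results"][task][METRIC] for task in TASKS]
    return 100.0 * sum(scores) / len(scores)


# Same values as the JSON example in the "Data format (results)" section.
example = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "YourMethod-Qwen2.5-7b-Instruct",
        "model_sha": "main",
    },
    "results": {
        "nq": {"exact_match": 0.469},
        "triviaqa": {"exact_match": 0.640},
        "popqa": {"exact_match": 0.501},
        "hotpotqa": {"exact_match": 0.389},
        "2wiki": {"exact_match": 0.382},
        "musique": {"exact_match": 0.185},
        "bamboogle": {"exact_match": 0.392},
        "fictionalhot": {"exact_match": 0.061},
    },
}

if __name__ == "__main__":
    print("problems:", validate_result(example))
    print("average (%):", round(average_percent(example), 2))
```

To check a real file, load it with `json.load` and pass the resulting dict to `validate_result`; an empty list means the file matches the layout described above.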