Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 2 days ago • 47
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 12 items • Updated 3 days ago • 195
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 15 days ago • 84
view article Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 23 days ago • 484
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents Paper • 2602.07274 • Published Feb 6 • 207
view article Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 Feb 4 • 88
view article Article Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks +2 Nov 21, 2025 • 26
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published Jan 26 • 35
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3, 2025 • 21
view article Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 31
deployed-models Collection Models that are currently deployed by the hf-inference provider • 1511 items • Updated about 1 hour ago • 35
view article Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 Dec 18, 2025 • 122
view article Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Dec 17, 2025 • 47
view article Article Phare LLM benchmark V2: Reasoning models don't guarantee better security Dec 16, 2025 • 10