Building on HF

Nathan Habib PRO

SaylorTwift

huggingface

AI & ML interests

Evals

Recent Activity

new activity 1 day ago

zai-org/GLM-5:Add Terminal-Bench 2.0 evaluation result (52.4%)

new activity 2 days ago

nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16:Update tokenizer_config.json

upvoted a paper 1 day ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 2 days ago • 47

upvoted a collection 2 days ago

NVIDIA Nemotron v3

Open, Production-ready Enterprise Models • 12 items • Updated 3 days ago • 195

upvoted an article 3 days ago

Introducing Storage Buckets on the Hugging Face Hub

Article • +10 • 5 days ago • 163

upvoted an article 5 days ago

FINAL Bench: The Real Bottleneck to AGI Is Self-Correction

Article • 21 days ago • 20

upvoted a changelog 5 days ago

Public Storage Add-ons

Hugging Face Changelog • 16 days ago • 153

upvoted a paper 10 days ago

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Paper • 2602.23866 • Published 15 days ago • 84

upvoted a paper 12 days ago

OmniGAIA: Towards Native Omni-Modal AI Agents

Paper • 2602.22897 • Published 16 days ago • 52

upvoted a collection 17 days ago

Qwen3.5

21 items • Updated 5 days ago • 1.17k

upvoted an article 22 days ago

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

Article • +4 • 23 days ago • 484

upvoted 2 papers about 1 month ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 256

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

Paper • 2602.07274 • Published Feb 6 • 207

upvoted 2 articles about 1 month ago

Community Evals: Because we're done trusting black-box leaderboards over the community

Article • +5 • Feb 4 • 88

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

Article • +2 • Nov 21, 2025 • 26

upvoted 2 papers about 2 months ago

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Paper • 2601.18137 • Published Jan 26 • 35

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 21

upvoted an article about 2 months ago

AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality

Article • Jan 21 • 31

upvoted a collection 2 months ago

deployed-models

Models that are currently deployed by the hf-inference provider • 1511 items • Updated about 1 hour ago • 35

upvoted 3 articles 3 months ago

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Article • +4 • Dec 18, 2025 • 122

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Article • Dec 17, 2025 • 47

Phare LLM benchmark V2: Reasoning models don't guarantee better security

Article • Dec 16, 2025 • 10