AI & ML interests

None defined yet.

Recent Activity

sergiopaniego
posted an update about 19 hours ago
The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from PyTorch Conf Europe is live!

@kashif and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, and agentic post-training at scale

https://www.youtube.com/watch?v=rPBeXFntJSU
sergiopaniego
posted an update about 24 hours ago
how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @aminediroHF

what I like the most is the way it proves you can use the Hub for basically everything 🧐 → trainer on one machine, vLLM in a HF Space, the wordle env in another HF Space and weights going through a Hub Bucket. no shared cluster, just HTTPS

it works because ~99% of bf16 weights don't change between RL steps so you only sync the diff. 1.2 GB to 25 MB of payload per step
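
A minimal sketch of the "only sync the diff" idea, assuming weights are represented as flat lists of 16-bit bf16 bit patterns. The helper names `compute_delta` and `apply_delta` are hypothetical and not taken from the blog; the real system presumably works on tensors and a binary payload format.

```python
from typing import List, Tuple

# A delta is a list of (flat index, new bf16 bit pattern) pairs.
# Hypothetical format, chosen only for illustration.
Delta = List[Tuple[int, int]]


def compute_delta(old: List[int], new: List[int]) -> Delta:
    """Return only the positions whose bit pattern changed between steps."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same number of elements")
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]


def apply_delta(old: List[int], delta: Delta) -> List[int]:
    """Rebuild the new weights on the receiving side from old weights + delta."""
    patched = list(old)  # copy so the caller's list is left untouched
    for index, value in delta:
        patched[index] = value
    return patched


# Toy example: six bf16 values, only one of which changes after an RL step.
old_weights = [0x3F80, 0x4000, 0xBF00, 0x3E99, 0x0000, 0x4049]
new_weights = list(old_weights)
new_weights[3] = 0x3E9A  # a single weight moved by one bf16 step

delta = compute_delta(old_weights, new_weights)
synced = apply_delta(old_weights, delta)
```

The trainer would upload `delta` instead of the full weights, and the inference side would call `apply_delta` on the copy it already holds. When almost all entries are unchanged, the payload scales with the number of changed weights rather than with model size.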

https://huggingface.co/blog/delta-weight-sync
RiverRider
posted an update 2 days ago
This is not the end of words. It is the end of pretending their meanings are determined.

Meaning Forks. SRT detects it.

Paste any text to identify contested terms

RiverRider/srt-introspect

Try any prompt (attached link) to see exactly what an LLM is thinking at every meaningful step of its answer

RiverRider/srt-introspect

Repository

https://github.com/space-bacon/SRT

Paper

https://github.com/space-bacon/SRT/blob/main/paper_nla.md

Explainer

https://github.com/space-bacon/SRT/blob/main/docs/EXPLAINERS.md
sergiopaniego
posted an update 2 days ago
most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above
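
A toy illustration of the round-trip problem, using a made-up three-token vocabulary and a greedy longest-match encoder as a stand-in for real BPE merges. Everything here (`VOCAB`, `encode`, `decode`, the context variable names) is hypothetical; the last lines sketch my reading of the MITO vs TITO distinction, not the blog's implementation.

```python
from typing import Dict, List

# Toy vocabulary: "ab" exists as a single token AND as "a" + "b".
VOCAB: Dict[str, int] = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN: Dict[int, str] = {i: t for t, i in VOCAB.items()}


def decode(ids: List[int]) -> str:
    """Concatenate token strings; several id sequences can map to one text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def encode(text: str) -> List[int]:
    """Greedy longest-match encoding (a simplified stand-in for BPE)."""
    ids: List[int] = []
    pos = 0
    max_len = max(len(token) for token in VOCAB)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode text at position {pos}")
    return ids


# The model happened to sample "a" then "b" as two separate tokens.
sampled_ids = [0, 1]
text = decode(sampled_ids)
reencoded_ids = encode(text)  # the encoder prefers the merged "ab" token

# Message-in style: rebuild the next-turn context from text.
mito_context = encode(text + "b")
# Token-in style: keep the sampled ids and append only the new tokens.
tito_context = sampled_ids + encode("b")
```

Both contexts decode to the same string, but only `tito_context` still starts with the ids the model actually sampled, so the loss would be computed on the right tokens.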

go read it 🤓

https://qgallouedec-tito.hf.space/
sergiopaniego
posted an update 3 days ago
new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler
mmhamdy
posted an update 6 days ago
Things rarely go as we expect!

In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!

The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.

Their ultimate goal was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains (in 2017). A One Model To Learn Them All.

In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!

But history had other plans. The building block eclipsed the grand design!

So, have you heard about the MultiModel before? 😀
sergiopaniego
posted an update 6 days ago
If you have a github repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK ), a tool that mines PRs, commits, CVEs and turns them into verifiable sandboxed tasks with real reward signals, automatically

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments
sergiopaniego
posted an update 7 days ago
RiverRider
posted an update 7 days ago
SRT-introspect: Live Token-by-Token Readout of LLM Internal Reasoning

I have released SRT-introspect, a new public demonstration that makes the hidden reasoning process of a frozen large language model visible in real time.

The interface runs a Qwen-2.5-7B backbone equipped with the SRT Adapter and Activation Verbalizer. As the model generates each token, the system continuously measures divergence across attention heads, identifies high-signal moments, and translates the corresponding hidden-state object representations into natural-language verbalizations. You see exactly what the model is internally representing at the precise points where its computation is most active, complete with divergence scores, reflexivity estimates, and per-layer traces.

This is not a summary of the final output. It is a direct window into the model's latent conceptual landscape, showing the dominant training-data attractors that activate even when the prompt asks for first-principles reasoning. The adaptive scheduler concentrates verbalizations precisely where the real internal work occurs, turning what used to be opaque black-box generation into observable, analyzable data.

The result is the clearest public demonstration yet that modern LLMs possess a rich, structured semiotic infrastructure that can now be audited without retraining or fine-tuning.

Try it:
RiverRider/srt-introspect
sergiopaniego
posted an update 10 days ago
Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://huggingface.co/blog/agent-glossary
RiverRider
posted an update 12 days ago
A single forward pass of the frozen Qwen-2.5-7B model plus a lightweight classifier reaches 0.866 plus or minus 0.011 AUC on the full TruthfulQA-MC2 benchmark. No adapters. No fine-tuning. No extra parameters on the backbone.

This is the strongest hidden-state truthfulness detector reported on the benchmark to date.

The same latent features that the SRT-NLA-AV-v1 demo reads out as coherent natural-language verbalizations turn out to be rich enough to support production-grade auditing for honesty versus hallucination. The internal semiotic infrastructure we have been exploring in public is already information-dense enough to solve hard downstream problems with almost trivial overhead.

You can watch the underlying latent geometry in action right here:
RiverRider/srt-nla-av-v1-demo

Full code, artifacts, and reproduction steps are in the repository:
https://github.com/space-bacon/SRT

Try the Glass Box
RiverRider/srt-nla-demo
RiverRider
posted an update 14 days ago
🧠 New Space: MindReader-NLA - ask a frozen LM what it's thinking, in plain English.

A trained Activation Verbalizer (~5–13M params, frozen backbone) over Qwen-2.5-7B, Llama-3.2-3B, and Gemma-2-2B. Three demos in one Space:

Playground - sample K verbalizations of the layer-L hidden state and score how well each reproduces the original activation when fed back through the same frozen model (raw + anisotropy-centred cosine FVE).

Live Thought Trace - stream a verbalization per token as the model writes, side-by-side with the generation.

Steer-by-Editing - edit the verbalized thought, project it back into hidden-state space, and watch the continuation change.

Runs on ZeroGPU. Try it: RiverRider/srt-nla-demo

Paper + code: https://github.com/space-bacon/SRT
RiverRider
posted an update 17 days ago
Natural Language Autoencoders: A Window into Latent Structure

I introduced a concise mathematical formulation of the P versus NP question into the SRT-NLA-AV-v1 demonstration:

P vs NP asks whether every problem whose solution can be verified in polynomial time (NP) can also be solved in polynomial time (P). Integer factorization - given N = p·q where p and q are large primes (p < q) - is in NP but widely believed not to be in P.

The resulting activation verbalization (best-of-N, reranked by AR fidelity) surfaced:

“This article originally appeared in the August 2016 edition of CACM. A new method of proving computational hardness of problems, known as multilinearization, can improve efficiency, reduce complexity and simplify proofs. In this article, I describe multilinearization and its application to several key problems, from the discrete logarithm and factoring to RSA and elliptic-curve discrete logarithms.”

What emerges is not a literal restatement, but a structured articulation of the model's internal associations: hardness proofs, algebraic techniques, and the cryptographic implications that orbit this foundational question in computational complexity.

The demo offers a compelling interface for exploring these latent representations.

Explore it here:
RiverRider/srt-nla-av-v1-demo

Recommended: Best-of-N sampling with round-trip evaluation for highest fidelity.
Tonic
posted an update 21 days ago
🙋🏻‍♂️ Hey there folks ,

Turns out : if we predict 🌏 earth we can spend more time looking for interesting things and less time looking at things that we expect to see.

Sentinel-2 imagery 🛰️ basically takes a long time to download towards earth. so our "near real time" systems are quite far from that in practical terms.

meanwhile , if we "predict" what we will see , based on what we do see , we can send down much less data in a timely way , and prioritize 📡 earth-bound response .

I'm talking about illegal fishing , logging , mining or building in nature reserves , the more of that we predict early the more we're able to stop it on time.

At least that's the concept !

check out the blog : https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth


- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval
bartowski
posted an update 27 days ago
You may have noticed that my upload of MiMo-V2.5 didn't have the author in the model name:

bartowski/MiMo-V2.5-GGUF

Going forward, I plan to upload models from major 1st party developers without the author name attached, for cleanliness. I feel it results in a nicer and more expected user experience.

I will continue to upload fine-tunes with the author + "_" appended for clarity. I personally feel it's nice to know at a glance whose tune it is, but it's also for the reason I first started doing it: to avoid it being confused for a new version of the official release

I hope this change makes sense, it seemed most reasonable to me and a poll I did (forever ago, I move slow sometimes) made it seem likely others would find it reasonable as well (feel free to let me know if you disagree, may not change my mind but I do value knowing what others think)

Thanks for downloading :)
sergiopaniego
posted an update 27 days ago
OpenEnv is growing fast in tutorials. If you're looking to get started with RL environments, check them out

> evaluate your agents using OpenEnv
> learn how rewards work via rubrics
> connect agents via MCP
> many moreeeee!

anything you think is missing?

https://meta-pytorch.org/OpenEnv/tutorials/index.html
sergiopaniego
posted an update 28 days ago
OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces

Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ correct answer, reward 1.0, in 4.2s

Run GRPO (TRL) → model learns to write that search strategy itself

test it yourself → sergiopaniego/repl-env
check out OpenEnv → https://github.com/meta-pytorch/OpenEnv
RiverRider
posted an update 29 days ago
zooL4nD3r v0.1 demo

Translate a passage across 961 learned discourse communities

5 Funny Communities:
Reddit Shitposters
X Doomers & Doomer Memers
AI Waifu Enthusiasts
Flat Earth Discord Trolls
Crypto Degens on Solana

5 Serious Communities:
Semiotic Reflexive Transformer
Academic Philosophers (Peircean)
Cognitive Science Researchers
Policy Wonks / Think Tank Analysts
Legal Scholars (Constitutional Originalists)

RiverRider/zooL4nD3r-demo

Aurelien-Morgan
posted an update 30 days ago
@retrain-pipelines v0.2.0 is out !
I'm at Station F at my booth with GOSIM Paris 2026 today & tomorrow.
Come meet me for a live in-person demo and a chat !
Tonic
posted an update about 1 month ago
🙋🏻‍♂️ Hey there folks,

since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much , i'm back with more high quality procedural datasets in the geospatial domain for SFT training !

Check this one out :
NuTonic/sat-bbox-metadata-sft-v1

the goal is to be able to train vision models on multiple images for remote sensing analysis with one shot .

hope you like it ! 🚀