Aurélien-Morgan CLAUDON
Aurelien-Morgan
AI & ML interests: None yet
Recent Activity
replied to their post about 11 hours ago
posted an update 1 day ago
@retrain-pipelines v0.2.0 is out!
I'm at my booth at Station F for GOSIM Paris 2026 today and tomorrow.
Come meet me for a live, in-person demo and a chat!
replied to their post 20 days ago
And the workweek of @retrain-pipelines wheels ends on a high note, indeed.
Day #5: Integrate
replied to their post 21 days ago
Penultimate day of the @retrain-pipelines workweek wheels.
Day #4: Browse
replied to their post 22 days ago
The workweek of @retrain-pipelines wheels continues.
Day #3: Embed
replied to their post 23 days ago
The workweek of @retrain-pipelines wheels.
Day #2: Observe
reacted to sergiopaniego's post with ❤️ 24 days ago
Great experience yesterday at PyTorch Conf Europe in Paris 🇫🇷
We (w/ @kashif) talked about training LLMs through interaction, using trajectories across games, browsers, or simulators.
The room was packed, a clear sign of interest in where RL post-training is heading.
Sharing the slides! 🤓
https://drive.google.com/file/d/16k7YRnf5EJEo0XjXGlRJ_hVeLoFWKyNP/view?usp=sharing
reacted to sergiopaniego's post with 🔥 3 months ago
Tiny Aya 🌿 just dropped from @CohereLabs, a really powerful multilingual small model!
To celebrate, we cooked up fresh resources to train it for tool calling 🔧
> Free Google Colab guide: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
> Standalone training script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py
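For a quick idea of the recipe, a minimal TRL SFT sketch (the model and dataset ids below are placeholders; the notebook above is the authoritative version):

```python
# Minimal sketch of tool-calling SFT with TRL; model and dataset ids are
# placeholders, see the linked notebook for the actual recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A tool-calling chat dataset: each row is a conversation whose assistant
# turns include tool calls, plus the JSON schemas of the available tools.
dataset = load_dataset("your-org/tool-calling-chats", split="train")  # hypothetical id

trainer = SFTTrainer(
    model="CohereLabs/tiny-aya",  # placeholder id, check the model card
    train_dataset=dataset,
    args=SFTConfig(output_dir="tiny-aya-tool-calling"),
)
trainer.train()
```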
reacted to danielhanchen's post with ❤️ 4 months ago
NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! 🔥
It has a 1M context window & best-in-class performance for SWE-Bench, reasoning & chat. Run the MoE model locally with 24GB RAM.
GGUF: unsloth/Nemotron-3-Nano-30B-A3B-GGUF
💚 Step-by-step Guide: https://docs.unsloth.ai/models/nemotron-3
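One way to try the GGUF locally is llama-cpp-python; a sketch, where the quant choice and context size are assumptions and the Unsloth guide above is the reference:

```python
# Sketch: run the GGUF locally via llama-cpp-python. The quant filename
# pattern and context size are assumptions; follow the Unsloth guide.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Nemotron-3-Nano-30B-A3B-GGUF",
    filename="*Q4_K_M*",  # pick a quant that fits in ~24GB of RAM
    n_ctx=8192,           # raise toward the 1M window as memory allows
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```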
replied to their post 5 months ago
Thanks, Victor!
And that's actually a QR code to an article I published here, but yeah, a QR code for profiles would be useful. QR codes for models / datasets / Spaces / Papers, when? 😀
posted an update 5 months ago
Hey, I went to Hangzhou to talk about retrain-pipelines at the GOSIM Foundation's conference last September.
The recording just got released. Go check it out!
https://www.youtube.com/watch?v=nmrMachM5aM
Slides are here:
https://docs.google.com/presentation/d/1hnAzHJ0SbeAOtGJir-iH84RBtXT1OxVT/
replied to sergiopaniego's post 6 months ago
reacted to sergiopaniego's post with 🔥 6 months ago
Fine-tuning a 14B model with TRL + SFT on a free Colab (T4 GPU)?
Thanks to the latest TRL optimizations, you actually can!
Sharing a new notebook showing how to do it 😎
Colab: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb
Notebooks in TRL: https://github.com/huggingface/trl/tree/main/examples/notebooks
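The gist of the trick, as a hedged sketch (model id and hyperparameters are illustrative; the notebook is the reference): quantize the frozen base model to 4-bit NF4 and train only small LoRA adapters on top.

```python
# Sketch of the QLoRA recipe that lets a 14B model fit on a free T4.
# Model id and hyperparameters are illustrative, see the notebook above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",  # stand-in 14B model
    train_dataset=dataset,
    # Train only small LoRA adapters on top of the frozen base model.
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="sft-qlora-14b",
        model_init_kwargs={
            # Base weights loaded in 4-bit NF4: the quantized weights stay
            # frozen, so optimizer state only covers the tiny adapters.
            "quantization_config": BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16
            ),
        },
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,
    ),
)
trainer.train()
```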
reacted to prithivMLmods's post with 👍 7 months ago
Dropping some experimental adapters for FLUX.1-Kontext-dev, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, and Monochrome-Pencil. These were trained under various settings with minimal image pairs to achieve optimal results. The end pairs of the resulting datasets were synthesized using Gemini-2.5-Flash-Image-Preview and others. 🤗✨
prithivMLmods/PhotoCleanser-i2i: Remove objects while preserving the rest of the image.
prithivMLmods/Photo-Restore-i2i: Restore old photos into moderately colorized, detailed images.
prithivMLmods/Polaroid-Warm-i2i: Seamless vintage Polaroid-style images with warm, faded tones.
prithivMLmods/Yarn-Photo-i2i: Convert images into yarn-stitched artwork while retaining key details.
prithivMLmods/Monochrome-Pencil: Turn images into monochrome pencil sketches while keeping original features.
✨Note: All the above models share the same auto-labeling multimodal VLM captioning model, prithivMLmods/DeepCaption-VLA-7B, which is used for refining edit instructions and accurately understanding attributions for the generations.
✨Collection: prithivMLmods/i2i-kontext-exp-68ce573b5c0623476b636ec7
To learn more, visit the app page or the respective model pages!
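If you want to try one of these adapters, a minimal diffusers sketch (assuming the adapters load as standard LoRA weights; the prompt and settings are assumptions, check each model card):

```python
# Sketch: apply one of the i2i adapters on top of FLUX.1-Kontext-dev with
# diffusers. Prompt and settings are assumptions; see the model cards.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/PhotoCleanser-i2i")

image = load_image("input.jpg")  # your source photo
result = pipe(
    image=image,
    prompt="remove the unwanted object while preserving the rest of the image",
    guidance_scale=2.5,
).images[0]
result.save("cleaned.jpg")
```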
reacted to meg's post with ❤️ 9 months ago
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US–China relationship in AI, and what it means for AI ethics.
reacted to eliebak's post with 🔥 10 months ago
Kimi K2 tech report is full of gems as always. Here are my notes on it:
> MuonClip: pretty crazy how after 70k steps the training stabilizes and the QK-clip is basically inactive. There is also no loss in performance with QK-clip, which is not trivial at all (at small scale, but with an aggressive threshold). There is also a cool explanation in appendix E of why Muon makes the logits explode (tl;dr: Muon makes the singular values of the update matrix higher). See the sketch after these notes.
> Sparsity scaling laws to justify their ratio: they have a very solid training infra that allows the model to be trained at this sparsity level. They could have increased it even more, but as sparsity increases, training becomes less efficient.
> They reduce the number of attention heads to make the model more efficient for long context, since attention heads are a big bottleneck there. They also remove 2 of the 3 "first dense" layers in the DSv3 arch.
With the sparsity and the halved attention heads, they achieve 83% increased FLOPs compared to the DeepSeek V3 arch at 128k.
> Data: rephrasing is KEY. They do a lot more synthetic data generation and rephrase their corpus to get different styles; for longer documents they do it chunk by chunk. I'm (half) surprised that ONLY 1 epoch (assuming the same number of training tokens, I think?) of data rephrased 10 times gives better accuracy than 10 epochs of the same data rephrased once.
> They do rewriting for math and knowledge; for math they apply the SwallowMath recipe and instruct the model to rephrase in a "learning note" style.
> They talk about diversity and probably have some internal stuff/evals to test it; as always, it's still a bit unclear to me how to measure that properly.
The infra is also very nice; quick summary:
> PP=16 (1F1B schedule, a bit custom), EP=16, ZeRO-1
> No FP8 compute, but FP8 storage for specific layers; selective recomputation for inexpensive blocks; activation offloading to CPU
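To make the MuonClip note concrete, a hedged sketch of the QK-clip step as described above (my paraphrase, not Kimi's implementation): when a head's max attention logit exceeds a threshold tau, rescale its query and key projections so the logits come back under tau.

```python
# Hedged paraphrase of QK-clip, not Kimi's code: if a head's max attention
# logit S exceeds tau, scale W_q and W_k by sqrt(tau / S) each, so the
# q·k logits shrink by exactly tau / S.
import torch

@torch.no_grad()
def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor,
             max_logit: float, tau: float = 100.0) -> None:
    """Apply QK-clip in place to one head's projection weights."""
    if max_logit > tau:
        scale = (tau / max_logit) ** 0.5
        w_q.mul_(scale)
        w_k.mul_(scale)
```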
reacted to danieldk's post with 🤗 10 months ago
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
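Basic usage looks roughly like this (repo id and entry point follow the project's README example; check the docs for current usage):

```python
# Sketch: load a compute kernel straight from the Hub with kernels.
import torch
from kernels import get_kernel

# Downloads a prebuilt binary for the current platform and imports it.
activation = get_kernel("kernels-community/activation")

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # writes GELU(x) into y
print(y)
```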
reacted to danielhanchen's post with 🔥🚀 10 months ago
Gemma 3n finetuning is now 1.5x faster and uses 50% less VRAM in Unsloth!
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
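The Colab boils down to something like this (a sketch with illustrative settings; the linked notebook is the reference):

```python
# Sketch of loading Gemma 3n for 4-bit LoRA finetuning with Unsloth.
# Settings are illustrative; the linked Colab is the reference.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3n-E4B-it",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit base weights keep VRAM low
)
model = FastModel.get_peft_model(
    model,
    r=16,           # LoRA rank, illustrative
    lora_alpha=16,
)
```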
