Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062) Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable.
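The ASPO idea above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, hyperparameters, and clipping bounds are all illustrative assumptions.

```python
import numpy as np

def aspo_token_weights(logp_new, logp_old, advantages,
                       clip_low=0.2, clip_high=0.2):
    """Hedged sketch of ASPO-style asymmetric importance weighting.

    For tokens with positive advantage, the importance ratio is
    flipped (reciprocal) to counteract over-updating of already
    likely tokens; a clip then bounds the result on both sides
    (a stand-in for the paper's soft dual-clipping).
    """
    ratio = np.exp(logp_new - logp_old)              # standard IS ratio
    flipped = np.where(advantages > 0, 1.0 / ratio, ratio)
    clipped = np.clip(flipped, 1.0 - clip_low, 1.0 + clip_high)
    return clipped * advantages
```

For a token the new policy already likes twice as much (ratio 2.0) with positive advantage, the flipped ratio 0.5 gets clipped up to 0.8, damping the update instead of amplifying it.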
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519 Uses a model's own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence.
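A very rough sketch of the ICPO-style batch mixing described above. Every name, threshold, and schedule here is an illustrative assumption, not the paper's formulation: it only shows the shape of mixing on-policy samples with expert-guided ones, rejecting weak expert samples, and annealing an expert bonus.

```python
import random

def mixed_policy_batch(on_policy, expert, reward_fn,
                       expert_frac=0.25, min_expert_reward=0.5,
                       bonus=0.1, step=0, anneal=1000):
    """Illustrative ICPO-flavored batch construction (all names
    hypothetical): blend on-policy samples with expert-guided ones,
    drop expert samples below a reward threshold (reject sampling),
    and add an expert bonus that decays over training steps."""
    n_expert = int(len(on_policy) * expert_frac)
    kept = [s for s in random.sample(expert, min(n_expert, len(expert)))
            if reward_fn(s) >= min_expert_reward]   # expert-region reject sampling
    decay = max(0.0, 1.0 - step / anneal)           # annealed expert bonus
    batch = [(s, reward_fn(s)) for s in on_policy]
    batch += [(s, reward_fn(s) + bonus * decay) for s in kept]
    return batch
```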
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270 Builds a graph of an agent's experiences to understand how different states connect, guide exploration, and assign rewards more effectively.
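The GEPO item above can be sketched with a toy experience graph. This is a minimal illustration under my own assumptions (class and method names are hypothetical): states become nodes, observed transitions become edges, and proximity to a rewarding state yields a shaped bonus.

```python
from collections import defaultdict

class ExperienceGraph:
    """Illustrative GEPO-style experience graph: record transitions
    between states and use graph distance to rewarding states to
    shape exploration bonuses."""

    def __init__(self):
        self.edges = defaultdict(set)   # state -> successor states
        self.reward = {}                # state -> best observed reward

    def add_transition(self, s, s_next, r=0.0):
        self.edges[s].add(s_next)
        self.reward[s_next] = max(self.reward.get(s_next, 0.0), r)

    def shaped_bonus(self, s, gamma=0.9):
        # BFS outward from s; discount the nearest reward by depth
        seen, frontier, depth = {s}, [s], 0
        while frontier:
            for node in frontier:
                if self.reward.get(node, 0.0) > 0:
                    return gamma ** depth * self.reward[node]
            frontier = [n for u in frontier
                        for n in self.edges[u] if n not in seen]
            seen.update(frontier)
            depth += 1
        return 0.0
```

A state two hops from a reward of 1.0 would get a bonus of 0.9² = 0.81, nudging exploration toward known-good regions.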
The largest ever dataset of co-folded 3D protein-ligand structures just dropped on HF!!
Meet SAIR (Structurally Augmented IC50 Repository): 5M+ AI-generated complexes with experimentally measured drug potency data from SandboxAQ.
Extra Info: I tested the newly released Qwen-Image-Edit-Lightning-8steps-V1.0 (it arrived after the tutorial was recorded), and our existing preset, which uses Qwen-Image-Lightning-8steps-V1.1, is definitely better than it. So this tutorial and its presets are still the best quality and fully up to date.
Info: Qwen Image Edit has just been published, and since then I have been experimenting to prepare this tutorial for you. I walk through 26 unique cases and provide demo images and prompts. After watching this tutorial, your image-editing skills will move to the next level, I promise you that. It will also give you a lot of ideas.
I run Qwen3-Coder 480B locally on my Z8, with a 1-million token context window. It's the equivalent of parallel-parking a Nimitz-class carrier in a kiddie pool. Thanks to whatever dark pact the llama.cpp, CUDA, and kernel folks signed, hybrid inferencing + VRAM→RAM offload let me stream the model's synapses across Xeon, RAM, and four lonely A6000s without summoning either the OOM killer or a small house fire.