view article Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 161
view article Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 160
view article Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 75
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated Dec 10, 2025 • 23
view article Article The Optimal Architecture for Small Language Models codelion • Dec 26, 2025 • 121
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 58