ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4 • Reinforcement Learning • 15B • Updated Feb 13, 2025 • 2.03k • 820
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 124
mistral-community/Mixtral-8x22B-Instruct-v0.1-4bit • Text Generation • 143B • Updated Jul 1, 2024 • 43 • 11