Teaching language models to think efficiently with Adaptive Length Penalty (ALP)
- Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
  Paper • 2506.05256 • Published • 2
- SynthLabsAI/ALP_DeepScaleR_1.5B_C16K
  Reinforcement Learning • 2B • Updated • 34 • 3
- SynthLabsAI/ALP_R1_Qwen1.5B
  Reinforcement Learning • 2B • Updated • 67