Multi-Task GRPO: Reliable LLM Reasoning Across Tasks Paper • 2602.05547 • Published Feb 5 • 12 • 5
Meta-Okapi/ca_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated Jan 27
Meta-Okapi/ro_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated Jan 27
Meta-Okapi/fr_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated Jan 27