The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Abstract
Large language model training methods that optimize for correctness can cause reasoning path diversity collapse, but a new variational framework provides principled solutions to maintain both accuracy and creativity.
State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, optimizing mainly for correctness. We show that this design choice makes the model's distribution over reasoning paths prone to collapse, slashing semantic entropy and undermining creative problem solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as a gradient flow over probability measures on solution traces; STaR, GRPO, DPO, entropy bonuses, and other methods all emerge as special cases of the same loss. The framework delivers three core results: (i) a diversity-decay theorem describing how correctness-based objectives produce distinct modes of diversity decay in STaR, GRPO, and DPO; (ii) objective designs that provably converge to a stable, diverse policy, preventing collapse; and (iii) simple, actionable recipes for achieving this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.
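The collapse mechanism the abstract describes can be illustrated with a toy replicator-style simulation. This is a sketch of ours, not the paper's training loop: the exponentiated update, learning rate, and `entropy_coef` knob are illustrative assumptions. Reinforcing the highest-scoring path alone drives the policy's entropy toward zero, while an entropy bonus stabilizes a diverse distribution over equally valid strategies.

```python
import math

def entropy(p):
    """Shannon entropy of a categorical distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def step(p, rewards, lr=1.0, entropy_coef=0.0):
    # Exponentiated-gradient (replicator-style) update on a categorical
    # policy over reasoning paths. The entropy bonus adds -log p_i to
    # each path's effective reward, pushing the policy back toward
    # uniformity over comparably rewarded paths.
    w = [pi * math.exp(lr * (r - entropy_coef * math.log(pi)))
         for pi, r in zip(p, rewards)]
    z = sum(w)
    return [x / z for x in w]

# Three reasoning paths that all solve the task, one slightly better scored.
rewards = [1.0, 0.95, 0.9]
p_plain = [1 / 3] * 3
p_bonus = [1 / 3] * 3
for _ in range(200):
    p_plain = step(p_plain, rewards)                    # correctness only
    p_bonus = step(p_bonus, rewards, entropy_coef=0.1)  # + entropy bonus

print(entropy(p_plain))  # near 0: the policy collapsed onto one path
print(entropy(p_bonus))  # stays well above 0: diversity preserved
```

The correctness-only run concentrates all mass on the top-scoring path even though the reward gap is tiny; the entropy-regularized run settles on a softmax-like mixture over all three paths.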
Community
For those of you interested in RLVR, here is a paper that formally characterizes the mechanism behind "diversity collapse" in reasoning models trained with scalar rewards (such as STaR, GRPO, and DPO).
The paper introduces a variational framework based on the Shahshahani gradient flow to prove that optimizing solely for correctness inherently erodes the diversity of reasoning paths, producing a "reasoning monoculture." To address this, the authors propose Distributional Creative Reasoning (DCR), which adds a diversity energy functional (built from entropy and kernel-based novelty terms) to the objective, mathematically guaranteeing that a diverse portfolio of successful reasoning strategies is maintained while utility is still optimized.
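The kernel-based novelty idea can be sketched as reward shaping over embeddings of sampled reasoning traces. This is a hypothetical illustration of ours; the paper's actual energy functional, kernel choice, and weighting `lam` are assumptions here. The effect it shows: near-duplicate correct traces earn less shaped reward than a genuinely distinct correct one.

```python
import math

def rbf_kernel(u, v, bandwidth=1.0):
    # RBF similarity between two trace embeddings (1 = identical).
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * bandwidth ** 2))

def novelty_bonus(embeddings, i):
    # A trace is novel if it is far, in kernel similarity, from the
    # other sampled traces: one minus its mean similarity to the rest.
    sims = [rbf_kernel(embeddings[i], e)
            for j, e in enumerate(embeddings) if j != i]
    return 1.0 - sum(sims) / len(sims)

def shaped_rewards(correctness, embeddings, lam=0.5):
    # Correctness plus a kernel-based novelty term, so duplicate
    # solution strategies earn less than genuinely distinct ones.
    return [r + lam * novelty_bonus(embeddings, i)
            for i, r in enumerate(correctness)]

# Two near-duplicate correct traces and one distinct correct trace.
embeds = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
correct = [1.0, 1.0, 1.0]
print(shaped_rewards(correct, embeds))
```

All three traces are equally correct, yet the third, dissimilar trace receives the highest shaped reward, which is exactly the pressure needed to keep multiple strategies alive in the policy.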
This is an automated message from Librarian Bot. I found the following similar papers, recommended by the Semantic Scholar API:
- Multi-Path Collaborative Reasoning via Reinforcement Learning (2025)
- Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning (2025)
- SSR: Socratic Self-Refine for Large Language Model Reasoning (2025)
- STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models (2025)
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B (2025)
- Efficient Thought Space Exploration through Strategic Intervention (2025)
- EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models (2025)