DNA 2.1

DNA 2.1 is a fine-tuned Qwen3 14B model that thinks natively in Korean, trained with a two-stage approach. It is released alongside the paper "Making Qwen3 Think in Korean with Reinforcement Learning" (arXiv:2508.10355).

Key Features

  • Two-Stage Training Approach: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets followed by reinforcement learning with our proposed Oracle-Guided Dr. GRPO algorithm
  • Native Korean Thinking: Conducts internal chain-of-thought reasoning entirely in Korean
  • Stable RL Training: Mitigates reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
  • Enhanced Reasoning Performance: Substantially improved results on advanced reasoning benchmarks, particularly in math and coding tasks
  • Preserved Knowledge & Language Proficiency: Maintains existing knowledge and language capabilities after reinforcement learning
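The group-relative advantage at the heart of GRPO-family methods can be pictured with a short sketch. The paper's exact Oracle-Guided Dr. GRPO formulation is not reproduced here; the `alpha` blend between rule-based rewards and oracle-judge scores, the function name, and the example values are illustrative assumptions:

```python
# Hedged sketch of a Dr. GRPO-style group-relative advantage with an
# oracle-calibrated reward. The `alpha` blend and function name are
# illustrative assumptions, not the paper's exact formulation.

def oracle_guided_advantages(rule_rewards, oracle_scores, alpha=0.5):
    """For one prompt's group of sampled completions: blend rule-based
    rewards with oracle-judge scores, then subtract the group mean.
    (Dr. GRPO drops the per-group std normalization used by GRPO.)"""
    blended = [(1 - alpha) * r + alpha * s
               for r, s in zip(rule_rewards, oracle_scores)]
    mean = sum(blended) / len(blended)
    # Group-centered advantages: positive for better-than-average samples.
    return [b - mean for b in blended]

# Four sampled completions: binary correctness rewards plus judge scores.
adv = oracle_guided_advantages([1.0, 0.0, 1.0, 0.0], [0.9, 0.2, 0.4, 0.1])
print(adv)  # advantages are centered, so they sum to zero
```

Centering on the group mean keeps the policy update stable even when the raw reward scale drifts, which is one reason GRPO-style baselines pair well with learned judge rewards.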

Base Model

This model builds on Smoothie Qwen3, a Qwen/Qwen3-14B variant that reduces the probability of emitting Chinese tokens and strengthens Korean reasoning capabilities.

Citation

If you use this model in your research, please cite our paper:

@misc{lee2025makingqwen3thinkkorean,
      title={Making Qwen3 Think in Korean with Reinforcement Learning}, 
      author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
      year={2025},
      eprint={2508.10355},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10355}, 
}

Model Details

  • Model size: 15B params
  • Tensor type: BF16
  • Format: Safetensors
