DNA 2.1

DNA 2.1 is a fine-tuned Qwen3 14B model trained in two stages to reason natively in Korean. It is released alongside the paper "Making Qwen3 Think in Korean with Reinforcement Learning".

Key Features

Two-Stage Training Approach: Supervised fine-tuning (SFT) on high-quality Korean reasoning datasets, followed by reinforcement learning with our proposed Oracle-Guided Dr. GRPO algorithm
Native Korean Thinking: Conducts internal chain-of-thought reasoning entirely in Korean
Stable RL Training: Addresses reward hacking and policy collapse by using an oracle judge model to calibrate the reward signal
Enhanced Reasoning Performance: Substantially improved results on advanced reasoning benchmarks, particularly in math and coding tasks
Preserved Knowledge & Language Proficiency: Maintains existing knowledge and language capabilities after reinforcement learning
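
As a rough illustration of the reinforcement learning stage, the sketch below computes group-relative advantages in the style of Dr. GRPO: each sampled response's reward is centered on the group mean, without dividing by the group standard deviation. The oracle step is an assumption made for illustration only. Here a hypothetical list of oracle scores in [0, 1] scales each raw reward, so a response the oracle rejects earns nothing. The function names, the gating rule, and the sample values are all hypothetical; the paper defines the actual Oracle-Guided Dr. GRPO procedure.

```python
from typing import List


def calibrate_rewards(raw_rewards: List[float], oracle_scores: List[float]) -> List[float]:
    """Hypothetical calibration: scale each raw reward by an oracle score clamped to [0, 1]."""
    if len(raw_rewards) != len(oracle_scores):
        raise ValueError("raw_rewards and oracle_scores must have the same length")
    return [r * min(max(s, 0.0), 1.0) for r, s in zip(raw_rewards, oracle_scores)]


def group_advantages(rewards: List[float]) -> List[float]:
    """Mean-centered advantages for the responses sampled from one prompt.

    No division by the group standard deviation is applied.
    """
    if not rewards:
        return []
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Made-up values: four sampled responses for one prompt.
raw = [1.0, 1.0, 0.0, 1.0]      # rewards from a rule-based checker (assumed)
oracle = [1.0, 0.0, 1.0, 1.0]   # oracle judge rejects the second response (assumed)

calibrated = calibrate_rewards(raw, oracle)
advantages = group_advantages(calibrated)
print(calibrated)   # [1.0, 0.0, 0.0, 1.0]
print(advantages)   # [0.5, -0.5, -0.5, 0.5]
```

In this toy group, the second response passes the rule-based check but is rejected by the oracle, so it ends up with a negative advantage instead of a positive one. That is one way a judge model could discourage reward hacking.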

Base Model

This model builds upon Smoothie Qwen3, which reduces Chinese token emission probabilities and enhances Korean reasoning capabilities.
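
To make "reducing token emission probabilities" concrete, here is a minimal sketch that lowers the probability of a chosen set of token ids by shifting their logits before the softmax. It is not Smoothie Qwen3's actual procedure, which is not described here. The toy vocabulary, the token id set, and the `scale` parameter are assumptions for illustration.

```python
import math
from typing import List, Set


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def downweight(logits: List[float], token_ids: Set[int], scale: float) -> List[float]:
    """Multiply the unnormalized probability of the selected tokens by `scale`.

    Adding log(scale) to a logit is equivalent to scaling exp(logit) by `scale`.
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    shift = math.log(scale)
    return [x + shift if i in token_ids else x for i, x in enumerate(logits)]


# Toy 3-token vocabulary; token 0 stands in for a token to suppress (assumed).
logits = [2.0, 2.0, 0.0]
before = softmax(logits)
after = softmax(downweight(logits, token_ids={0}, scale=0.5))
print(before)
print(after)
```

After the shift, token 0 becomes less likely and the remaining probability mass is redistributed over the other tokens. The distribution still sums to one.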

Citation

If you use this model in your research, please cite our paper:

@misc{lee2025makingqwen3thinkkorean,
      title={Making Qwen3 Think in Korean with Reinforcement Learning}, 
      author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
      year={2025},
      eprint={2508.10355},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10355}, 
}