Oqwenheimer 0.5B

Oqwenheimer is the first reasoning model from DaertML. The recipe applies RLVR training to Qwen2.5-0.5B-Instruct, enhancing its reasoning capabilities by letting the model learn the best way to solve the given problems.


By training on physics (problem, result) pairs, the model develops reasoning capabilities on a wide range of problems. Among the emerging capabilities observed in testing:

  • Poker decision making
  • Chess problem solving
  • Chemistry reasoning
  • Math problem solving
  • Code reasoning

The main reason this model is released in such a rushed way is its emergent capabilities: by training on a niche dataset of physics problems, the model unlocks the ability to reason across a set of different fields.

The model's CoT is efficient out of the box, avoiding the token overhead the Qwen3 family of models suffers from when used without prompting to control CoT length.

The model was trained on a single RTX 3090, over 2 sets of 40 runs through the dataset; we released the checkpoint with the best results on the test data. Each epoch consists of 256 trajectories over a training batch of 256 instances of 32 questions each to solve.
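The per-epoch trajectory sampling described above could be sketched as a GRPO-style loop. Everything here (function names, the group size, the within-group reward normalization) is an assumption about the recipe, not the actual training code:

```python
# Hypothetical sketch of per-epoch trajectory collection for RLVR.
# GROUP_SIZE and the group-relative advantage are assumptions.
GROUP_SIZE = 8  # completions sampled per question (assumed)

def sample_epoch(questions, policy, reward_fn, group_size=GROUP_SIZE):
    """Collect (question, completion, advantage) triples for one epoch.

    For each question, sample a group of completions, score each with the
    verifiable reward, and normalize rewards within the group so that
    better-than-average completions get positive advantages."""
    data = []
    for q in questions:
        completions = [policy(q) for _ in range(group_size)]
        rewards = [reward_fn(q, c) for c in completions]
        mean = sum(rewards) / len(rewards)
        std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
        for c, r in zip(completions, rewards):
            adv = (r - mean) / (std + 1e-6)  # group-relative baseline
            data.append((q, c, adv))
    return data
```

The advantage-weighted triples would then feed a standard policy-gradient update; only the reward signal (right/wrong answer) is needed, no reference CoT.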

Even with such a small set of generations, the model is capable of reproducing the "Aha moment!" in several test cases.

This result was obtained by letting the model solve physics problems (hence the name of the model), using 1K physics problems. Such a small set of (problem, answer) pairs, where the answer contains only the solution to the problem, not the Chain-of-Thought or the full LLM response, opens the door to training these models with a small amount of data and compute.
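A verifiable reward over such answer-only pairs might look like the following minimal sketch. The `Answer:` extraction convention and the numeric tolerance are assumptions for illustration, not the repo's actual format:

```python
import re

def extract_answer(response: str):
    """Pull the last 'Answer: <number>' occurrence out of a model response.

    The 'Answer:' marker is an assumed convention for this sketch."""
    matches = re.findall(r"Answer:\s*([-+]?[0-9][0-9.eE+-]*)", response)
    return float(matches[-1]) if matches else None

def verifiable_reward(response: str, gold: float, tol: float = 1e-3) -> float:
    """Return 1.0 if the extracted final answer matches the gold value
    within a tolerance, else 0.0. No CoT is inspected, only the result."""
    pred = extract_answer(response)
    if pred is None:
        return 0.0
    return 1.0 if abs(pred - gold) <= tol else 0.0
```

Because the reward only checks the final answer, the model is free to discover its own chain of thought, which is where the emergent cross-domain reasoning comes from.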

Run the model

```shell
python3 batch_inference.py --config config.yaml \
  --checkpoint checkpoints/ckpt_000025.pt \
  --batch \
  --questions questions.txt \
  --output results.json
```
