Implement test-time compute scaling for math problems
Display and analyze reward model evaluation results
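
As a rough illustration of the two items above, one common form of test-time compute scaling samples several candidate answers per math problem and uses reward model scores to choose among them. The sketch below is not taken from this project. The function names (best_of_n, weighted_majority_vote), the sampled answers, and the scores are all hypothetical, and a real setup would get the scores from an actual reward model.

```python
from collections import defaultdict
from typing import Dict, Sequence


def best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the single candidate answer with the highest reward score."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and of equal length")
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


def score_totals(candidates: Sequence[str], scores: Sequence[float]) -> Dict[str, float]:
    """Sum the reward scores of all samples that share the same final answer."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in zip(candidates, scores):
        totals[answer] += score
    return dict(totals)


def weighted_majority_vote(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the answer whose samples have the largest summed reward score."""
    totals = score_totals(candidates, scores)
    return max(totals, key=lambda answer: totals[answer])


# Hypothetical data: four sampled final answers to one problem,
# with made-up reward scores (assumed to come from some reward model).
sampled_answers = ["42", "41", "42", "40"]
reward_scores = [0.55, 0.70, 0.60, 0.10]

print(best_of_n(sampled_answers, reward_scores))               # 41
print(weighted_majority_vote(sampled_answers, reward_scores))  # 42

# Display the per-answer totals, highest first, for a quick look at the results.
for answer, total in sorted(score_totals(sampled_answers, reward_scores).items(),
                            key=lambda item: item[1], reverse=True):
    print(f"answer={answer} total_score={total:.2f}")
```

In this toy data the two strategies disagree. Best-of-N picks the single highest-scoring sample, while the weighted vote favors the answer that several samples agree on. This is the kind of difference an analysis of reward model results might look for.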