Submitted by Xiaoya Li · CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · DeepReinforce