am17an
CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497)
8e1f56c