Commit History
HIP: Clean up hipification header (llama/15285) 7cdf9cd
cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300) 59c694d
Sigbjørn Skjæret committed
finetune: SGD optimizer, more CLI args (llama/13873) f585fe7
HIP: bump requirement to rocm 6.1 (llama/15296) 58a3802
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132) c768824
HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273) 8fca6dd
CUDA cmake: add `-lineinfo` for easier debug (llama/15260) 008e169
musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236) 4168dda
cuda: refactored ssm_scan and use CUB (llama/13291) 7a187d1
David Zhao committed
CUDA: add attention sinks for tile and wmma (llama/15178) 46e7c87
ggml : fix field name when new ggml_backend (llama/14944) 685748d
AN Long committed
CUDA: attention sinks for mma FlashAttention (llama/15157) 0ab9aba
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) 1d24833
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035) 9e85264
cuda: make im2col a little faster (llama/15025) 9a85c65
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038) cc3a2ed
CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014) fbc3cd1
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949) 149f5a5
CUDA: skip masked KV slices for all FA kernels (llama/14924) 0c60f80
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945) e37eff3
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path (llama/14930) f9dbd96
HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931) 8e133f7
cuda : add softcap fusion (llama/14907) 2237878
Sigbjørn Skjæret committed
CUDA: add roll (llama/14919) d41a4ec
CUDA: fix pointer incrementation in FA (llama/14916) eb84e7e
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624) 5422b31
deepsek committed
musa: fix build warnings (unused variable) (llama/14869) f38d409
musa: upgrade musa sdk to rc4.2.0 (llama/14498) a687ec3
CUDA: fix overflow in FA, tune performance (llama/14840) 10ac92f
CUDA: fix compilation with GGML_CUDA_F16 (llama/14837) 2746afd
CUDA: fix quantized KV cache + multiple sequences (llama/14822) 88864af
CUDA: add fused rms norm (llama/14800) 79bc58c
cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763) b54b644
Sigbjørn Skjæret committed
cuda: remove linking to cublasLt (llama/14790) fafaa8b
vulkan/cuda: Fix im2col when KW!=KH (llama/14789) 0be0329
cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741) bb523fb
Oliver Simons committed
CUDA: set_rows + cpy.cu refactor (llama/14712) 536128f
llama : add high-throughput mode (llama/14363) b2d73a2
cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687) 1e145c7
cuda : add set rows for bf16 (llama/14664) 1f97ff4
Sigbjørn Skjæret committed
cuda : add ELU support (llama/14657) cbe8006
Yavor Ivanov committed
ggml : add build-time message to remind about ggml_set_rows (llama/14661) 0f5d4ba
CUDA: add set rows for f32 and f16 (llama/14551) e51f2d4
model : support LiquidAI LFM2 hybrid family (llama/14620) 07ff90a
Tarek Dakhran committed
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634) 4354560
Slobodan Josic committed