whisper.cpp / ggml / src / ggml-cuda / fattn-common.cuh

Commit History

CUDA: fix negative KV_max values in FA (llama/15321)
6e3a7b6

JohannesGaessler committed on

llama : add gpt-oss (llama/15091)
bf225d6

ggerganov ngxson slaren committed on

CUDA: skip masked KV slices for all FA kernels (llama/14924)
0c60f80

JohannesGaessler committed on

HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3

uvos committed on

CUDA: fix overflow in FA, tune performance (llama/14840)
10ac92f

JohannesGaessler committed on

CUDA: broadcasting for FlashAttention mask (llama/14500)
47e02a8

JohannesGaessler committed on

CUDA: fix FA tg at long context for CC >= 8.9 (llama/13852)
d9bd7ce

JohannesGaessler committed on

CUDA: faster Deepseek FA, add Turing support (llama/13435)
ace16dc

JohannesGaessler committed on

CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
507d30c

JohannesGaessler committed on

CUDA: fix bad asserts for partial offload (llama/13337)
23e676b

JohannesGaessler committed on

musa: fix compilation warnings in mp_22/31 (llama/12780)
090ad80

R0CKSTAR committed on

musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611)
12bb60d

R0CKSTAR committed on

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19

Gaurav Garg JohannesGaessler committed on

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
2adc060

uvos committed on

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
a027c1d

David Huang committed on

CUDA: optimize FA for GQA + large batches (llama/12014)
6662d54

JohannesGaessler committed on

CUDA: use async data loading for FlashAttention (llama/11894)
5b9980d

JohannesGaessler Diego Devesa committed on

HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
acfd94f

JohannesGaessler committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler Diego Devesa committed on

ggml : build backends as libraries (llama/10256)
3dc93f3

Diego Devesa ggerganov R0CKSTAR committed on

CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
fb8ae8b

JohannesGaessler committed on

ggml : reduce hash table reset cost (llama/8698)
9808fbf

slaren committed on

CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
8411e3c

JohannesGaessler committed on

CUDA: refactor and optimize IQ MMVQ (llama/8215)
afa1447

JohannesGaessler committed on

whisper : reorganize source code + improve CMake (#2256)
f75c2e3

ggerganov committed on