Commit History
metal: SSM_SCAN performance (llama/14743)
5359e09
metal : fix fusion across different encoders (llama/14849)
17d67da
metal : fuse add, mul + add tests (llama/14596)
66ae493
metal : Add missing unary ops Metal support (llama/14660)
2ed022e
Yavor Ivanov
committed on
ggml : add ggml_scale_bias (llama/14417)
573d50a
metal : disable fast math in all quantize kernels (llama/14528)
df9d510
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (llama/14445)
f798922
Sigbjørn Skjæret
committed on
ggml : fix FA mask dim 2 and 3 (llama/14505)
a89dc81
llama : initial Mamba-2 support (llama/9126)
1b4087e
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)
ebacb3e
ci : disable fast-math for Metal GHA CI (llama/14478)
ec4b1b3
metal : disable fast-math for some cpy kernels (llama/14460)
9d1185a
ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)
add5c0f
metal : add special-case mat-vec mul for ne00 == 4 (llama/14385)
724622d
metal : batch rows copy in a single threadgroup (llama/14384)
b4ff704
metal : fix thread-safety (llama/14300)
2bd85b6
metal : add mean kernel (llama/14267)
a726ecc
metal : use less stack memory in FA kernel (llama/14088)
014afb6
metal : use F32 accumulators in FA kernels (llama/13975)
b86860f
ggml : add ggml_gelu_erf() (llama/13667)
6c9cd9a
metal : fix typo in FA kernel comments (llama/13651)
4c32ada
metal : add FA-vec kernel for head size 64 (llama/13583)
36a3b4e
metal : use FA-vec kernel up to batch size 20 (llama/13496)
e925f17
metal : optimize multi-sequence FA vec kernel (llama/13493)
d2f915d
ggml : add mrope kernel for metal (llama/13457)
27b32e6
metal : optimize MoE for large batches (llama/13388)
d51c0d3
metal : fix floating-point range of attention scores in FA kernels (llama/13090)
e093044
metal: add neg operator (llama/13029)
42283e1
graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
fb0d243
metal : add FA-vec kernels for head size 96 (llama/12952)
f1f88b8
llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
e7cb2dc
ggml : add bilinear upscale support (ggml/1185)
4c5e449
Diego Devesa
committed on
metal : use F32 prec in FA kernels (llama/12688)
a49f5c2
metal : use constexpr in FA kernels + fix typedef (llama/12659)
c699617
metal : improve FA + improve MoE (llama/12612)
04a3389
metal : refactor mat-vec code (llama/12569)
71d72f9
llama: Add support for RWKV v7 architecture (llama/12412)
727de7e
metal : Cache the Metal library at the device context level (llama/12265)
e3908a2
BB-fat
committed on
ggml : skip intermediate .air file when compiling .metallib (llama/12247)
32b6ec3
metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194)
092277a
BB-fat
alexju
committed on
metal : fix default.metallib build (llama/12224)
838efb6
ggml : fix GGMLMetalClass ODR (llama/12200)
2094cb7
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32
cmdr2
committed on
metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
afbd891
Hale Chan
committed on
metal : optimize dequant q6_K kernel (llama/11892)
376cbe6
Adrian Kretz
committed on