Commit History
metal: SSM_SCAN performance (llama/14743)
5359e09
metal : fix fusion across different encoders (llama/14849)
17d67da
metal : fuse add, mul + add tests (llama/14596)
66ae493
metal : Add missing unary ops Metal support (llama/14660)
2ed022e
Yavor Ivanov
committed on
ggml : add ggml_scale_bias (llama/14417)
573d50a
metal : disable fast math in all quantize kernels (llama/14528)
df9d510
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (llama/14445)
f798922
Sigbjørn Skjæret
committed on
ggml : fix FA mask dim 2 and 3 (llama/14505)
a89dc81
llama : initial Mamba-2 support (llama/9126)
1b4087e
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)
ebacb3e
ci : disable fast-math for Metal GHA CI (llama/14478)
ec4b1b3
metal : disable fast-math for some cpy kernels (llama/14460)
9d1185a
ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)
add5c0f
metal : add special-case mat-vec mul for ne00 == 4 (llama/14385)
724622d
metal : batch rows copy in a single threadgroup (llama/14384)
b4ff704
metal : fix thread-safety (llama/14300)
2bd85b6
metal : add mean kernel (llama/14267)
a726ecc
metal : use less stack memory in FA kernel (llama/14088)
014afb6
metal : use F32 accumulators in FA kernels (llama/13975)
b86860f
ggml : add ggml_gelu_erf() (llama/13667)
6c9cd9a
metal : fix typo in FA kernel comments (llama/13651)
4c32ada
metal : add FA-vec kernel for head size 64 (llama/13583)
36a3b4e
metal : use FA-vec kernel up to batch size 20 (llama/13496)
e925f17
metal : optimize multi-sequence FA vec kernel (llama/13493)
d2f915d
ggml : add mrope kernel for metal (llama/13457)
27b32e6
metal : optimize MoE for large batches (llama/13388)
d51c0d3
metal : fix floating-point range of attention scores in FA kernels (llama/13090)
e093044
metal: add neg operator (llama/13029)
42283e1
graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
fb0d243
metal : add FA-vec kernels for head size 96 (llama/12952)
f1f88b8
llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
e7cb2dc
ggml : add bilinear upscale support (ggml/1185)
4c5e449
Diego Devesa
committed on
metal : use F32 prec in FA kernels (llama/12688)
a49f5c2
metal : use constexpr in FA kernels + fix typedef (llama/12659)
c699617
metal : improve FA + improve MoE (llama/12612)
04a3389
metal : refactor mat-vec code (llama/12569)
71d72f9
llama: Add support for RWKV v7 architecture (llama/12412)
727de7e
metal : Cache the Metal library at the device context level (llama/12265)
e3908a2
BB-fat
committed on
ggml : skip intermediate .air file when compiling .metallib (llama/12247)
32b6ec3
metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194)
092277a
BB-fat
alexju
committed on
metal : fix default.metallib build (llama/12224)
838efb6
ggml : fix GGMLMetalClass ODR (llama/12200)
2094cb7
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32
cmdr2
committed on
metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
afbd891
Hale Chan
committed on
metal : optimize dequant q6_K kernel (llama/11892)
376cbe6
Adrian Kretz
committed on