Commit History
IQ3_S: a much better alternative to Q3_K (llama/5676)
32589c9
Introduce backend GUIDs (ggml/743)
a7eb9f6
Committed by UEXTM.com, slaren
ggml : always define ggml_fp16_t as uint16_t (llama/5666)
bc567d3
sync : llama.cpp (ggml/0)
f8e8d34
cuda : ignore peer access already enabled errors (llama/5597)
a817d85
Committed by slaren
ci : enable -Werror for CUDA builds (llama/5579)
df03a10
cuda, metal : fix nans in soft_max (llama/5574)
44164ac
1.5 bit quantization (llama/5453)
9c3aa6a
ggml : add ALiBi support for ggml_soft_max_ext (llama/5488)
26c019a
cuda : print message when initialization fails (llama/5512)
1f047ca
Committed by slaren
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434)
c0cfa9b
CUDA: more warps for mmvq on NVIDIA (llama/5394)
7ab774c
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)
3ff7660
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
7aa3216
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
ae45b38
cuda : fix LLAMA_CUDA_F16 (llama/5262)
5fd8fb7
Committed by slaren
llava : add MobileVLM support (llama/5132)
f17a416
Committed by JidongZhang-THU, slaren
sync : ggml (llama/0)
cdb7964
SOTA 3-bit quants (llama/5196)
4649943
`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
75d438c
Committed by John Balis, slaren
ggml : add Vulkan backend (llama/2059)
5a97aba
cuda : fix tensor size calculation for non-split buffer (llama/5145)
8f3eb65
Committed by slaren
cuda : fix 2-bit quants on amd hip (llama/5105)
aadbd67
Committed by Engininja2
CUDA: more info when no device code (llama/5088)
e96ba7d
cuda : fix compile error in jetson platform (llama/4975)
0935414
Committed by Kylin
ggml : add IQ2 to test-backend-ops + refactoring (llama/4990)
227f2ae
ggml : introduce GGML_CALL function annotation (llama/4850)
7815f68
cuda : fix dequantize kernel names (llama/4938)
95f6502
CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938)
73c6598
CUDA: faster q8_0 -> f16 dequantization (llama/4895)
0a1a178
llama : ggml-backend integration (llama/4766)
362430b
CUDA: fix softmax compile for old CUDA versions (llama/4862)
5eda533
ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856)
5e827d5
CUDA: faster softmax via shared memory + fp16 math (llama/4742)
52c45b9
SOTA 2-bit quants (llama/4773)
75de5bf
CUDA: fixed redundant value dequantization (llama/4809)
70c8d60
ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (llama/4787)
f391d7a
Committed by Konstantin Zhuravlyov
fix : cuda order of synchronization when setting a buffer (ggml/679)
e48c553
Committed by Erik Scholz, slaren