Releases · ggml-org/llama.cpp

06 Jul 10:44

6491d6e

b5835 Latest

vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)

Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260

Co-authored-by: Rémy Oudompheng <[email protected]>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip

sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-07-06T10:44:23Z
llama-b5835-bin-macos-arm64.zip

sha256:4ee74fae9ed0b800e02290230e6caa423dd80da4fa553bfa769475df7216c499
10.5 MB 2025-07-06T10:44:37Z
llama-b5835-bin-macos-x64.zip

sha256:77d8688d85a7b8730becfa8bdab2df944f8e50c8431b3bc5d9d4c658f4df531e
26.3 MB 2025-07-06T10:44:38Z
llama-b5835-bin-ubuntu-vulkan-x64.zip

sha256:a0b6028aa074211bdabce9d851924fd2affa45ae64761290808a5f37c148397b
20.2 MB 2025-07-06T10:44:39Z
llama-b5835-bin-ubuntu-x64.zip

sha256:73c112dee4c4f02a727f59ca43161ac39a185273c01789ab3265de9bb0ad834d
12.4 MB 2025-07-06T10:44:41Z
llama-b5835-bin-win-cpu-arm64.zip

sha256:57aaefdf1998cbc700af5315cffea52845a73032e1650fbfd25b8dfe4849fc23
10.8 MB 2025-07-06T10:44:42Z
llama-b5835-bin-win-cpu-x64.zip

sha256:cdf30a836759faf14924f87388abef7330739b1dc645bdb7b7f8785e30aa8ff1
13.6 MB 2025-07-06T10:44:43Z
llama-b5835-bin-win-cuda-12.4-x64.zip

sha256:d08af25f9593b1b9c68d233190e8c528a2bbd2344645509d273e03b177c1c14b
128 MB 2025-07-06T10:44:45Z
llama-b5835-bin-win-hip-radeon-x64.zip

sha256:f460cd659831ffa7eab8163d5030d1241e78682ba34dd4f987fc8a56a1bd06d7
298 MB 2025-07-06T10:44:51Z
llama-b5835-bin-win-opencl-adreno-arm64.zip

sha256:d31e5e2efdb8d4df870d3b07efd7ab9d6bb241645985bf2d20177de5914b9eef
11.1 MB 2025-07-06T10:45:01Z
Source code (zip)

2025-07-06T10:29:36Z
Source code (tar.gz)

2025-07-06T10:29:36Z
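Each binary asset above lists a SHA-256 digest. Here is a minimal sketch, using only the Python standard library, for checking a downloaded archive against the listed value. The file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks
    so that large archives (hundreds of MB) are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive is in the current directory):
# verify("llama-b5835-bin-ubuntu-x64.zip",
#        "73c112dee4c4f02a727f59ca43161ac39a185273c01789ab3265de9bb0ad834d")
```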

06 Jul 09:08

github-actions

b5834

e592be1

vulkan: fix rms_norm+mul fusion (#14545)

The fused operation was grabbing the epsilon value from the wrong place.

Add an env var to disable fusion.

Add some missing checks for supported shapes/types.

Handle fused rms_norm+mul in check_results.
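For readers unfamiliar with the fused op, here is a plain-Python illustration of the math. It does not model the Vulkan shader. It assumes the usual RMS-norm definition, y = x / sqrt(mean(x^2) + eps), followed by an element-wise multiply by a weight vector, and the function names are hypothetical. The fix amounts to making the fused path use the rms_norm op's own eps so that fused and unfused results agree.

```python
import math
from typing import List


def rms_norm(x: List[float], eps: float) -> List[float]:
    """Scale x by 1 / sqrt(mean(x^2) + eps); eps avoids division by zero."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def mul(a: List[float], b: List[float]) -> List[float]:
    """Element-wise product of two equal-length vectors."""
    return [u * v for u, v in zip(a, b)]


def fused_rms_norm_mul(x: List[float], weight: List[float], eps: float) -> List[float]:
    """Normalize and apply the weight in a single pass over the output.
    eps must be the rms_norm op's own epsilon; reading it from anywhere
    else changes the result."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]
```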

Assets 15

05 Jul 07:48

github-actions

b5833

a0374a6

vulkan: Handle updated FA dim2/3 definition (#14518)

* vulkan: Handle updated FA dim2/3 definition

Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.

* handle null mask for gqa

* allow gqa with dim3>1
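Packing the mask flag and n_head_log2 into one dword can be sketched as ordinary bit packing. Vulkan only guarantees 128 bytes of push-constant space, so every 4-byte field saved counts. The layout below, with the flag in bit 31 and n_head_log2 in the low 31 bits, is a hypothetical choice for illustration. These notes do not give the actual shader layout.

```python
import struct
from typing import Tuple

# Hypothetical layout: bit 31 = "mask present", bits 0..30 = n_head_log2.
MASK_BIT = 1 << 31


def pack(has_mask: bool, n_head_log2: int) -> int:
    """Combine a boolean and a small unsigned integer into one 32-bit word."""
    if not 0 <= n_head_log2 < MASK_BIT:
        raise ValueError("n_head_log2 does not fit in 31 bits")
    return (MASK_BIT if has_mask else 0) | n_head_log2


def unpack(word: int) -> Tuple[bool, int]:
    """Recover (has_mask, n_head_log2) from a packed word."""
    return bool(word & MASK_BIT), word & (MASK_BIT - 1)


def to_bytes(word: int) -> bytes:
    """Serialize as a little-endian uint32, i.e. one 4-byte push-constant slot."""
    return struct.pack("<I", word)
```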

Assets 15

05 Jul 07:38

github-actions

b5832

ddef995

server : fix assistant prefilling when content is an array (#14360)
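In OpenAI-style chat requests, a message's content may be either a string or an array of typed parts such as {"type": "text", "text": "..."}. Below is a minimal sketch of prefill handling that accepts both shapes. The helper names and the choice to join text parts with no separator are assumptions, not the server's actual code.

```python
from typing import Any, Dict, List, Optional, Union

Content = Union[str, List[Dict[str, Any]]]


def content_to_text(content: Content) -> str:
    """Flatten message content to plain text.
    A string is returned unchanged; for an array, only "text" parts are kept."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    raise TypeError("content must be a string or an array of parts")


def assistant_prefill(messages: List[Dict[str, Any]]) -> Optional[str]:
    """If the last message is from the assistant, return its text so that
    generation can continue from it; otherwise return None."""
    if messages and messages[-1].get("role") == "assistant":
        return content_to_text(messages[-1].get("content", ""))
    return None
```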

Assets 15

05 Jul 06:41

github-actions

b5831

6681688

opencl: add GELU_ERF (#14476)
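GELU_ERF presumably refers to the exact GELU written with the error function, 0.5 * x * (1 + erf(x / sqrt(2))), rather than the common tanh approximation. The scalar Python reference below computes both forms. The function names are hypothetical and this is not the OpenCL kernel.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), with the normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU, for comparison."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```

At x = 1.0 the two forms differ by roughly 1e-4. That gap is small but can matter for models trained with the erf-based form.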

Assets 15

05 Jul 05:15

github-actions

b5830

bac8bed

eval-callback : check for empty input (#14539)

Assets 15

05 Jul 05:08

github-actions

b5829

b81510a

test-backend-ops: add support for specifying output format (#14368)

* test-backend-ops: add support for specifying output format

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments

* Add build_commit and build_number in test_result

* Address review comments

* refactor

* Get build commit from ggml_commit()

* Merge errors into test_operation_info && address review comments

* Address review comments

* Address review comments

* remove visitor nonsense

* remove visitor comment

* Address review comments

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: slaren <[email protected]>

Assets 15

04 Jul 16:39

github-actions

b5828

ef797db

metal : disable fast math in all quantize kernels (#14528)

ggml-ci

Assets 15

04 Jul 06:58

github-actions

b5827

67d1ef2

batch : add optional for sequential equal split (#14511)

ggml-ci

Assets 15

04 Jul 06:54

github-actions

b5826

7b50f7c

graph : prepare for 4D mask (#14515)

ggml-ci

Assets 15
