Releases: ggml-org/llama.cpp
b5835
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260
Co-authored-by: Rémy Oudompheng
b5834
vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place. Add an env var to disable fusion. Add some missing checks for supported shapes/types. Handle fused rms_norm+mul in check_results.
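As a reference for what the fused operation computes, here is a minimal sketch in plain Python of RMS normalization followed by an elementwise multiply. The function names, the sample vectors, and the epsilon values are illustrative assumptions and are not taken from the Vulkan backend; the sketch only shows that a fused path must use the same epsilon as the unfused rms_norm to produce the same result.

```python
import math

def rms_norm(x, eps):
    # Scale every element by the reciprocal root-mean-square of the row.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def mul(a, b):
    # Elementwise multiply of two equally sized vectors.
    return [p * q for p, q in zip(a, b)]

def fused_rms_norm_mul(x, w, eps):
    # Hypothetical fused form: compute the scale once, then apply the
    # scale and the weight in a single pass over the data.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * wi for v, wi in zip(x, w)]

x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 1.0, 1.5, 2.0]
eps = 1e-6  # assumed value, for illustration only

unfused = mul(rms_norm(x, eps), w)
fused = fused_rms_norm_mul(x, w, eps)
fused_wrong_eps = fused_rms_norm_mul(x, w, 1.0)  # epsilon read from the wrong place
```

With a matching epsilon the fused and unfused outputs agree; with a mismatched epsilon they diverge, which is the class of bug the commit describes.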
b5833
vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition. Pack mask boolean and n_head_log2 into a single dword to keep the push constant block under the 128B limit.
* handle null mask for gqa
* allow gqa with dim3>1
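The packing mentioned above can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical layout in which the low 16 bits hold n_head_log2 and bit 16 holds the mask flag; the actual bit layout used by the Vulkan shaders is not stated in the commit message and may differ.

```python
import struct

# Assumed layout (hypothetical, for illustration only):
#   bits 0..15 : n_head_log2
#   bit  16    : mask-present flag
N_HEAD_LOG2_BITS = 16
N_HEAD_LOG2_MASK = (1 << N_HEAD_LOG2_BITS) - 1
MASK_FLAG = 1 << N_HEAD_LOG2_BITS

def pack_mask_n_head_log2(has_mask, n_head_log2):
    # Combine the boolean and the integer into one 32-bit word.
    if not 0 <= n_head_log2 <= N_HEAD_LOG2_MASK:
        raise ValueError("n_head_log2 does not fit in the assumed field width")
    return (MASK_FLAG if has_mask else 0) | n_head_log2

def unpack_mask_n_head_log2(word):
    # Recover both fields from the packed word.
    return bool(word & MASK_FLAG), word & N_HEAD_LOG2_MASK

packed = pack_mask_n_head_log2(True, 8)
separate_size = len(struct.pack("<II", 1, 8))  # two dwords: 8 bytes
packed_size = len(struct.pack("<I", packed))   # one dword: 4 bytes
```

Storing both values in one dword saves four bytes compared with two separate 32-bit fields, which is how a push constant block can stay within a fixed size budget.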
b5832
server : fix assistant prefilling when content is an array (#14360)
b5831
opencl: add GELU_ERF (#14476)
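For context, GELU_ERF refers to the GELU activation computed with the error function rather than the tanh approximation. A minimal Python reference of both forms is sketched below; the function names are illustrative and this is not the OpenCL kernel itself.

```python
import math

def gelu_erf(x):
    # GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Common tanh-based approximation of GELU, shown for comparison.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

samples = [-2.0, -1.0, 0.0, 1.0, 2.0]
max_diff = max(abs(gelu_erf(v) - gelu_tanh(v)) for v in samples)
```

The two forms agree closely on typical inputs, so the erf variant mainly matters when a model expects the erf-based definition.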
b5830
eval-callback : check for empty input (#14539)
b5829
test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format
* Address review comments
* Add build_commit and build_number in test_result
* refactor
* Get build commit from ggml_commit()
* Merge errors into test_operation_info && address review comments
* remove visitor nonsense
* remove visitor comment
Signed-off-by: Xiaodong Ye
Co-authored-by: slaren
b5828
metal : disable fast math in all quantize kernels (#14528) ggml-ci
b5827
batch : add optional for sequential equal split (#14511) ggml-ci
b5826
graph : prepare for 4D mask (#14515) ggml-ci