KTransformers Installation Notes
0. Prerequisites
CUDA 12.4
Upgrade to a reasonably recent CMake, and install git, g++, and gcc.
Run the following commands to point CUDA_HOME at your CUDA installation directory (e.g. /usr/local/cuda-12.4):
export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
# to make this persistent, append the three export lines above to ~/.bashrc, then reload:
source ~/.bashrc
Check that it worked:
which nvcc
If the output is the following, the configuration succeeded:
/usr/local/cuda-12.4/bin/nvcc
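The `which nvcc` check above can also be captured in a small script. This is only a sketch, assuming nothing beyond what the notes state: the nvcc found on PATH should resolve to `$CUDA_HOME/bin/nvcc`.

```python
import os

def cuda_env_consistent(cuda_home: str, nvcc_path: str) -> bool:
    """True when the nvcc found on PATH lives under CUDA_HOME,
    i.e. `which nvcc` would print f"{cuda_home}/bin/nvcc"."""
    expected = os.path.join(cuda_home, "bin", "nvcc")
    return os.path.normpath(nvcc_path) == os.path.normpath(expected)

# On a live machine, shutil.which("nvcc") gives the real nvcc path:
# import shutil; print(cuda_env_consistent(os.environ["CUDA_HOME"], shutil.which("nvcc")))
print(cuda_env_consistent("/usr/local/cuda-12.4",
                          "/usr/local/cuda-12.4/bin/nvcc"))  # True
```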
Create a local container:
docker run -it -d --gpus all --privileged -p 8083:8080 -p 8084:8084 -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/ -v /home/data_a/gqr/:/home/data_a afa4f07f5e5e /bin/bash
CUDA, cuDNN, and conda are already installed in this image.
1. Install build dependencies
sudo apt-get update
sudo apt-get install build-essential cmake ninja-build patchelf
2. Create and set up the virtual environment
conda create --name ktransformers python=3.11
conda activate ktransformers # activate the environment
conda install -c conda-forge libstdcxx-ng # Anaconda's `libstdcxx-ng` package from `conda-forge` ships a newer `libstdc++`.
# list the GLIBCXX versions it provides:
strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX
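To pick out the highest version from that `strings … | grep GLIBCXX` output, a small helper can parse the standard `GLIBCXX_x.y[.z]` version tags (a sketch; the input is simply the lines the command prints):

```python
def max_glibcxx(symbols):
    """Return the highest GLIBCXX_x.y[.z] version tag as a tuple,
    given lines like those printed by
    `strings libstdc++.so.6 | grep GLIBCXX`."""
    versions = []
    for sym in symbols:
        if not sym.startswith("GLIBCXX_"):
            continue
        try:
            versions.append(tuple(int(p) for p in sym[len("GLIBCXX_"):].split(".")))
        except ValueError:
            pass  # skip non-version entries like GLIBCXX_DEBUG_MESSAGE_LENGTH
    return max(versions) if versions else None

print(max_glibcxx(["GLIBCXX_3.4", "GLIBCXX_3.4.29", "GLIBCXX_3.4.30"]))  # (3, 4, 30)
```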
3. Install PyTorch 2.6
pip install torch torchvision torchaudio
From mainland China, use the Tsinghua mirror:
pip3 install torch torchvision torchaudio --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install packaging ninja cpufeature numpy --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
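After installing, it is worth confirming the interpreter actually sees a torch 2.6 build. The version comparison can be done with a tiny helper (a sketch; `+cu124`-style local build tags are stripped before comparing):

```python
def torch_version_at_least(version: str, minimum=(2, 6)) -> bool:
    """Compare a torch.__version__ string such as '2.6.0+cu124'
    against a minimum (major, minor) pair."""
    core = version.split("+")[0]                      # drop the local build tag
    parts = tuple(int(p) for p in core.split(".")[:2])
    return parts >= minimum

# On a machine with torch installed:
# import torch; print(torch_version_at_least(torch.__version__), torch.cuda.is_available())
print(torch_version_at_least("2.6.0+cu124"))  # True
```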
Install flash-attention:
pip install flash_attn
Prefer this installation route: if it succeeds, it confirms that nvcc is set up correctly, which gives better odds for the remaining build steps.
Alternatively, download a prebuilt .whl matching your CUDA and torch versions (the project docs say this works; I have not tried it myself):
https://github.com/Dao-AILab/flash-attention/releases
Or:
pip install flash-attn --find-links https://github.com/Dao-AILab/flash-attention/releases
To check whether flash-attn installed correctly, run the following in Python:
import flash_attn
print(flash_attn.__version__)
# flash_attn can install successfully even when its CUDA extension build failed; test the core CUDA extensions too:
from flash_attn.layers.rotary import RotaryEmbedding
from flash_attn.bert_padding import pad_input, unpad_input
If these imports run without errors, the installation succeeded.
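The checks above can be wrapped into one function that distinguishes "not installed" from "installed but the CUDA extension failed", using only the imports already shown (a sketch, not an official diagnostic):

```python
def check_flash_attn() -> str:
    """Classify the flash-attn install state via the imports above."""
    try:
        import flash_attn
    except ImportError as e:
        return f"not installed: {e}"
    try:
        # these pull in the compiled CUDA extension
        from flash_attn.layers.rotary import RotaryEmbedding
        from flash_attn.bert_padding import pad_input, unpad_input
    except ImportError as e:
        return f"{flash_attn.__version__} installed, but CUDA extensions failed: {e}"
    return f"ok, version {flash_attn.__version__}"

print(check_flash_attn())
```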
apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libgflags-dev zlib1g-dev libfmt-dev patchelf
apt install libnuma-dev
pip install packaging ninja cpufeature numpy openai
4. Download the project source code:
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
Installation commands for different hardware configurations:
For two CPUs and 1 TB of RAM:
# For multi-concurrency with two CPUs and 1 TB RAM:
apt install libnuma-dev
export USE_NUMA=1
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
After a long wait, the build completes successfully.
5. Download the model weights and config files
Download the DeepSeek-R1 Q4 model weights:
# install the ModelScope client
pip install modelscope
# download the model weight files
modelscope download --model lmstudio-community/DeepSeek-R1-GGUF --include DeepSeek-R1-Q4_K_M* --local_dir ./dir
Download the DeepSeek-R1 config files (everything except the safetensors weights):
modelscope download --model deepseek-ai/DeepSeek-R1 --exclude *.safetensors --local_dir ./config
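Before launching the server, it is easy to verify that both download directories look sane. This sketch assumes only that the weights directory should contain `.gguf` shards (possibly in a subfolder, depending on how ModelScope lays them out) and the config directory a `config.json`:

```python
import os

def model_dirs_ready(gguf_dir: str, config_dir: str) -> bool:
    """True when gguf_dir holds at least one .gguf shard (searched
    recursively) and config_dir holds a config.json."""
    shards = 0
    for _, _, files in os.walk(gguf_dir):
        shards += sum(f.endswith(".gguf") for f in files)
    return shards > 0 and os.path.isfile(os.path.join(config_dir, "config.json"))

# e.g. print(model_dirs_ready("./dir", "./config"))
```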
Common errors:
Error 1:
The build stage fails with:
ERROR: Failed to build installable wheels for some pyproject.toml based projects (ktransformers)
Fix from the project docs:
Run:
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
###############################################################################
Add the line set(CMAKE_CUDA_STANDARD 17) to ktransformers/csrc/balance_serve/CMakeLists.txt; that should let the build go through.
Then rebuild.
Error 2:
Problem description:
The command being run:
python ktransformers/server/main.py \
--port 10002 \
--model_path /home/data_c/KT_data/config \
--gguf_path /home/data_c/KT_data/dir \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
--max_new_tokens 1024 \
--cache_lens 32768 \
--chunk_size 256 \
--max_batch_size 4 \
--backend_type balance_serve \
--force_think
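Once the server is up, it can be queried over HTTP. The following is a hypothetical client sketch: it assumes the balance_serve backend exposes an OpenAI-compatible chat endpoint on the `--port` passed above (the `openai` package was installed in step 3), and the model name is a placeholder.

```python
import json

def chat_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "DeepSeek-R1",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# To send it (assuming an OpenAI-compatible endpoint on the server's port):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:10002/v1", api_key="not-needed")
# resp = client.chat.completions.create(**chat_payload("Hello"))
print(json.dumps(chat_payload("Hello"), indent=2))
```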