Deploying a DeepSeek Model with FastChat on CentOS

1. Prepare the CentOS environment

1.1 Install base dependencies

sudo yum update -y
sudo yum install -y git wget python3 python3-devel gcc-c++ make openssl-devel bzip2-devel libffi-devel

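1.2 Install Python 3.10+

# Install Python 3.10
# Package availability depends on the CentOS release and the enabled repositories.
# If yum cannot find python3.10, build it from source or use a tool such as pyenv.
# On RHEL-family systems the venv module ships with Python itself, so there is no separate -venv package.
sudo yum install -y epel-release
sudo yum install -y python3.10 python3.10-devel

# Optional: make `python` resolve to Python 3.10
# This only works if python3.10 is registered with alternatives. On CentOS 7, yum depends on
# the system Python 2, so do not replace /usr/bin/python there.
# The virtual environment created in section 2.1 makes this step unnecessary.
sudo alternatives --set python /usr/bin/python3.10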

1.3 Install CUDA 11.8 (required for NVIDIA GPUs)

# Add the NVIDIA repository (the rhel7 path targets CentOS 7)
# NVIDIA rotated its repository signing keys in 2022. If the GPG check fails,
# point gpgkey at the key file currently published in that repository directory.
sudo tee /etc/yum.repos.d/cuda.repo <<EOF
[cuda]
name=CUDA Repository
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
EOF

# Install CUDA
sudo yum install -y cuda-11-8

# Add environment variables
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify the installation (a reboot may be needed before the driver loads)
nvidia-smi  # should show GPU information
nvcc --version  # should report CUDA 11.8
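
The version check above can also be scripted, which helps when provisioning several machines. The following is a minimal sketch, not part of the original setup: the helper names `parse_nvcc_release` and `installed_cuda_release` are hypothetical, and it assumes `nvcc --version` prints a line containing `release <major>.<minor>`.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Return the CUDA release (e.g. "11.8") found in `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run nvcc if it is on PATH; return None when nvcc is missing or fails."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout) if result.returncode == 0 else None


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA release:", release or "nvcc not found")
    if release is not None and release != "11.8":
        print("Warning: this guide assumes CUDA 11.8")
```

If the script reports that nvcc is not found, the PATH export from the previous step has probably not been loaded in the current shell.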

2. Install FastChat and the model

2.1 Create a Python virtual environment

python3.10 -m venv fastchat-env
source fastchat-env/bin/activate

2.2 Install PyTorch and FastChat

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install "fschat[model_worker,webui]"
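
Before downloading a multi-gigabyte model, it is worth confirming that the installed PyTorch build can actually see the GPU. A small sketch, assuming it runs inside the `fastchat-env` virtual environment; the function name `cuda_summary` is hypothetical:

```python
import importlib.util


def cuda_summary() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}
    import torch  # imported lazily so the script still runs without PyTorch

    summary = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if summary["cuda_available"]:
        summary["device_name"] = torch.cuda.get_device_name(0)
    return summary


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False while `nvidia-smi` works, a CPU-only wheel was most likely installed; rerunning the pip command with the cu118 index URL is the usual remedy.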

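2.3 Download the DeepSeek model

# Install Git LFS
sudo yum install -y git-lfs
git lfs install

# Download the model (7B version as an example)
# Hugging Face hosts this model as deepseek-llm-7b-base and deepseek-llm-7b-chat;
# the chat variant is cloned here into ./deepseek-llm-7b so the paths used later still match.
git clone https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat deepseek-llm-7b

(If the network is unreliable, download the model files manually and place them in the deepseek-llm-7b directory.)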

3. Start the FastChat services

3.1 Start the Controller (management node)

# In a new terminal window
source fastchat-env/bin/activate
python -m fastchat.serve.controller

3.2 Start the Model Worker (loads the model)

# In a new terminal window
source fastchat-env/bin/activate
python -m fastchat.serve.model_worker \
    --model-path ./deepseek-llm-7b \
    --model-names "deepseek-7b" \
    --device "cuda" \
    --load-8bit  # only needed if GPU memory is insufficient
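
Whether `--load-8bit` is needed can be estimated with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes a nominal 7 billion parameters and counts weights only; the KV cache, activations, and CUDA overhead come on top, so treat the numbers as lower bounds.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    # 7e9 is the nominal size of a "7B" model (an assumption, not a measured count).
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gib(7e9, bits):.1f} GiB")
```

Under these assumptions the weights take about 13.0 GiB at 16 bits and about 6.5 GiB at 8 bits, so a 16 GB card is tight in half precision and comfortable with 8-bit loading.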

3.3 Start the OpenAI-compatible API server

# In a new terminal window
source fastchat-env/bin/activate
python -m fastchat.serve.openai_api_server \
    --host 0.0.0.0 \
    --port 9997
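
The model worker can take a while to load and register with the controller, so requests sent right after startup may fail. A minimal readiness check is sketched below, assuming the API server exposes the OpenAI-style `GET /v1/models` listing on port 9997; the helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Pull model names out of an OpenAI-style model list response."""
    return [item["id"] for item in payload.get("data", [])]


def wait_for_models(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> list:
    """Poll GET {base_url}/models until a model is listed or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                models = extract_model_ids(json.load(resp))
            if models:
                return models
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not up yet, or the response was not JSON; retry
        time.sleep(interval_s)
    return []


if __name__ == "__main__":
    print(wait_for_models("http://localhost:9997/v1") or "no models registered yet")
```

Once the list contains `deepseek-7b`, the worker has registered and the tests in the next section should succeed.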

4. Test the API

4.1 Test with curl

curl http://localhost:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-7b",
    "messages": [{"role": "user", "content": "Hello! Who are you?"}],
    "temperature": 0.7
  }'

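4.2 Python test script

import requests

# Send one chat request to the local OpenAI-compatible endpoint.
response = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "deepseek-7b",  # must match --model-names passed to the model worker
        "messages": [{"role": "user", "content": "Explain quantum computing in Chinese"}],
        "temperature": 0.7,
    },
    timeout=120,  # the first generation can be slow
)
response.raise_for_status()  # fail early on HTTP errors
print(response.json()["choices"][0]["message"]["content"])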
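
For long answers it can be nicer to print tokens as they arrive. The sketch below uses only the standard library and assumes the server honors `"stream": true` and replies with OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`); the function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

API_URL = "http://localhost:9997/v1/chat/completions"


def delta_from_sse_line(line: str) -> Optional[str]:
    """Return the text fragment carried by one `data: {...}` line, else None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return None
    choices = json.loads(data).get("choices") or [{}]
    return choices[0].get("delta", {}).get("content")


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield non-empty text fragments from a stream of raw SSE lines."""
    for raw in lines:
        text = delta_from_sse_line(raw.decode("utf-8"))
        if text:
            yield text


def stream_chat(prompt: str, model: str = "deepseek-7b") -> Iterator[str]:
    """POST a streaming chat request and yield the reply piece by piece."""
    body = {"model": model, "stream": True,
            "messages": [{"role": "user", "content": prompt}]}
    request = urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=120) as resp:
        yield from iter_deltas(resp)  # an HTTP response iterates line by line


if __name__ == "__main__":
    for piece in stream_chat("Hello! Who are you?"):
        print(piece, end="", flush=True)
    print()
```

HTTP error statuses surface as `urllib.error.HTTPError`, so a wrong model name fails loudly instead of printing nothing.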

5. Optional features

5.1 Start the Web UI

python -m fastchat.serve.gradio_web_server

Open http://<your-server-IP>:7860 in a browser to chat through the web page.

5.2 Stop the services

# Press Ctrl+C in each terminal running the Controller, Worker, or API server

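6. Troubleshooting

Q1: Out of GPU memory

# Use 4-bit quantization.
# FastChat's model_worker has no --load-4bit flag; 4-bit inference goes through its GPTQ
# options and needs a GPTQ-quantized checkpoint plus the setup described in FastChat's docs/gptq.md.
# The checkpoint path below is a placeholder, not a file this guide downloads.
python -m fastchat.serve.model_worker \
    --model-path ./deepseek-llm-7b-gptq \
    --model-names "deepseek-7b" \
    --device "cuda" \
    --gptq-wbits 4 \
    --gptq-groupsize 128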

Q2: Port conflict

# Change the API server port
python -m fastchat.serve.openai_api_server --port 12345
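
To find out which port is actually taken before changing anything, a quick TCP probe is enough. This is a minimal sketch with a hypothetical helper name; the port list assumes FastChat's usual defaults of 21001 for the controller and 21002 for the model worker, plus the 9997 and 7860 used in this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Assumed defaults: controller 21001, worker 21002, API 9997, Web UI 7860.
    for port in (21001, 21002, 9997, 7860):
        print(port, "in use" if port_in_use(port) else "free")
```

A port reported as in use before any FastChat process has started belongs to another program, and the corresponding service should be moved with its `--port` option.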

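Q3: Firewall settings

# Open the API port (only if remote access is needed)
# Repeat the first command with 7860/tcp if the Web UI should be reachable remotely.
sudo firewall-cmd --zone=public --add-port=9997/tcp --permanent
sudo firewall-cmd --reload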

Summary

Install CUDA and Python 3.10 on CentOS
Start the DeepSeek model service with FastChat
Call the API at http://localhost:9997/v1
Open the Web UI at http://IP:7860

Your CentOS server can now run the DeepSeek model locally! 🚀