Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

License: CC 4.0 BY-SA

Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs — ROCm Blogs

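In this blog post, we introduce the 8-bit representation used by bitsandbytes. As you will see, it significantly reduces the memory needed to fine-tune and run inference with large language models (LLMs). Many quantization techniques exist for shrinking model size, but bitsandbytes uses quantization to reduce the size of the optimizer states as well as the model itself. This post covers the fundamentals of the bitsandbytes 8-bit representation, explains the bitsandbytes 8-bit optimizer and the LLM.int8 technique, and shows how to use them on AMD GPUs with ROCm.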
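To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical 1-billion-parameter model and an Adam-style optimizer that keeps two state values per parameter; the helper name `param_memory_gib` is illustrative and not part of bitsandbytes. The estimate ignores activations, gradients, and the small per-block constants that a quantized format has to store.

```python
def param_memory_gib(num_params: int, bytes_per_value: int, values_per_param: int = 1) -> float:
    """Approximate memory in GiB for storing `values_per_param` values per parameter."""
    return num_params * values_per_param * bytes_per_value / 2**30


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 1_000_000_000

# Model weights: 16-bit floats (2 bytes) versus 8-bit integers (1 byte).
weights_fp16 = param_memory_gib(NUM_PARAMS, 2)
weights_int8 = param_memory_gib(NUM_PARAMS, 1)

# Adam-style optimizer: two state values per parameter,
# stored as 32-bit floats (4 bytes) versus 8-bit values (1 byte).
adam_fp32 = param_memory_gib(NUM_PARAMS, 4, values_per_param=2)
adam_int8 = param_memory_gib(NUM_PARAMS, 1, values_per_param=2)

print(f"weights   fp16: {weights_fp16:.2f} GiB   int8: {weights_int8:.2f} GiB")
print(f"optimizer fp32: {adam_fp32:.2f} GiB   int8: {adam_int8:.2f} GiB")
```

Under these assumptions the 8-bit weights take half the space of 16-bit weights, and the 8-bit optimizer states take a quarter of the space of 32-bit states, which is why quantizing the optimizer matters so much during fine-tuning.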

Requirements

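bitsandbytes is a Python wrapper library that provides fast and efficient 8-bit quantization of machine learning models. It is now supported on ROCm 6.2 and can be deployed seamlessly on AMD GPUs. All demos in this post were run with the setup detailed below. For comprehensive installation details, see the ROCm documentation.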

Hardware and OS:
- AMD Instinct accelerator
- Ubuntu 22.04.3 LTS

Software:
- ROCm 6.2.0
- PyTorch 2.3.0

Alternatively, you can launch a Docker container with the same setup.

   docker run -it --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device /dev/kfd --device /dev/dri rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0
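Once inside the environment, a quick sanity check can confirm that PyTorch was built for ROCm and can see the GPU. The following is a minimal sketch, not part of the original setup; the function name `describe_torch_env` is hypothetical. It relies on `torch.version.hip`, which is set on ROCm builds of PyTorch and is `None` otherwise, and on the `torch.cuda` API, which ROCm builds reuse for AMD devices.

```python
def describe_torch_env() -> dict:
    """Collect basic facts about the local PyTorch installation."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Set on ROCm builds; None on CPU-only or CUDA builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds expose AMD GPUs through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


env = describe_torch_env()
for key, value in env.items():
    print(f"{key}: {value}")
```

If `hip_version` is `None` or `gpu_available` is `False`, the container or driver setup is worth revisiting before installing bitsandbytes.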

Install the required libraries.

pip install "transformers>=4.45.1" datasets accelerate "numpy==1.24.4" evaluate
pip uninstall apex #Hugging Face implements layer fusion without apex

Note: Apex needs to be uninstalled because of a bug in Hugging Face.
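The following sketch checks the result of the two commands above using only the Python standard library: it reports the installed version of each required package and flags Apex if it is still present. The helper names `installed_version` and `check_packages` are illustrative, and the package list simply mirrors the `pip install` line; it assumes that Apex is registered under the distribution name `apex`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above.
REQUIRED = ["transformers", "datasets", "accelerate", "numpy", "evaluate"]


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_packages(required=REQUIRED, must_be_absent=("apex",)):
    """Return (versions of required packages, list of packages that should be removed)."""
    report = {name: installed_version(name) for name in required}
    conflicts = [name for name in must_be_absent if installed_version(name) is not None]
    return report, conflicts


report, conflicts = check_packages()
for name, found in report.items():
    print(f"{name}: {found if found is not None else 'NOT INSTALLED'}")
if conflicts:
    print("Still installed and should be removed:", ", ".join(conflicts))
```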

Install bitsandbytes as shown below.

If you are using an AMD Instinct MI210/MI250/MI250X/MI300A/MI300X or newer, install directly from the wheel package:

pip install --no-deps --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-manylinux_2_24_x86_64.whl'

If you are using an older AMD Instinct device, or prefer to build directly from source, follow these steps:

git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . # use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target