[Learning Notes] Errors Encountered During QLoRA Fine-Tuning with XTuner

Problem 1: `No module named 'triton.ops'` when running QLoRA fine-tuning with XTuner

```
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/output/xtuner/xtuner/tools/train.py", line 392, in <module>
    main()
  File "/output/xtuner/xtuner/tools/train.py", line 381, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 234, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/output/xtuner/xtuner/model/sft.py", line 97, in __init__
    self.llm = self.build_llm_from_cfg(
  File "/output/xtuner/xtuner/model/sft.py", line 143, in build_llm_from_cfg
    llm = self._build_from_cfg_or_module(llm)
  File "/output/xtuner/xtuner/model/sft.py", line 296, in _build_from_cfg_or_module
    return BUILDER.build(cfg_or_mod)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3620, in from_pretrained
    hf_quantizer.validate_environment(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_8bit.py", line 77, in validate_environment
    from ..integrations import validate_bnb_backend_availability
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1805, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1819, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
No module named 'triton.ops'
```

Analyzing the root cause from the traceback:

The failure happens while Transformers tries to load the model with BitsAndBytes (8-bit quantization), so the first thing to check is whether the bitsandbytes library is installed correctly.
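
Two quick ways to inspect the install (recent bitsandbytes releases ship a diagnostic entry point; if yours is older, `pip show` still confirms the version):

```bash
pip show bitsandbytes   # confirm it is installed and note the version
python -m bitsandbytes  # bitsandbytes' built-in environment self-check
```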

It looked like a version mismatch, but reinstalling bitsandbytes did not help and the error persisted. Further digging showed this is a known conflict between PyTorch 2.6+ and the installed bitsandbytes version.
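
To confirm the diagnosis, print the versions involved and try the failing import directly; the `triton.ops` module that bitsandbytes reaches for no longer exists in the Triton releases that newer PyTorch builds pull in:

```bash
python -c "import torch; print('torch', torch.__version__)"
python -c "import triton; print('triton', triton.__version__)"
python -c "import triton.ops"   # reproduces the ModuleNotFoundError on Triton 3.x
```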

Solution: install a lower PyTorch version, pytorch==2.5.1. Open XTuner's requirements file and change the pinned torch version.
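
A minimal sketch of the edit, assuming your checkout declares the torch dependency in `requirements/runtime.txt` (locate the actual entry with `grep -rn "^torch" requirements*` if your XTuner version lays things out differently):

```bash
cd /output/xtuner   # the XTuner source directory from the traceback above
sed -i 's/^torch.*/torch==2.5.1/' requirements/runtime.txt
```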

Then rerun the XTuner installation and the error goes away:

```bash
# Run from the XTuner source directory
pip install -e .
```

Problem 2: `KeyError: 'qwen'` when running QLoRA fine-tuning with XTuner

```
Traceback (most recent call last):
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1071, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 773, in __getitem__
    raise KeyError(key)
KeyError: 'qwen'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/output/xtuner/xtuner/tools/train.py", line 392, in <module>
    main()
  File "/output/xtuner/xtuner/tools/train.py", line 381, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 234, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/output/xtuner/xtuner/model/sft.py", line 97, in __init__
    self.llm = self.build_llm_from_cfg(
  File "/output/xtuner/xtuner/model/sft.py", line 142, in build_llm_from_cfg
    llm = self._dispatch_lm_model_cfg(llm_cfg, max_position_embeddings)
  File "/output/xtuner/xtuner/model/sft.py", line 281, in _dispatch_lm_model_cfg
    llm_cfg = AutoConfig.from_pretrained(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1073, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `qwen` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
```

Cause: the currently installed Transformers version does not support this checkpoint, which is the latest Qwen3 model.

Solution: Qwen3's model config states that it requires transformers 4.51.0, but the transformers version XTuner supports is 4.48.0, so I cannot install the newest release. Instead, edit the `model_type` field in the checkpoint's config file and change it to `qwen2`.
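
A minimal sketch of the edit (the checkpoint path is a placeholder; per the traceback, the checkpoint's config.json carries `"model_type": "qwen"`):

```bash
CKPT=/path/to/qwen3/checkpoint   # placeholder; substitute your checkpoint directory

# Map the checkpoint onto the Qwen2 architecture that transformers 4.48.0 knows.
sed -i 's/"model_type": "qwen"/"model_type": "qwen2"/' "$CKPT/config.json"

# Verify that AutoConfig can now resolve the checkpoint:
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('$CKPT').model_type)"
```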

With these fixes in place, training starts successfully.
### Tutorial: Fine-Tuning a Large Model with XTuner

#### 1. Build the dataset

For the model to learn a specific task or behavior, first prepare a high-quality dataset. Depending on your needs, this can be a conversation-style dataset of questions paired with the ideal answers; this data steers what the model learns.

The dataset should be stored as a JSONL file, one dialogue sample per record:

```json
{"instruction": "Explain what machine learning is", "input": "", "output": "Machine learning is a technique that lets computers improve automatically from experience..."}
```

#### 2. Write the configuration file

An XTuner config file defines the concrete training parameters and the model architecture to use. Taking the `internlm_chat_7b` model as an example, you can adapt one of the existing `.py` config templates to your own task.

Here is a fragment of a simplified config:

```python
# Schematic fragment of an XTuner config. A real config is a plain Python
# file of variable definitions; `xtuner train` loads it through mmengine's
# Config machinery, so no Config object is constructed by hand.
model = dict(
    type='InternLM',
    version='chat-7b'
)

train_dataset_type = 'CustomDataset'
data_root = '/path/to/dataset'

max_epochs = 3
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
learning_rate = 5e-5
weight_decay = 0.01
warmup_ratio = 0.05
lr_scheduler_type = 'cosine'

logging_dir = './logs'
save_total_limit = 3
checkpointing_steps = 1000
fp16 = True
bf16 = False

deepspeed_config_path = "/root/deepspeed_zero2.json"
work_dir = "./work_dirs/assistTuner"
```

#### 3. Start the fine-tuning run

XTuner's command-line tool runs the fine-tuning flow for a given config directly. A typical invocation looks like this:

```bash
xtuner train ./config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py \
    --deepspeed deepspeed_zero2 \
    --work-dir ./work_dirs/assistTuner
```

This loads the preset hyperparameters, accelerates training with DeepSpeed (ZeRO-2 here), and saves intermediate checkpoints and the final weights under the work directory.

#### 4. Convert to Hugging Face format

Once fine-tuning is complete, sharing or deploying the result usually calls for a format that standard frameworks understand. XTuner can export its internally generated checkpoints into the standard layout used by the Hugging Face Transformers library. The conversion command takes the training config, the `.pth` checkpoint, and an output directory:

```bash
xtuner convert pth_to_hf \
    ./config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py \
    ./work_dirs/assistTuner/best_model.pth \
    /root/output/huggingface_model/
```

The exported model can then be integrated with any project in the Transformers ecosystem.