代码出现问题:(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
trainable params: 1,572,864 || all params: 1,838,401,536 || trainable%: 0.0856
训练集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
验证集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead.
INFO:__main__:GPU内存使用: 已分配 2.93GB, 保留 4.13GB
可训练参数列表:
- base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight
0%| | 0/3 [00:00<?, ?it/s]You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.00GB, 保留 4.21GB
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.22GB
33%|████████████████████████████ | 1/3 [00:03<00:06, 3.25s/it]Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.01GB, 保留 4.25GB
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.26GB
67%|████████████████████████████████████████████████████████ | 2/3 [00:06<00:02, 2.98s/it]Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.01GB, 保留 4.25GB
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.26GB
{'train_runtime': 9.034, 'train_samples_per_second': 0.664, 'train_steps_per_second': 0.332, 'train_loss': 1.0772175788879395, 'epoch': 3.0}
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:09<00:00, 3.01s/it]
Traceback (most recent call last):
File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 356, in <module>
eval_results = trainer.evaluate()
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4076, in evaluate
output = eval_loop(
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4270, in evaluation_loop
losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4496, in prediction_step
outputs = model(**inputs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 818, in forward
return model_forward(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 806, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\peft_model.py", line 1719, in forward
return self.base_model(
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\tuners\tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 816, in forward
outputs = self.model(
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 521, in forward
raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
ValueError: You must specify exactly one of input_ids or inputs_embeds
(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py
Traceback (most recent call last):
File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 57, in <module>
class ContrastiveTrainer(Trainer):
File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 63, in ContrastiveTrainer
eval_dataset: Optional[Dataset] = None,
NameError: name 'Dataset' is not defined
原代码如下:import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
TrainingArguments,
Trainer,
PreTrainedTokenizerBase,
BitsAndBytesConfig
)
from transformers.tokenization_utils_base import PreTrainedTokenizerBase
from transformers.utils import PaddingStrategy
from datasets import load_dataset
from typing import Any, Dict, List, Optional, Tuple, Union
import logging
from dataclasses import dataclass
import os
import gc
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
@dataclass
class EvalDataCollator:
"""评估专用的数据收集器"""
tokenizer: PreTrainedTokenizerBase
padding: Union[bool, str, PaddingStrategy] = True
max_length: Optional[int] = None
pad_to_multiple_of: Optional[int] = None
return_tensors: str = "pt"
def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
# 评估时只使用正样本(用于语言建模评估)
positive_features = [{"input_ids": f["positive_input_ids"]} for f in features]
# 对正样本进行填充
batch_positive = self.tokenizer.pad(
positive_features,
padding=self.padding,
max_length=self.max_length,
pad_to_multiple_of=self.pad_to_multiple_of,
return_tensors=self.return_tensors,
)
# 创建注意力掩码
attention_mask = (batch_positive["input_ids"] != self.tokenizer.pad_token_id).int()
# 创建标签(用于语言建模)
labels = batch_positive["input_ids"].clone()
labels[labels == self.tokenizer.pad_token_id] = -100
return {
"input_ids": batch_positive["input_ids"],
"attention_mask": attention_mask,
"labels": labels
}
class ContrastiveTrainer(Trainer):
"""内存优化的训练器"""
# ... [保持其他方法不变] ...
def evaluate(
self,
eval_dataset: Optional[Dataset] = None,
ignore_keys: Optional[List[str]] = None,
metric_key_prefix: str = "eval",
) -> Dict[str, float]:
"""重写评估方法以使用专用的数据收集器"""
# 创建评估专用的数据收集器
eval_data_collator = EvalDataCollator(
tokenizer=self.tokenizer,
max_length=256,
padding="max_length"
)
# 临时保存原始数据收集器
original_collator = self.data_collator
try:
# 使用评估专用的数据收集器
self.data_collator = eval_data_collator
# 调用父类的评估方法
return super().evaluate(
eval_dataset=eval_dataset,
ignore_keys=ignore_keys,
metric_key_prefix=metric_key_prefix
)
finally:
# 恢复原始数据收集器
self.data_collator = original_collator
# 设置日志
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# 内存优化工具函数
def clear_memory():
"""清除Python和CUDA缓存"""
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()
def print_memory_usage():
"""打印当前内存使用情况"""
if torch.cuda.is_available():
allocated = torch.cuda.memory_allocated() / (1024 ** 3)
reserved = torch.cuda.memory_reserved() / (1024 ** 3)
logger.info(f"GPU内存使用: 已分配 {allocated:.2f}GB, 保留 {reserved:.2f}GB")
else:
logger.info("未检测到GPU")
def tokenize_function(examples, tokenizer, max_length=256):
"""将文本转换为token IDs"""
tokenized = {}
# 对每个字段进行分词
for key in ['anchor', 'positive', 'negative']:
if key in examples:
# 使用分词器处理文本
result = tokenizer(
examples[key],
max_length=max_length,
truncation=True,
padding=False,
return_tensors=None
)
tokenized[f"{key}_input_ids"] = result["input_ids"]
return tokenized
@dataclass
class ContrastiveDataCollator:
"""内存优化的数据收集器"""
tokenizer: PreTrainedTokenizerBase
padding: Union[bool, str, PaddingStrategy] = True
max_length: Optional[int] = None
pad_to_multiple_of: Optional[int] = None
return_tensors: str = "pt"
def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
# 分离出三元组的各个部分
anchor_features = [{"input_ids": f["anchor_input_ids"]} for f in features]
positive_features = [{"input_ids": f["positive_input_ids"]} for f in features]
negative_features = [{"input_ids": f["negative_input_ids"]} for f in features]
# 对每个部分分别进行填充
batch_anchor = self.tokenizer.pad(
anchor_features,
padding=self.padding,
max_length=self.max_length,
pad_to_multiple_of=self.pad_to_multiple_of,
return_tensors=self.return_tensors,
)
batch_positive = self.tokenizer.pad(
positive_features,
padding=self.padding,
max_length=self.max_length,
pad_to_multiple_of=self.pad_to_multiple_of,
return_tensors=self.return_tensors,
)
batch_negative = self.tokenizer.pad(
negative_features,
padding=self.padding,
max_length=self.max_length,
pad_to_multiple_of=self.pad_to_multiple_of,
return_tensors=self.return_tensors,
)
# 创建注意力掩码
def create_attention_mask(input_ids):
return (input_ids != self.tokenizer.pad_token_id).int()
# 释放中间变量内存
del anchor_features, positive_features, negative_features
clear_memory()
return {
"anchor_input_ids": batch_anchor["input_ids"],
"anchor_attention_mask": create_attention_mask(batch_anchor["input_ids"]),
"positive_input_ids": batch_positive["input_ids"],
"positive_attention_mask": create_attention_mask(batch_positive["input_ids"]),
"negative_input_ids": batch_negative["input_ids"],
"negative_attention_mask": create_attention_mask(batch_negative["input_ids"]),
}
class ContrastiveTrainer(Trainer):
"""内存优化的训练器"""
def __init__(self, tokenizer=None, *args, contrastive_config=None, **kwargs):
# 首先调用父类初始化
super().__init__(*args, **kwargs)
# 关键修复:设置tokenizer
self.tokenizer = tokenizer
if contrastive_config is None:
contrastive_config = {}
# 设置默认值
self.temperature = contrastive_config.get("temperature", 0.07)
self.margin = contrastive_config.get("margin", 0.3)
self.contrastive_weight = contrastive_config.get("weight", 0.8)
self.repr_layer = contrastive_config.get("repr_layer", -1)
# 验证必要参数
if not hasattr(self.model.config, "output_hidden_states") or not self.model.config.output_hidden_states:
raise ValueError("模型必须设置output_hidden_states=True")
self.cross_entropy = nn.CrossEntropyLoss()
def compute_contrastive_loss(self, anchor_emb, pos_emb, neg_emb):
"""计算对比损失"""
# 计算余弦相似度
pos_sim = F.cosine_similarity(anchor_emb, pos_emb)
neg_sim = F.cosine_similarity(anchor_emb, neg_emb)
# 计算InfoNCE损失
numerator = torch.exp(pos_sim / self.temperature)
denominator = numerator + torch.exp(neg_sim / self.temperature)
info_nce_loss = -torch.log(numerator / (denominator + 1e-8)).mean()
# 计算三元组损失
triplet_loss = F.relu(neg_sim - pos_sim + self.margin).mean()
return info_nce_loss + triplet_loss
def get_sequence_representation(self, outputs, attention_mask):
"""获取序列表示(内存优化版)"""
# 只获取需要的隐藏状态层
hidden_states = outputs.hidden_states[self.repr_layer]
# 获取每个序列的最后一个非填充token
seq_lengths = attention_mask.sum(dim=1) - 1
batch_indices = torch.arange(hidden_states.size(0))
# 返回对应位置的隐藏状态
return hidden_states[batch_indices, seq_lengths]
def compute_loss(self, model, inputs, return_outputs=False):
"""内存优化的损失计算"""
# 确保模型处于训练模式
model.train()
# 提取输入
anchor_ids = inputs["anchor_input_ids"]
anchor_mask = inputs["anchor_attention_mask"]
positive_ids = inputs["positive_input_ids"]
positive_mask = inputs["positive_attention_mask"]
negative_ids = inputs["negative_input_ids"]
negative_mask = inputs["negative_attention_mask"]
# 前向传播获取隐藏状态
def get_embeddings(input_ids, attention_mask):
outputs = model(
input_ids=input_ids,
attention_mask=attention_mask,
output_hidden_states=True,
return_dict=True
)
return self.get_sequence_representation(outputs, attention_mask)
# 获取三元组的嵌入表示
anchor_emb = get_embeddings(anchor_ids, anchor_mask)
pos_emb = get_embeddings(positive_ids, positive_mask)
neg_emb = get_embeddings(negative_ids, negative_mask)
# 计算对比损失
cl_loss = self.compute_contrastive_loss(anchor_emb, pos_emb, neg_emb)
cl_loss = cl_loss * self.contrastive_weight
# 关键修复:确保tokenizer已设置
if self.tokenizer is None:
raise ValueError("Tokenizer未设置!")
# 计算语言建模损失
lm_labels = positive_ids.clone()
# 关键修复:使用tokenizer的pad_token_id
pad_token_id = self.tokenizer.pad_token_id
lm_labels[lm_labels == pad_token_id] = -100
# 计算语言建模损失
lm_outputs = model(
input_ids=positive_ids,
attention_mask=positive_mask,
labels=lm_labels
)
lm_loss = lm_outputs.loss
# 总损失 = LM损失 + 对比损失
total_loss = lm_loss + cl_loss
# 记录内存使用
print_memory_usage()
return (total_loss, lm_outputs) if return_outputs else total_loss
# ================ 主程序 ================ #
if __name__ == "__main__":
# 配置量化以减少内存使用
bnb_config = BitsAndBytesConfig(
load_in_4bit=True, # 使用4位量化
bnb_4bit_quant_type="nf4", # 使用NF4量化类型
bnb_4bit_use_double_quant=True, # 双重量化
bnb_4bit_compute_dtype=torch.float16 # 计算使用FP16
)
# 加载模型和分词器(使用量化)
model = AutoModelForCausalLM.from_pretrained(
"model/Qwen/Qwen1.5-1.8B",
quantization_config=bnb_config, # 应用量化配置
device_map="auto", # 自动选择设备
output_hidden_states=True, # 必须设置以获取隐藏状态
return_dict_in_generate=True,
use_cache=False # 禁用缓存以节省内存
)
tokenizer = AutoTokenizer.from_pretrained("model/Qwen/Qwen1.5-1.8B")
tokenizer.pad_token = tokenizer.eos_token # 设置填充token
# 为量化模型添加LoRA适配器
lora_config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=["q_proj", "v_proj"], # 针对Qwen1.5-1.8B模型
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
# 关键修复:准备模型用于k位训练
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
# 添加LoRA适配器
model = get_peft_model(model, lora_config)
# 关键修复:显式启用LoRA参数的梯度
for param in model.parameters():
if param.requires_grad:
param.requires_grad = True
model.print_trainable_parameters() # 打印可训练参数数量
# 加载数据集
def load_and_tokenize_dataset(file_path, tokenizer):
"""加载数据集并进行分词处理"""
# 加载原始数据集
dataset_dict = load_dataset('json', data_files=file_path)
raw_dataset = dataset_dict['train']
# 应用分词函数
tokenized_dataset = raw_dataset.map(
lambda ex: tokenize_function(ex, tokenizer, max_length=256),
batched=True,
batch_size=8, # 减小批处理大小
remove_columns=['anchor', 'positive', 'negative']
)
return tokenized_dataset
train_dataset = load_and_tokenize_dataset('data/processed/train_style_triplets.json', tokenizer)
val_dataset = load_and_tokenize_dataset('data/processed/val_style_triplets.json', tokenizer)
# 验证数据集格式
print("训练集样本示例:", train_dataset[0])
print("验证集样本示例:", val_dataset[0])
# 训练参数配置(内存优化)
training_args = TrainingArguments(
output_dir="./model/lora_adapter",
per_device_train_batch_size=1, # 减小批量大小
gradient_accumulation_steps=8, # 增加梯度累积步数
num_train_epochs=3,
learning_rate=2e-4,
logging_steps=10, # 更频繁的日志记录以监控内存
save_steps=500,
fp16=True,
report_to="none",
remove_unused_columns=False,
gradient_checkpointing=True, # 启用梯度检查点
optim="adafactor", # 使用内存更少的优化器
)
# 对比学习配置
contrastive_config = {
"temperature": 0.07,
"margin": 0.3,
"weight": 0.8,
"repr_layer": -1
}
# 初始化数据收集器
data_collator = ContrastiveDataCollator(
tokenizer=tokenizer,
max_length=256, # 减少最大长度
padding="max_length"
)
# 初始化训练器 - 关键修复:传递tokenizer
trainer = ContrastiveTrainer(
model=model,
args=training_args,
tokenizer=tokenizer, # 传递tokenizer
data_collator=data_collator,
train_dataset=train_dataset,
eval_dataset=val_dataset,
contrastive_config=contrastive_config
)
# 开始训练前打印内存状态
print_memory_usage()
# 关键修复:验证可训练参数
print("可训练参数列表:")
for name, param in model.named_parameters():
if param.requires_grad:
print(f"- {name}")
# 开始训练
trainer.train()
# 保存LoRA适配器
model.save_pretrained("./model/lora_adapter")
# 评估模型
try:
eval_results = trainer.evaluate()
print("评估结果:", eval_results)
except Exception as e:
print(f"评估过程中发生错误: {e}")
import traceback
traceback.print_exc()