# PEFT: Parameter-Efficient Fine-Tuning

## Introduction

PEFT (Parameter-Efficient Fine-Tuning) freezes most of a pretrained model's parameters and trains only a small number of newly added ones, making fine-tuning practical on limited hardware. LoRA is the most widely used PEFT method.
```bash
pip install peft transformers datasets accelerate bitsandbytes
```

## How LoRA Works
LoRA adds a pair of low-rank matrices alongside the original weight matrix:

- Original weight: `W` (d × d), frozen and never updated
- LoRA update: `ΔW = A × B`, where `A` is (d × r) and `B` is (r × d), with r ≪ d
- At inference: `W' = W + (α / r) × A × B`

Only A and B are trained, so the trainable parameter count drops from d² to 2dr.
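To make the arithmetic above concrete, here is a tiny pure-Python sketch (toy numbers, not a real model) that merges a LoRA update into a frozen weight and then computes the parameter savings for a more realistic layer size:

```python
# Tiny pure-Python sketch of the LoRA merge (toy numbers, not a real model).
d, r, alpha = 2, 1, 2

W = [[1.0, 0.0],
     [0.0, 1.0]]          # frozen base weight, d x d
A = [[1.0],
     [2.0]]               # trainable, d x r
B = [[3.0, 4.0]]          # trainable, r x d

scale = alpha / r         # LoRA scaling factor

# W' = W + (alpha / r) * A @ B
W_merged = [
    [W[i][j] + scale * sum(A[i][k] * B[k][j] for k in range(r))
     for j in range(d)]
    for i in range(d)
]
print(W_merged)           # the merged weight used at inference

# Parameter savings for a square layer with, say, d = 4096 and r = 16
full_params = 4096 * 4096        # d^2
lora_params = 2 * 4096 * 16      # 2dr
lora_ratio = lora_params / full_params
print(lora_params, full_params, lora_ratio)
```

With d = 4096 and r = 16, the adapter holds well under 1% of the parameters of the full matrix, which is why LoRA checkpoints stay small.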
## LoRA Fine-Tuning in Practice

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
import torch

# 1. Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# 2. Prepare the quantized model for training
model = prepare_model_for_kbit_training(model)

# 3. Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                      # rank: higher means more capacity but more parameters
    lora_alpha=32,             # scaling factor, commonly set to 2r
    lora_dropout=0.1,
    target_modules=[           # modules to apply LoRA to
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none"
)

# 4. Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 20,971,520 || all params: 7,721,795,584 || trainable%: 0.27%
```

## Preparing the Training Data
```python
from datasets import Dataset

# Financial Q&A dataset format
training_data = [
    {
        "instruction": "Explain what the non-performing loan (NPL) ratio is",
        "input": "",
        "output": "The non-performing loan ratio is the share of a bank's total outstanding loans that is non-performing..."
    },
    {
        "instruction": "Assess the credit risk of the following company",
        "input": "A tech company, 2 years old, revenue of 3 million, 75% debt ratio, negative cash flow",
        "output": "Overall assessment: high risk. 1. The company is young, so operational stability is unproven; 2. A 75% debt ratio exceeds the warning threshold..."
    }
]

def format_instruction(sample):
    """Format a sample with the instruction template."""
    if sample["input"]:
        prompt = f"### Instruction:\n{sample['instruction']}\n\n### Input:\n{sample['input']}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{sample['instruction']}\n\n### Response:\n"
    return {
        "text": prompt + sample["output"] + tokenizer.eos_token
    }

dataset = Dataset.from_list(training_data)
dataset = dataset.map(format_instruction)

def tokenize(sample):
    # No fixed-length padding here; the data collator pads each batch dynamically
    return tokenizer(
        sample["text"],
        truncation=True,
        max_length=2048
    )

tokenized_dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
```

## Training
```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./lora-qwen2-finance",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size = 16
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=100,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    report_to="none"
)

# mlm=False makes the collator copy input_ids into labels, which causal LM training requires
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()

# Save only the LoRA weights (just the adapter delta, so the files are small)
model.save_pretrained("./lora-weights")
tokenizer.save_pretrained("./lora-weights")
```

## Loading the Fine-Tuned Model for Inference
```python
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Load the LoRA weights
model = PeftModel.from_pretrained(base_model, "./lora-weights")
model = model.merge_and_unload()  # merge the adapter into the base weights for faster inference

# Inference, using the same template as training
prompt = "### Instruction:\nAnalyze the investment value of China Merchants Bank\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tuning LoRA Hyperparameters
| Parameter | Recommended value | Notes |
|---|---|---|
| r | 8-64 | Higher rank gives more capacity but more trainable parameters |
| lora_alpha | 2r | Commonly set to twice the rank |
| lora_dropout | 0.05-0.1 | Helps prevent overfitting |
| target_modules | q_proj, k_proj, v_proj, o_proj | Include at least the attention projections |
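To see how the choice of r in the table drives adapter size, here is a back-of-the-envelope sketch. The module shapes and layer count below are illustrative placeholders, not the exact dimensions of Qwen2-7B:

```python
# Hypothetical per-layer projection shapes (d_in, d_out); placeholders only,
# not the real dimensions of any particular model.
shapes = {
    "q_proj":    (4096, 4096),
    "k_proj":    (4096, 1024),
    "v_proj":    (4096, 1024),
    "o_proj":    (4096, 4096),
    "gate_proj": (4096, 11008),
    "up_proj":   (4096, 11008),
    "down_proj": (11008, 4096),
}

def lora_param_count(shapes, r, num_layers=32):
    # A LoRA adapter on a (d_in x d_out) matrix adds A (d_in x r) + B (r x d_out)
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

for r in (8, 16, 32, 64):
    print(r, lora_param_count(shapes, r))
```

Trainable parameters grow linearly in r, so doubling the rank doubles the adapter size; start at the low end and increase r only if the task underfits.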