Hands-On Fine-Tuning: QwQ-32B 4bit with Unsloth (Single RTX 4090)

This article is based on the video tutorial from 赋范课堂: 只需20G显存,QwQ-32B高效微调实战!4大微调工具精讲!知识灌注+问答风格微调,DeepSeek R1类推理模型微调+Cot数据集创建实战打造定制大模型!
https://www.bilibili.com/video/BV1YoQoYQEwF/
Course materials: https://kq4b3vgg5b.feishu.cn/wiki/LxI9wmuFmiaLCkkoiCIcKvOan7Q
This article edits and trims that material.

赋范课堂 offers excellent courses and is well worth studying.


Table of Contents

    • 1. Basic Preparation
      • 1) Install unsloth
      • 2) Install and register wandb
      • 3) Download the model
        • Install huggingface_hub
        • Start a persistent session with screen
        • Set a mirror endpoint for model downloads
        • Download the model
        • Change the default model download location
    • 2. Model Invocation Tests
      • Calling via modelscope
      • Calling via Ollama
      • Calling via vLLM
        • Request test
    • 3. Download the Fine-Tuning Datasets
      • Download the NuminaMath CoT dataset
      • Download the medical-o1-reasoning-SFT dataset
    • 4. Load the Model
    • 5. Tests Before Fine-Tuning
      • Basic Q&A test
      • Complex question test
      • Medical Q&A with the original model
    • 6. Minimum Viable Experiment
      • Define the prompt
      • Define the dataset processing function
      • Prepare the data
      • Start fine-tuning
      • Fine-tuning notes
        • Related libraries
        • Fine-tuning parameter breakdown
          • ① SFTTrainer
          • ② TrainingArguments
      • Set up wandb and start fine-tuning
      • Check the results
      • Merge the model
      • Save as GGUF
    • 7. Full Efficient Fine-Tuning Experiment
      • Test


1. Basic Preparation

1) Install unsloth

pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
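To confirm the installation picked up a CUDA-enabled PyTorch build, a quick sanity check like the following can help (a minimal sketch; importing unsloth normally prints its own startup banner, and the exact version strings will differ per machine):

# optional check that CUDA and the 4090 are visible to PyTorch
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# optional check that unsloth imports cleanly
python -c "import unsloth"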

2) Install and register wandb

wandb is similar to TensorBoard, but more stable.

Sign up: https://wandb.ai/site
API Key: https://wandb.ai/ezcode/t0322?product=models

For registration and usage details, see: https://blog.csdn.net/lovechris00/article/details/146437418


Install the library

pip install wandb

Log in and enter your API key

wandb login

3) Download the model

https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit


Install huggingface_hub
pip install huggingface_hub

Start a persistent session with screen

Downloading the model can take 0.5 to 1 hour. Using screen prevents the download from being interrupted if the terminal session is closed.


Install screen

sudo apt install screen

screen -S qwq
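Common screen operations (the download keeps running after you detach):

# Detach from the current session: press Ctrl+A, then D
# Reattach to the session later
screen -r qwq
# List existing sessions
screen -ls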

Set a mirror endpoint for model downloads

On Linux, add the environment variable to ~/.bashrc:

export HF_ENDPOINT='https://hf-mirror.com' 

Download the model
huggingface-cli download --resume-download  unsloth/QwQ-32B-unsloth-bnb-4bit

Change the default model download location

By default the model is downloaded to ~/.cache/huggingface/hub/. To change this, set HF_HOME:

export HF_HOME="/root/xx/HF_download"
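After editing ~/.bashrc, reload it in the current shell and confirm the variables are set:

source ~/.bashrc
echo $HF_ENDPOINT
echo $HF_HOME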

2. Model Invocation Tests

Calling via modelscope

from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Calling via Ollama

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)
print(response.choices[0].message.content)

Register the model
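The registration command itself is not shown here; as a rough sketch, Ollama loads a model from a Modelfile that points at a GGUF file (the model name qwq-32b-bnb matches the one used in the request code in this section; the GGUF path below is a placeholder to replace with your own file, e.g. one exported in the "Save as GGUF" step later):

# Modelfile (the FROM path is a placeholder)
FROM ./qwq-32b.Q4_K_M.gguf

# Register the model with Ollama
ollama create qwq-32b-bnb -f Modelfile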



Check whether the registration succeeded

ollama list 

Send a request with the openai library

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)
print(response.choices[0].message.content)

Calling via vLLM

vllm serve /root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit \
--quantization bitsandbytes \
--load-format bitsandbytes \
--max-model-len 2048

Request test
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    model="/root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit",
    messages=messages,
)
print(response.choices[0].message.content)

3. Download the Fine-Tuning Datasets

Response structure of reasoning models and fine-tuning dataset structure requirements

Like DeepSeek R1, QwQ-32B makes its reasoning explicit: a response contains both the reasoning content and the final answer, and the reasoning part is delimited by the <think> and </think> tokens (special tokens injected during model training).
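If you need to separate the reasoning from the final answer programmatically, splitting on the closing tag is enough. A minimal sketch (assuming the decoded response contains a single <think>...</think> block):

def split_reasoning(response_text: str):
    """Split a reasoning-model response into (reasoning, final_answer)."""
    if "</think>" in response_text:
        reasoning, answer = response_text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", response_text.strip()

# usage: reasoning, answer = split_reasoning(decoded_text)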


Download the NuminaMath CoT dataset

https://huggingface.co/datasets/AI-MO/NuminaMath-CoT

huggingface-cli download AI-MO/NuminaMath-CoT --repo-type dataset

Besides the NuminaMath CoT dataset, there are also APPs (a coding dataset), TACO (a coding dataset), long_form_thought_data_5k (a general Q&A dataset), and others. These are all CoT datasets and can be used to fine-tune reasoning models. For an introduction to these datasets, see the public course 《借助DeepSeek R1进行模型蒸馏,模型蒸馏入门实战!》 | https://www.bilibili.com/video/BV1X1FoeBEgW/



Download the medical-o1-reasoning-SFT dataset

https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

huggingface-cli download FreedomIntelligence/medical-o1-reasoning-SFT --repo-type dataset

You can also download it with the Python datasets library:

from datasets import load_dataset

# Downloading just the first 500 examples is enough for this experiment
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:500]", trust_remote_code=True)

# Inspect the dataset
dataset[0]

4. Load the Model

from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

GPU memory usage at this point: 22016 MB
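To check GPU memory yourself after loading, run nvidia-smi or query PyTorch directly (a small sketch; the numbers vary slightly with driver and environment):

import torch

# memory held by tensors vs. memory reserved by the CUDA caching allocator
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")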


5. Tests Before Fine-Tuning

Inspect the model

>>> model
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 5120, padding_idx=151654)
    (layers): ModuleList(
      (0): Qwen2DecoderLayer(...)
      ...
      (62): Qwen2DecoderLayer(...)
      (63): Qwen2DecoderLayer(...)
    )
    (norm): Qwen2RMSNorm((5120,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)

Inspect the tokenizer

>>> tokenizer
Qwen2TokenizerFast(name_or_path='unsloth/QwQ-32B-unsloth-bnb-4bit', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),...151667: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151668: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)

Basic Q&A test

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Answer using the Q&A prompt template below
prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
***
### Instruction:
你是一名助人为乐的助手。
***
### Question:
{}
***
### Response:
<think>{}"""

question = "你好,好久不见!"
prompt = [prompt_style_chat.format(question, "")]

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=2048,
    use_cache=True,
)
# GPU usage rises to 22412 MB
'''
>>> outputs
tensor([[ 14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949, 105051,...35946, 106128,  99245, 101037,  11319, 144236, 151645]],device='cuda:0')
'''
response = tokenizer.batch_decode(outputs)
# response --> ['请写出一个恰当的回答来完成当前对话任务。\n***\n### Instruction:\n你是一名助人为乐的助手。\n***\n### Question:\n你好,好久不见!\n***\n### Response:\n<think>:\n好的,用户发来问候“你好,好久不见!”,我需要回应并延续对话。首先,应该友好回应他们的问候,比如“你好!确实很久没联系了,希望你一切都好!”这样既回应了对方,也表达了关心。接下来,可能需要询问对方近况,或者引导对话继续下去。比如可以问:“最近有什么新鲜事吗?或者你有什么需要帮助的吗?”这样可以让对话更自然,也符合助人为乐的角色设定。还要注意语气要亲切,保持口语化,避免过于正式。另外,用户可能希望得到情感上的回应,所以需要体现出关心和愿意帮助的态度。检查有没有语法错误,确保句子流畅。最后,确定回应简洁但足够友好,符合对话的流程。\n</think>\n\n你好!确实好久不见了,希望你一切都好!最近有什么新鲜事分享,或者需要我帮忙什么吗?😊<|im_end|>']
print(response[0].split("### Response:")[1])

Complex question test

question = "请证明根号2是无理数。"

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22552 MiB
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

Medical Q&A with the original model

# Redefine the Q&A prompt template
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>{}"""

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22842 MiB
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])

6. Minimum Viable Experiment

Next we try fine-tuning the model.

For this dataset, we can fine-tune on part of the original data, or bring in all of the data and iterate over it multiple times.

For most fine-tuning experiments, it makes sense to start with a minimum viable experiment: fine-tune on a small amount of data first and observe the effect.

If the fine-tuning runs smoothly and shows a measurable effect, then consider bringing in more data for larger-scale fine-tuning.


Define the prompt

import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'

Define the dataset processing function

This function reshapes the medical-o1-reasoning-SFT dataset: each Question is inserted into the template together with its Complex_CoT and Response columns, and the end-of-text token is appended:

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts,}

Prepare the data

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:500]", trust_remote_code=True)
'''
{'Question': 'A 61-year-old ... contractions?',
 'Complex_CoT': "Okay, let's ... incontinence.",
 'Response': 'Cystometry in ... the test.'
}
'''
# Structured processing
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Inspect
dataset["text"][0]
'''
Below is an instruction that ... response.
***
### Instruction:
You are a medical ... medical question. 
***
### Question:
A 61-year-old woman ... contractions?
***
### Response:
<think>
Okay,...Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry ... is primarily related to physical e
'''

Start fine-tuning

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Fine-tuning notes

This code uses SFTTrainer for supervised fine-tuning (SFT) and is suitable for fine-tuning models in the transformers and Unsloth ecosystems:

Related libraries
  • SFTTrainer (from the trl library):
    • trl (Transformer Reinforcement Learning) is a Hugging Face library that provides functionality for supervised fine-tuning (SFT) and reinforcement learning (RLHF).
    • SFTTrainer is mainly used for supervised fine-tuning and works with low-rank adaptation methods such as LoRA.
  • TrainingArguments (from the transformers library):
    • This class defines training hyperparameters such as batch size, learning rate, optimizer, and number of training steps.
  • is_bfloat16_supported() (from unsloth):
    • This function checks whether the current GPU supports bfloat16 (BF16), returning True if it does and False otherwise (see the snippet after this list).
    • bfloat16 is a more efficient numeric format that performs especially well on newer NVIDIA GPUs such as the A100/H100.
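For reference, the BF16 check used above can be run on its own (a minimal sketch; on an RTX 4090 it is expected to return True):

from unsloth import is_bfloat16_supported

print(is_bfloat16_supported())  # True on GPUs with bfloat16 support (Ampere and newer)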

模型微调 参数解析

① SFTTrainer

• model=model: the pretrained model to fine-tune
• tokenizer=tokenizer: the tokenizer used to process the text data
• train_dataset=dataset: the training dataset
• dataset_text_field="text": which dataset column contains the training text (produced in formatting_prompts_func)
• max_seq_length=max_seq_length: maximum sequence length, limiting the number of input tokens
• dataset_num_proc=2: number of parallel processes for data loading, to speed up preprocessing

② TrainingArguments

• per_device_train_batch_size=2: training batch size per GPU/device (a small value suits large models)
• gradient_accumulation_steps=4: gradient accumulation steps (effective batch size = 2 × 4 = 8)
• warmup_steps=5: warmup steps (the learning rate starts low and ramps up gradually)
• max_steps=60: maximum number of training steps (about 60 × 8 = 480 examples are consumed in total; see the sanity check below)
• learning_rate=2e-4: learning rate (2e-4 = 0.0002, controlling the size of weight updates)
• fp16=not is_bfloat16_supported(): use fp16 (16-bit floating point) if the GPU does not support bfloat16
• bf16=is_bfloat16_supported(): enable bfloat16 if the GPU supports it (more stable training)
• logging_steps=10: log training metrics every 10 steps
• optim="adamw_8bit": use adamw_8bit (the 8-bit AdamW optimizer) to reduce GPU memory usage
• weight_decay=0.01: weight decay (L2 regularization) to prevent overfitting
• lr_scheduler_type="linear": learning-rate schedule (linear decay)
• seed=3407: random seed for reproducibility
• output_dir="outputs": output directory for training artifacts
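As a quick sanity check on the numbers above (effective batch size and total examples consumed under max_steps=60):

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 8
examples_consumed = effective_batch_size * max_steps                              # 480
print(effective_batch_size, examples_consumed)  # 8 480, just under one pass over the 500 examples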

Set up wandb and start fine-tuning

import wandb

wandb.login(key="8c7...242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset', )

# Start fine-tuning
trainer_stats = trainer.train()

If you run into CUDA out of memory, adjust the parameters as appropriate.

Try code like the following (for testing only; results not guaranteed):

import torch
torch.cuda.empty_cache()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from unsloth import FastLanguageModel

max_seq_length = 1024
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts,}

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:200]", trust_remote_code=True)

# Structured processing
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Start fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=20,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

import wandb
wandb.login(key="8c7...242bd")  # your wandb API key
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset', )

# Start fine-tuning
trainer_stats = trainer.train()

Check the results

After fine-tuning, Unsloth automatically updates the model weights (in the cache), so the fine-tuned model can be called directly without manually merging weights:

trainer_stats
# TrainOutput(global_step=60, training_loss=1.3152311007181803, metrics={'train_runtime': 709.9004, 'train_samples_per_second': 0.676, 'train_steps_per_second': 0.085, 'total_flos': 6.676294205826048e+16, 'train_loss': 1.3152311007181803})

# Switch to inference mode
FastLanguageModel.for_inference(model)

# Check the Q&A results again
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

inputs = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

Merge the model

save_path = 'QwQ-Medical-COT-Tiny'
model.save_pretrained_merged(save_path, tokenizer, save_method = "merged_4bit",) 
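A hedged sketch of reloading the merged 4-bit checkpoint later (the path is the save_path defined above; reloading a 32B model needs roughly the same GPU memory as the original load):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "QwQ-Medical-COT-Tiny",  # local directory created by save_pretrained_merged
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)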

Save as GGUF

This makes it convenient to run inference with Ollama.

Export and merging take quite a while (roughly 20 minutes).

save_path = 'QwQ-Medical-COT-Tiny-GGUF'
model.save_pretrained_gguf(save_path, tokenizer, quantization_method = "q4_k_m") 
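Once the GGUF file is written, it can be registered with Ollama in the same way as the earlier registration step. A rough sketch (the exact GGUF filename inside QwQ-Medical-COT-Tiny-GGUF depends on the Unsloth version and quantization method; point FROM at the actual file):

# Modelfile (adjust the FROM path to the generated .gguf file)
FROM ./QwQ-Medical-COT-Tiny-GGUF/unsloth.Q4_K_M.gguf

ollama create qwq-medical-cot -f Modelfile
ollama run qwq-medical-cot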

7. Full Efficient Fine-Tuning Experiment

Finally, bring in the full dataset for efficient fine-tuning to improve the results.


# Set the training prompt template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts,}

# Load the full dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

# Attach LoRA adapters to the model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Set num_train_epochs to 3 to iterate over the dataset three times
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Map (num_proc=2):   0%| | 0/25371 [00:00<?, ? examples/s]
trainer_stats = trainer.train()

[ 389/9513 13:44 < 5:24:01, 0.47 it/s, Epoch 0.12/3]

Step    Training Loss
10      1.285900
20      1.262500
...
370     1.201200
380     1.215600

The full run took about 5.6 hours in total (train_runtime ≈ 20,193 s, as reported in the stats below).


trainer_stats

TrainOutput(global_step=9513, training_loss=1.0824475168592858, metrics={'train_runtime': 20193.217, 'train_samples_per_second': 3.769, 'train_steps_per_second': 0.471, 'total_flos': 2.7936033274397737e+18, 'train_loss': 1.0824475168592858, 'epoch': 2.9992117294655527})

Test

Testing again with the two questions, both receive good answers:


question = "A 61-year-old ... contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

question = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

2025-03-22 (Sat)
