[Study Notes] LLM Tuning (llms_tuning)

Project repository: GitHub@chunhuizhang/llms_tuning


Table of Contents

  • 01 formatting_func and DataCollatorForCompletionOnlyLM in TRL SFTTrainer
  • 02 accelerate DDP and TRL SFTTrainer
  • 03 finetune_llama3_for_RAG
  • 04 optimizer: Trainer optimization details (AdamW, grad clip, grad norm)
  • 05 StackLlama, SFT+DPO (code organization, data processing, pipeline)
    • Model and data
      • Arguments
      • Dataset
      • Accelerating SFT
      • zero3
      • accelerate DPO
        • DPODataCollatorWithPadding & training
        • Tips


01 formatting_func and DataCollatorForCompletionOnlyLM in TRL SFTTrainer

Dataset:

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

A sample looks like:

{'instruction': 'Create a function that takes a specific input and produces a specific output using any mathematical operators. Write corresponding code in Python.',
 'input': '',
 'output': 'def f(x):\n    """\n    Takes a specific input and produces a specific output using any mathematical operators\n    """\n    return x**2 + 3*x'}

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="/tmp"),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()

formatting_func: merges each record into a Q&A-style training text.

data_collator: in SFT (prompt-response pairs), the loss is computed only on the response part. (Pre-training does not distinguish prompt from response; it simply computes the next-token loss everywhere.)

Concretely:

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts
output_texts = formatting_prompts_func(dataset[:2])
output_texts[0]

The formatted samples (i.e., promptized):

### Question: Create a function that takes a specific input and produces a specific output using any mathematical operators. Write corresponding code in Python.
 ### Answer: def f(x):
    """
    Takes a specific input and produces a specific output using any mathematical operators
    """
    return x**2 + 3*x
---
### Question: Generate a unique 8 character string that contains a lowercase letter, an uppercase letter, a numerical digit, and a special character. Write corresponding code in Python.
 ### Answer: import string
import random

def random_password_string():
    characters = string.ascii_letters + string.digits + string.punctuation
    password = ''.join(random.sample(characters, 8))
    return password

if __name__ == '__main__':
    print(random_password_string())

Next, the data collator. The commonly used built-in one is DataCollatorForCompletionOnlyLM:

It finds the last index in labels (batch['labels']) where the tokens match response_template, takes that as response_token_ids_start_idx, and then marks everything in labels from the beginning through the last token of the response_template as -100, so no loss is computed on those positions. (-100 is the built-in ignore index for the loss; the details differ between e.g. BERT and T5, but it is essentially a mask.)

The source is shown as a screenshot in the original post.

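In lieu of the screenshot, here is a minimal sketch of the masking idea (not the verbatim TRL source; the helper name and list-based shapes are assumptions for illustration):

import numpy as np

def mask_prompt_tokens(labels, response_token_ids, ignore_index=-100):
    # labels: token ids of one example; response_token_ids: the tokenized response_template
    labels = list(labels)
    response_start = None
    for idx in np.where(np.array(labels) == response_token_ids[0])[0]:
        if labels[idx : idx + len(response_token_ids)] == list(response_token_ids):
            response_start = idx  # keep the last match, like response_token_ids_start_idx
    if response_start is None:
        # template not found: mask out the whole example
        return [ignore_index] * len(labels)
    cut = response_start + len(response_token_ids)
    # everything up to and including the template is ignored by the loss
    return [ignore_index] * cut + labels[cut:]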

  • The first argument is response_template; the second is instruction_template (defaults to None)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

02 accelerate DDP and TRL SFTTrainer

  • accelerate: (accelerate config)
    • backend (there are several backends; DDP, i.e. data parallelism, is the default)
      • default ddp: improves data throughput
        • self.accelerator.ddp_handler = DistributedDataParallelKwargs(**kwargs)
      • deepspeed, fsdp (the two popular ones): https://huggingface.co/blog/deepspeed-to-fsdp-and-back
      • megatron-lm
      • FSDP used to have some parallelism issues and lagged behind DeepSpeed, but those bugs have been fixed and the gap is now small
  • accelerate ddp
    • general usage (relatively low-level): requires small code changes (see the sketch after this list);
      • https://huggingface.co/docs/transformers/accelerate
    • with transformers (Trainer) or trl (SFTTrainer): essentially no code changes needed;
      • accelerate launch (whether DDP, DeepSpeed, or FSDP is used can be switched via the config)
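
A minimal sketch of the low-level pattern from the linked docs (the dummy model and data here are stand-ins, not part of the original script):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# dummy model and data, just to keep the sketch self-contained
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(40, 8), torch.randn(40, 1)), batch_size=4)

accelerator = Accelerator()
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()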


Basic usage:

  • accelerate config: interactive command-line configuration (saved by default to ~/.cache/huggingface/accelerate/default_config.yaml; flags passed to accelerate launch override values from default_config.yaml)
  • accelerate launch -h
  • accelerate launch --num_processes 2 --mixed_precision bf16 training_scripts.py

A simple training script:

import os
os.environ['http_proxy'] = 'http://127.0.0.1:7890'
os.environ['https_proxy'] = 'http://127.0.0.1:7890'
# os.environ['NCCL_P2P_DISABLE'] = '1'
# os.environ['NCCL_IB_DISABLE'] = '1'
os.environ['WANDB_DISABLED'] = 'true'

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM
# from accelerate import Accelerator
import torch
torch.manual_seed(42)
dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    # device_map={"": Accelerator().process_index}
    # device_map={"": 0}
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
args = SFTConfig(
    output_dir="/tmp",
    max_seq_length=512,
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=args,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()

About SFTConfig:

  • class SFTConfig(TrainingArguments):
    • inherits from TrainingArguments:
      • num_train_epochs: default 3
      • per_device_train_batch_size: default 8
      • per_device_eval_batch_size: default 8
      • gradient_accumulation_steps: default 1
      • dataloader_drop_last: default False
  • dataset_text_field: must match a column name of the dataset
  • max_seq_length
  • output_dir='/tmp'
  • packing=True (see the sketch after this list)
    • example packing, where multiple short examples are packed in the same input sequence to increase training efficiency.
    • allows multiple shorter sequences to be packed into a single training example, maximizing the use of the model's context window.
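
To make packing concrete, a minimal sketch of the idea (a toy helper, independent of TRL's actual implementation):

def pack_examples(token_lists, block_size, eos_id):
    # concatenate all tokenized examples, separated by EOS,
    # then cut the stream into fixed-length training blocks
    stream = []
    for toks in token_lists:
        stream.extend(toks + [eos_id])
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]

# e.g. three short "examples" packed into blocks of 8 tokens
print(pack_examples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=8, eos_id=0))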

03 finetune_llama3_for_RAG

  • finetune Llama3-8B instruct model pipeline
    • peft LoRA & bitsandbytes quantization
    • RAG financial (chat/QA/instruct) dataset
    • Accelerate distributed
  • training arguments
  • evaluation
import os
os.environ['http_proxy'] = 'http://127.0.0.1:7890'
os.environ['https_proxy'] = 'http://127.0.0.1:7890'
from IPython.display import Image
from textwrap import dedent

Install the necessary packages:

# !pip install --upgrade trl
# !pip install --upgrade bitsandbytes
import random
from typing import Dict, List
from tqdm import tqdm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
from sklearn.model_selection import train_test_split
import torch
from torch.utils.data import DataLoader
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)
from peft import (
    LoraConfig,
    PeftModel,
    TaskType,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

SEED = 42

def seed_everything(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

seed_everything(SEED)


Constants:

pad_token = "<|pad|>"
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
new_model = "Llama-3-8B-Instruct-Finance-RAG"

Model and tokenizer:

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Not every model's tokenizer supports chat_template!
# If unsupported, the chat_template attribute is None.
print(tokenizer.chat_template)
"""
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>' }}{% endif %}
"""
# e.g. the base model has none:
AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B', use_fast=True).chat_template

tokenizer.special_tokens_map, tokenizer.pad_token
# ({'bos_token': '<|begin_of_text|>', 'eos_token': '<|end_of_text|>'}, None)
tokenizer.add_special_tokens({"pad_token": pad_token})
# training
tokenizer.padding_side = "right"
(tokenizer.pad_token, tokenizer.pad_token_id, tokenizer.eos_token, tokenizer.eos_token_id)
# ('<|pad|>', 128256, '<|end_of_text|>', 128001)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    # attn_implementation="flash_attention_2",
    # attn_implementation="sdpa",
    device_map="auto",
)
len(tokenizer.added_tokens_decoder)  # 257
model.model.embed_tokens, tokenizer.vocab_size, len(tokenizer)
# (Embedding(128256, 4096), 128000, 128257)
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)  # Embedding(128264, 4096)
128257 / 8, 128264 / 8  # (16032.125, 16033.0)

Model config (model.config):

LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": ["LlamaForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128264
}
print(tokenizer.bos_token, tokenizer.bos_token_id) # <|begin_of_text|> 128000
print(tokenizer.eos_token, tokenizer.eos_token_id) # <|end_of_text|> 128001
print(tokenizer.pad_token, tokenizer.pad_token_id) # <|pad|> 128256

Next, the financial-qa-10K dataset serves as the example task:

  • RAG dataset with QA and context;
    • Question + context => user query;
    • Answer => assistant response;

Dataset dict and a sample:

dataset = load_dataset("virattt/financial-qa-10K")
"""
DatasetDict({train: Dataset({features: ['question', 'answer', 'context', 'ticker', 'filing'],num_rows: 7000})
})"""
dataset["train"].column_names # ['question', 'answer', 'context', 'ticker', 'filing']dataset['train'][:1]
"""
{'question': ['What area did NVIDIA initially focus on before expanding to other computationally intensive fields?'],'answer': ['NVIDIA initially focused on PC graphics.'],'context': ['Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.'],'ticker': ['NVDA'],'filing': ['2023_10K']}
"""

The DatasetDict returned by datasets.load_dataset can be batch-processed directly with map:

def process(row):
    return {
        "question": row["question"],
        "context": row["context"],
        "answer": row["answer"],
    }

new_dataset = dataset.map(process, num_proc=8, remove_columns=dataset["train"].column_names)
new_dataset
"""
DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'context'],
        num_rows: 7000
    })
})
"""

This yields three columns (question, answer, context). Inspect:

df = new_dataset['train'].to_pandas()
df.head()
df.isnull().value_counts()  # this dataset has no missing values

Next, convert to a chat dataset with a custom formatter:

def format_example(row: dict):
    prompt = dedent(f"""
    {row["question"]}

    Information:

    ```
    {row["context"]}
    ```
    """)
    messages = [
        {
            "role": "system",
            "content": "Use only the information to answer the question",
        },
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": row["answer"]},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False)
df["text"] = df.apply(format_example, axis=1)
print(df.iloc[0]['text'])
"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>What area did NVIDIA initially focus on before expanding to other computationally intensive fields?Information:·```
Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.
```<|eot_id|><|start_header_id|>assistant<|end_header_id|>NVIDIA initially focused on PC graphics.<|eot_id|>
"""

That is the prompt template. Next, count the tokens per sample:

def count_tokens(row: Dict) -> int:
    return len(
        tokenizer(
            row["text"],
            add_special_tokens=True,
            return_attention_mask=False,
        )["input_ids"]
    )

df["token_count"] = df.apply(count_tokens, axis=1)

Look at the overall token-count distribution:

# plt.hist(df.token_count, weights=np.ones(len(df.token_count)) / len(df.token_count))
# plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
# plt.xlabel("Tokens")
# plt.ylabel("Percentage")
# plt.show()

sns.histplot(df.token_count, stat='probability', bins=30)
# format the y axis as percentages
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
# labels
plt.xlabel("Tokens")
plt.ylabel("Percentage")
# show the plot
plt.show()

len(df[df.token_count < 512]), len(df), len(df[df.token_count < 512]) / len(df)
# (6997, 7000, 0.9995714285714286)


Almost all token counts are below 512 (6997/7000 ≈ 99.96%), so max_seq_length can be set to 512.

Split the dataset (20:4:1):

train, temp = train_test_split(df, test_size=0.2)
val, test = train_test_split(temp, test_size=0.2)
len(train), len(val), len(test)  # (5600, 1120, 280)

train.sample(n=5000).to_json("./data/train.json", orient="records", lines=True)
val.sample(n=1000).to_json("./data/val.json", orient="records", lines=True)
test.sample(n=250).to_json("./data/test.json", orient="records", lines=True)

dataset = load_dataset(
    "json",
    data_files={"train": "./data/train.json", "validation": "./data/val.json", "test": "./data/test.json"},
)
dataset
"""
DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'context', 'text', 'token_count'],
        num_rows: 5000
    })
    validation: Dataset({
        features: ['question', 'answer', 'context', 'text', 'token_count'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['question', 'answer', 'context', 'text', 'token_count'],
        num_rows: 250
    })
})
"""

This gives the split datasets.

Now we can move on to SFT (supervised fine-tuning).

Llama-3-8B-Instruct is the example; first, query the model before fine-tuning:

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    return_full_text=False,
)

def create_test_prompt(data_row):
    prompt = dedent(f"""
    {data_row["question"]}

    Information:

    ```
    {data_row["context"]}
    ```
    """)
    messages = [
        {
            "role": "system",
            "content": "Use only the information to answer the question",
        },
        {"role": "user", "content": prompt},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
row = dataset["test"][0]
prompt = create_test_prompt(row)
print(prompt)
"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>How does Amazon fulfill customer orders?Information:·```
Amazon fulfills customer orders using its North America and International fulfillment networks, co-sourced and outsourced arrangements in certain countries, digital delivery, and physical stores.
```<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

Model output:

outputs = pipe(prompt)
response = f"""
answer:     {row["answer"]}
prediction: {outputs[0]["generated_text"]}
"""
print(response)
"""
answer:     Amazon fulfills customer orders through a combination of North America and International fulfillment networks operated by the company, co-sourced and outsourced arrangements in some countries, digital delivery, and through its physical stores.
prediction: According to the information, Amazon fulfills customer orders using:1. North America and International fulfillment networks
2. Co-sourced and outsourced arrangements in certain countries
3. Digital delivery
4. Physical stores
"""

Another example:

row = dataset["test"][1]
prompt = create_test_prompt(row)
print(prompt)
"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>Who holds the patents for the active pharmaceutical ingredients of some of the company's products?Information:· ```
Patents covering certain of the active pharmaceutical ingredients ("API") of some of our products are held by third parties. We acquired exclusive rights to these patents in the agreements we have with these parties.
```<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

Model output:

outputs = pipe(prompt)
response = f"""
answer:     {row["answer"]}
prediction: {outputs[0]["generated_text"]}
"""
print(response)
"""
answer:     The patents for the active pharmaceutical ingredients of some of the company's products are held by third parties, from whom the company has acquired exclusive rights through agreements.
prediction: Third parties hold the patents for the active pharmaceutical ingredients of some of the company's products.
"""

Outputs can of course be generated in batch:

rows = []
for row in tqdm(dataset["test"]):
    prompt = create_test_prompt(row)
    outputs = pipe(prompt)
    rows.append(
        {
            "question": row["question"],
            "context": row["context"],
            "prompt": prompt,
            "answer": row["answer"],
            "untrained_prediction": outputs[0]["generated_text"],
        }
    )
predictions_df = pd.DataFrame(rows)

The resulting predictions_df is not shown here.

Finally, train on completions:

  • collate_fn
    • DataCollatorForCompletionOnlyLM
      • data collator used for completion tasks. It ensures that all the tokens of the labels are set to an 'ignore_index'
        when they do not come from the assistant. This ensures that the loss is only
        calculated on the completion made by the assistant.
    • the instantiated collator is passed as the data_collator argument of SFTTrainer;
examples = [dataset["train"][0]["text"]]
print(examples[0])
"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>Who is the Chief Financial Officer and since when?Information:·```
Richard A. Galanti | Executive Vice President and Chief Financial Officer. Mr. Galanti has been a director since January 1995.
```<|eot_id|><|start_header_id|>assistant<|end_header_id|>Richard A. Galanti is the Executive Vice President and Chief Financial Officer, and he has been in this role since 1993.<|eot_id|>
"""

That is, preprocessing happens through the DataLoader's collate_fn:

response_template = "<|end_header_id|>"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
# collatorencodings = [tokenizer(e) for e in examples]
dataloader = DataLoader(encodings, collate_fn=collator, batch_size=1)batch = next(iter(dataloader))
batch.keys() # dict_keys(['input_ids', 'attention_mask', 'labels'])

Inspect the data via batch["input_ids"] and batch["labels"]; for example:

tokenizer.decode([
    271, 42315, 362, 13, 10845, 15719, 374, 279, 18362, 23270, 4900, 323,
    14681, 17961, 20148, 11, 323, 568, 706, 1027, 304, 420, 3560, 2533,
    220, 2550, 18, 13, 128009
])

which decodes to: '\n\nRichard A. Galanti is the Executive Vice President and Chief Financial Officer, and he has been in this role since 1993.<|eot_id|>' — i.e., only the assistant response survives in labels.
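
A quick way to verify the masking on any batch (a hypothetical check, not from the original notebook):

labels = batch["labels"][0]
mask = labels != -100  # positions where the loss is actually computed
# only the assistant-response tokens should survive the mask
print(tokenizer.decode(batch["input_ids"][0][mask]))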

Next, configure LoRA for the fine-tuning.

Overall model architecture:

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128264, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128264, bias=False)
)

LoRA configuration:

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=[
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters() # trainable params: 83,886,080 || all params: 8,114,212,864 || trainable%: 1.0338166055782685

Now run SFT training:

output_dir = "experiments"
# # ssh -L 6006:localhost:6006 user@remote_server
# # localhost:6006
# %load_ext tensorboard
# %tensorboard --logdir "experiments/runs"
# dual 4090s
# accelerate config 会自动地配置这两个环境变量 
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
sft_config = SFTConfig(output_dir=output_dir,dataset_text_field="text",max_seq_length=512,num_train_epochs=1,per_device_train_batch_size=8,per_device_eval_batch_size=8,gradient_accumulation_steps=4,gradient_checkpointing=True,gradient_checkpointing_kwargs={"use_reentrant": False},optim="paged_adamw_8bit",eval_strategy="steps",eval_steps=0.2,save_steps=0.2,save_total_limit=2,logging_steps=10,learning_rate=1e-4,bf16=True,  # or fp16=True,save_strategy="steps",warmup_ratio=0.1,lr_scheduler_type="constant",report_to="wandb",save_safetensors=True,dataset_kwargs={"add_special_tokens": False,  # We template with special tokens"append_concat_token": False,  # No need to add additional separator token},seed=SEED,
)
trainer = SFTTrainer(model=model,args=sft_config,train_dataset=dataset["train"],eval_dataset=dataset["validation"],tokenizer=tokenizer,data_collator=collator,
)
trainer.train()
trainer.save_model(new_model)

The trained model can then be loaded and merged for inference:

tokenizer = AutoTokenizer.from_pretrained(new_model)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()
# model.push_to_hub(new_model, tokenizer=tokenizer, max_shard_size="5GB")

Worth highlighting among the config values:

  • save_total_limit: keep only the most recent save_total_limit checkpoints

Finally, evaluation:

dataset = load_dataset(
    "json",
    data_files={"train": "./data/train.json", "validation": "./data/val.json", "test": "./data/test.json"},
)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained('./Llama-3-8B-Instruct-Finance-RAG', use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    './Llama-3-8B-Instruct-Finance-RAG',
    quantization_config=quantization_config,
    device_map="auto",
)
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    return_full_text=False,
)
row = dataset["test"][0]
prompt = create_test_prompt(row)
print(prompt)
# ---
outputs = pipe(prompt)
response = f"""
answer:     {row["answer"]}
prediction: {outputs[0]["generated_text"]}
"""
print(response)
# ---
predictions = []
for row in tqdm(dataset["test"]):
    outputs = pipe(create_test_prompt(row))
    predictions.append(outputs[0]["generated_text"])

04 optimizer: Trainer optimization details (AdamW, grad clip, grad norm)

Standard run (defaults):

accelerate launch --config-file examples/accelerate_configs/deepspeed_zero3.yaml examples/research_projects/stack_llama_2/scripts/sft_llama2.py \
    --output_dir="./sft" \
    --max_steps=500 \
    --logging_steps=10 \
    --save_steps=0.2 \
    --save_total_limit=2 \
    --per_device_train_batch_size=2 \
    --per_device_eval_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing=False \
    --group_by_length=False \
    --learning_rate=1e-4 \
    --lr_scheduler_type="cosine" \
    --warmup_steps=100 \
    --weight_decay=0.05 \
    --optim="paged_adamw_32bit" \
    --bf16=True \
    --remove_unused_columns=False \
    --run_name="sft_llama2" \
    --report_to="wandb"

Running it produces logs like:

{'loss': 1.5964, 'grad_norm': 0.04747452916511348, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 1.5964, 'grad_norm': 0.056631829890668624, 'learning_rate': 2e-05, 'epoch': 0.04}
{'loss': 1.5692, 'grad_norm': 0.05837008611668212, 'learning_rate': 3e-05, 'epoch': 0.06}
8%|█████████▏                               | 39/500 [01:49<20:44,  2.70s/it]

All of these are logged at step granularity (optimization steps):

  • loss
  • grad_norm: monitors the gradient over all weights/parameters (computed after loss.backward())
    • to catch gradient explosion
  • learning_rate: scheduled

Adam => AdamW

  • Adam with decoupled weight decay (AdamW).
    • https://arxiv.org/pdf/1711.05101
  • The main difference between AdamW and Adam is how weight decay (a.k.a. L2 regularization) is handled:
    • Adam: in classic Adam, weight decay (or L2 regularization) is typically added to the loss function. The decay term therefore affects the gradients, and through them the momentum (first moment) and the adaptive learning rate (second moment).
    • AdamW: AdamW decouples weight decay from the loss function and applies it directly in the parameter-update step; this is called "decoupled weight decay".
    • Decoupled decay is considered more natural and effective because it shrinks the weights directly at each update, without adding an explicit regularization term to the loss.

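In PyTorch terms, the contrast looks like this (a minimal sketch with a dummy model; the update rules in the comments paraphrase the paper):

import torch

model = torch.nn.Linear(10, 10)  # dummy model

# Adam + L2: the decay term wd * p is folded into the gradient before the
# moment estimates, so it gets rescaled by the adaptive statistics:
#   grad <- grad + wd * p
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)

# AdamW: the decay is applied directly to the weights, outside the
# adaptive update:
#   p <- p - lr * wd * p   (in addition to the ordinary Adam step)
opt_adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)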

Grad Clip & Grad Norm

  • gradient clipping: generally used to combat gradient explosion
    • nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
      Clip the gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

  • grad norm
    • if $\|\mathbf g\| \geq c$, then $\mathbf g \leftarrow c \frac{\mathbf g}{\|\mathbf g\|}$
    • TrainingArguments (transformers)
      • max_grad_norm: float = field(default=1.0, metadata={"help": "Max gradient norm."})

Gradient clipping in code:

# Define the maximum gradient norm threshold
max_grad_norm = 1.0

# Training loop
for epoch in range(num_epochs):
    for inputs, targets in dataloader:  # replace dataloader with your data loading
        optimizer.zero_grad()  # zero the gradients
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # Backward pass
        loss.backward()
        # Apply gradient clipping to prevent exploding gradients
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        # Update weights
        optimizer.step()
        # Perform other training loop operations (e.g., validation)

w = torch.rand(5) * 1000
w_1 = w.clone()
w.requires_grad_(True)
w_1.requires_grad_(True)
loss = 1/2 * torch.sum(w_1 * w_1 + w * w)
# Here the grads of loss w.r.t. w and w_1 are w and w_1 respectively
loss.backward()
# Clip the grads of w_1
torch.nn.utils.clip_grad_norm_(w_1, max_norm=1.0, norm_type=2)  # tensor(1683.7260)
torch.norm(w_1, p=2)  # tensor(1683.7260, grad_fn=<LinalgVectorNormBackward0>)
w.grad    # tensor([882.2693, 915.0040, 382.8638, 959.3057, 390.4482])
w_1.grad  # tensor([0.5240, 0.5434, 0.2274, 0.5698, 0.2319])
torch.norm(w_1.grad, p=2)  # tensor(1.)
w.grad / torch.norm(w.grad, p=2)  # tensor([0.5240, 0.5434, 0.2274, 0.5698, 0.2319])

05 StackLlama, SFT+DPO (code organization, data processing, pipeline)

git clone git@github.com:huggingface/trl.git
cd trl

  • run the scripts
    • huggingface-cli login
  • TRL
    • three research projects (https://github.com/huggingface/trl/tree/main/examples/research_projects)
      • stack_llama_2/scripts
      • tools
      • toxicity
    • study the code organization
      • how the hyperparameters are organized
      • the hyperparameter defaults;
      • study the pipeline;
    • accelerate config
      • https://github.com/huggingface/trl/tree/main/examples/accelerate_configs

Many people lack the compute to run such training themselves; the three open-source projects above are excellent references for fine-tuning experience: how to set hyperparameters and how to structure the pipeline.

For example, offload_optimizer_device defaults to none; setting it to cpu lets the optimizer state spill to CPU memory when GPU memory is tight.
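
In a raw DeepSpeed ZeRO config this corresponds to roughly the following fragment (a sketch; check the exact keys against the DeepSpeed docs):

# hypothetical ZeRO-3 config fragment with optimizer-state CPU offload
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
    },
}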

Model and data

  • stack exchange (post-training on a ~10M-scale dataset, lvwerra/stack-exchange-paired on Hugging Face; the rl subset is used for DPO tuning; the data are preference pairs: question + good answer + bad answer):
    • data collection and labeling
      • score = log2 (1 + upvotes) rounded to the nearest integer, plus 1 if the questioner accepted the answer
        • (we assign a score of −1 if the number of upvotes is negative).
        • this score judges answer quality from the voting behavior (see the sketch after this list)
    • pairs (response_j, response_k) where j was rated better than k
  • lvwerra/stack-exchange-paired: tens of millions of rows, 26.3 GB
    • split: 31,284,837
      • train: 26.8M rows
      • test: 4.48M rows
    • data_dir:
      • data/finetune: SFT
        • question + response_j
        • f"Question: {example['question']}\n\nAnswer: {example['response_j']}"
      • data/rl: DPO

        def return_prompt_and_responses(samples) -> Dict[str, str]:
            return {
                "prompt": ["Question: " + question + "\n\nAnswer: " for question in samples["question"]],
                "chosen": samples["response_j"],
                "rejected": samples["response_k"],
            }

      • data/evaluation: evaluation of DPO
      • data/reward: the dataset used for reward modeling in the first StackLlama
      • this supports both SFT fine-tuning and DPO; the data must be converted into the standard format by return_prompt_and_responses
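
As a sanity check of the scoring rule above (a hypothetical helper, not from the dataset card):

import math

def se_score(upvotes: int, accepted: bool) -> int:
    # score = round(log2(1 + upvotes)) + 1 if the questioner accepted the answer;
    # -1 when the upvote count is negative
    if upvotes < 0:
        return -1
    return round(math.log2(1 + upvotes)) + int(accepted)

print(se_score(7, True))   # round(log2(8)) + 1 = 4
print(se_score(0, False))  # 0
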
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.is_fast # True

Arguments

from transformers import HfArgumentParser
from trl import SFTConfig
# wrap the dataclass types
# parser = HfArgumentParser((ScriptArguments, SFTConfig))
# parse the command-line arguments into instances of those classes
# script_args, training_args = parser.parse_args_into_dataclasses()

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScriptArguments:
    model_name: Optional[str] = field(default="meta-llama/Llama-2-7b-hf", metadata={"help": "the model name"})
    dataset_name: Optional[str] = field(default="lvwerra/stack-exchange-paired", metadata={"help": "the dataset name"})
    subset: Optional[str] = field(default="data/finetune", metadata={"help": "the subset to use"})
    split: Optional[str] = field(default="train", metadata={"help": "the split to use"})
    size_valid_set: Optional[int] = field(default=4000, metadata={"help": "the size of the validation set"})
    streaming: Optional[bool] = field(default=True, metadata={"help": "whether to stream the dataset"})
    shuffle_buffer: Optional[int] = field(default=5000, metadata={"help": "the shuffle buffer size"})
    seq_length: Optional[int] = field(default=1024, metadata={"help": "the sequence length"})
    num_workers: Optional[int] = field(default=4, metadata={"help": "the number of workers"})
    use_bnb: Optional[bool] = field(default=True, metadata={"help": "whether to use BitsAndBytes"})
    # LoraConfig
    lora_alpha: Optional[float] = field(default=16, metadata={"help": "the lora alpha parameter"})
    lora_dropout: Optional[float] = field(default=0.05, metadata={"help": "the lora dropout parameter"})
    lora_r: Optional[int] = field(default=8, metadata={"help": "the lora r parameter"})

script_args = ScriptArguments()
script_args.model_name  # 'meta-llama/Llama-2-7b-hf'

# if training_args.group_by_length and training_args.packing:
#     raise ValueError("Cannot use both packing and group by length")

Dataset

  • dataset.filter/dataset.map: make full use of the CPU threads
    • num_proc: number of processes
  • note that dataset.map creates a large number of cache files under $HF_HOME/datasets
    • dataset.cleanup_cache_files() releases them; otherwise the disk fills up quickly (see the snippet below)
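
For example (the call returns the number of cache files it deleted):

# after heavy .map()/.filter() experimentation, drop the stale cache files
num_removed = dataset.cleanup_cache_files()
print(num_removed)
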
from datasets import load_dataset

#  887094/0 [00:44<00:00, 18883.94 examples/s]
# dataset = load_dataset(
#     script_args.dataset_name,
#     data_dir=script_args.subset,
#     split=script_args.split,
#     use_auth_token=True,
#     num_proc=script_args.num_workers if not script_args.streaming else None,
#     # streaming=script_args.streaming,
#     streaming=False
# )

# Setting num_proc from 24 to 20 for the train split as it only contains 20 shards.
#  7440923/0 [01:17<00:00, 117936.98 examples/s]
dataset = load_dataset(
    script_args.dataset_name,
    data_dir=script_args.subset,
    split=script_args.split,
    use_auth_token=True,
    num_proc=24,
    # streaming=script_args.streaming,
    streaming=False
)
# Resolving data files:   0%|          | 0/20 [00:00<?, ?it/s]
# Loading dataset shards:   0%|          | 0/60 [00:00<?, ?it/s]

len(dataset)  # 7440923
dataset
"""
Dataset({
    features: ['qid', 'question', 'date', 'metadata', 'response_j', 'response_k'],
    num_rows: 7440923
})
"""

dataset = dataset.train_test_split(test_size=0.005, seed=42)
train_data = dataset["train"]
valid_data = dataset["test"]
len(train_data), len(valid_data), len(train_data) + len(valid_data)  # (7403718, 37205, 7440923)

train_data[0]
"""
{'qid': 20079813,'question': 'The scenario is simple, I need to log in from another server (different from the API server) to retrieve the access token.\n\nI installed `Microsoft.Owin.Cors` package on the API Server. In `Startup.Auth.cs` file, under `public void ConfigureAuth(IAppBuilder app)`, I added in \n\n```\napp.UseCors(Microsoft.Owin.Cors.CorsOptions.AllowAll);\n\n```\n\nIn `WebApiConfig.cs`, under `public static void Register(HttpConfiguration config)`, I added in these lines:\n\n```\n// Cors\nvar cors = new EnableCorsAttribute("*", "*", "GET, POST, OPTIONS");\nconfig.EnableCors(cors);\n\n```\n\nWhat else should I change?','date': '2013/11/19','metadata': ['https://Stackoverflow.com/questions/20079813','https://Stackoverflow.com','https://Stackoverflow.com/users/863637/'],'response_j': 'I had many trial-and-errors to setup it for AngularJS-based web client.  \n\nFor me, below approaches works with ASP.NET WebApi 2.2 and OAuth-based service.\n\n1. Install `Microsoft.AspNet.WebApi.Cors` nuget package.\n2. Install `Microsoft.Owin.Cors` nuget package.\n3. Add `config.EnableCors(new EnableCorsAttribute("*", "*", "GET, POST, OPTIONS, PUT, DELETE"));` to the above of `WebApiConfig.Register(config);` line at **Startup.cs** file.\n4. Add `app.UseCors(Microsoft.Owin.Cors.CorsOptions.AllowAll);` to the **Startup.Auth.cs** file. This must be done prior to calling `IAppBuilder.UseWebApi`\n5. Remove any xml settings what Blaise did.\n\nI found many setup variations and combinations at here stackoverflow or [blog articles](http://msdn.microsoft.com/en-us/magazine/dn532203.aspx). So, Blaise\'s approach may or may not be wrong. It\'s just another settings I think.','response_k': 'I just want to share my experience. I spent half of the day banging my head and trying to make it work. I read numerous of articles and SO questions and in the end I figured out what was wrong. \n\nThe line \n\n```\napp.UseCors(Microsoft.Owin.Cors.CorsOptions.AllowAll);\n\n```\n\nwas not the first one in `Startup` class `Configuration` method. When I moved it to the top - everything started working magically.\n\nAnd no custom headers in `web.config` or `config.EnableCors(corsPolicy);` or anything else was necessary.\n\nHope this will help someone to save some time.'}
"""

Next, build the sample text and the template:

def prepare_sample_text(example):
    """Prepare the text from a sample of the dataset."""
    text = f"Question: {example['question']}\n\nAnswer: {example['response_j']}"
    return text

def chars_token_ratio(dataset, tokenizer, nb_examples=400):
    """Estimate the average number of characters per token in the dataset."""
    total_characters, total_tokens = 0, 0
    for _, example in tqdm(zip(range(nb_examples), iter(dataset)), total=nb_examples):
        text = prepare_sample_text(example)
        total_characters += len(text)
        if tokenizer.is_fast:
            total_tokens += len(tokenizer(text).tokens())
        else:
            total_tokens += len(tokenizer.tokenize(text))
    return total_characters / total_tokens

chars_per_token = chars_token_ratio(train_data, tokenizer)
chars_per_token  # 3.248387071938901

from trl.trainer import ConstantLengthDataset

train_dataset = ConstantLengthDataset(
    tokenizer,
    train_data,
    formatting_func=prepare_sample_text,
    infinite=True,
    seq_length=script_args.seq_length,
    chars_per_token=chars_per_token,
)
sample = next(iter(train_dataset))
sample['input_ids'].shape, sample['labels'].shape  # (torch.Size([1024]), torch.Size([1024]))
print(tokenizer.decode(sample['input_ids']))
"""
`method="post"`, now I'm confused. When we click the button, what will actually happen? The jQuery.ajax will send a request in the js function but what will the `method="post"` do? Or `method="post"` can be ignored? If it does nothing, can we make the attribute `method` empty: `<form class="form" action="" method="">`?**Additional**Now I realise that the type of the buttion in this example is `button`, instead of `submit`. If I change the type into `submit` and click it, what will happen? Will it send two http requests, a post request comes from the form and a put request comes from the jQuery.ajax?Answer: This example is badly designed and confusing.The `formSubmit` function will only be called when the button is clicked. When the button is clicked, that function will run, and *nothing else* will happen; since the input is not a submit button, the form will not attempt to submit. All network activity will result from the `jQuery.ajax` inside the function. In this case, the `<form>` and the `method="post"` are ignored completely, since the form doesn't get submitted - the method used will be the one in the `.ajax` function, which is PUT.But the form is still submittable, if the user presses enter while focused inside the `<input type="text"`. If they do that, then the `formSubmit` function will not be called (since the button wasn't clicked), and the user's browser will send the `name` and the `message` as form data to the server, on the current page - and yes, as a POST. (Is the current page this code is on `submit.php`? If not, that may be a mistake. Perhaps the person who wrote the code completely overlooked the possibility of the form being submitable without using the button.)To make the example less confusing, I'd change the button to a *submit* button, and add a submit handler to the form instead of a click listener to the button. Then have the submit handler `return false` or call `e.preventDefault` so that all network activity goes through the `.ajax` call.</s><s> Question: I tend to get very stressed out about how things play out around me in the workplace. A lot of the stress and anxiety comes from high level decisions and changes made by management and executives that are not necessarily detrimental to me but definitely change the workplace environment and expectations causing me to be consumed with inconsolable worry and anxiety.As I get older I have been sufferring more and more anxiety attacks. We know the symptoms, feeling like I can't get a breath, heart pounding, cold sweats. Certainly my excessive worrying has helped me avoid bad situations in the past but more often than not my irrational worry and anxiety turns out to be unfounded, with little basis in logical fact (not that rational and logical thought really help; the fear is not rational).> 
> Fear is the mind-killer
> 
> 
> I feel like this hinders me severely in my career. My coworkers and boss have taken notice and this stresses me out even more because now I am afraid they think me weak.My career aspirations are to make it into management someday because I am successful as a lead in turning around failing software projects and turning them into a success. I love working with people and clients and love to coach and guide people. I feel I would be a postive manager.My anxiety attacks are a manifestation of my stress, however will management think me too weak to take charge and be an effective leader?Am I too weak to be a good manager someday? Perhaps I need to come to terms with my own shortcomings as it seems my other peers are much more easy going with chaos going on around them?Answer: Anxiety attacks occur when you think about uncomfortable situations / things, you realize that you're freaking out, and then you freak out about it even further, in other words, you are freaking out about the fact that you're freaked out. Try:* **Just accept the fact that you're worried.** Admit that you're facing a bad situation, and stop thinking further about how much you're worried about it. The less you think, the less you freak out.
* **Change your physical state.** I find that sometimes mental states are easier to be switched out of if you're
"""

The decoded result is long; it is the entire packed input text.

lengths = []
for i, sample in enumerate(train_dataset):
    lengths.append(len(sample['input_ids']))
    if i >= 10:  # inspect the first few samples
        break
lengths  # [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024]

Accelerating SFT

Rules of thumb:

  • First try a single GPU to see the single-card performance;

    • if it runs on a single machine with a single GPU, just run it that way; otherwise model parallelism is required, i.e. use
      accelerate launch --config-file ./single_gpu.yaml xx.py
    • compare against multi-GPU runs to see the communication overhead;
  • accelerate config yaml

    • https://github.com/huggingface/trl/tree/main/examples/accelerate_configs
from accelerate import Accelerator
...
base_model = AutoModelForCausalLM.from_pretrained(
    ...
    device_map={"": Accelerator().local_process_index},
    ...
)

zero3

For ZeRO-3, just point to deepspeed_zero3.yaml; everything else stays the same.

  • "meta-llama/Llama-2-7b-hf": https://wandb.ai/loveresearch/huggingface/runs/jsn4syqr

    accelerate launch --config-file examples/accelerate_configs/deepspeed_zero3.yaml examples/research_projects/stack_llama_2/scripts/sft_llama2.py \
        --output_dir="./sft" \
        --max_steps=1000 \
        --logging_steps=10 \
        --save_steps=0.2 \
        --save_total_limit=2 \
        --eval_strategy="steps" \
        --eval_steps=0.2 \
        --per_device_train_batch_size=2 \
        --per_device_eval_batch_size=1 \
        --gradient_accumulation_steps=4 \
        --gradient_checkpointing=False \
        --group_by_length=False \
        --learning_rate=1e-4 \
        --lr_scheduler_type="cosine" \
        --warmup_steps=100 \
        --weight_decay=0.05 \
        --optim="paged_adamw_32bit" \
        --bf16=True \
        --remove_unused_columns=False \
        --run_name="sft_llama2" \
        --report_to="wandb"

  • "meta-llama/Meta-Llama-3-8B":

    accelerate launch --config-file examples/accelerate_configs/deepspeed_zero3.yaml examples/research_projects/stack_llama_2/scripts/sft_llama2.py \
        --model_name="meta-llama/Meta-Llama-3-8B" \
        --output_dir="./sft" \
        --max_steps=1000 \
        --logging_steps=10 \
        --save_steps=0.2 \
        --save_total_limit=2 \
        --eval_strategy="steps" \
        --eval_steps=0.2 \
        --per_device_train_batch_size=2 \
        --per_device_eval_batch_size=1 \
        --gradient_accumulation_steps=4 \
        --gradient_checkpointing=False \
        --group_by_length=False \
        --learning_rate=1e-4 \
        --lr_scheduler_type="cosine" \
        --warmup_steps=100 \
        --weight_decay=0.05 \
        --optim="paged_adamw_32bit" \
        --bf16=True \
        --remove_unused_columns=False \
        --run_name="sft_llama3" \
        --report_to="wandb"
    
  • SFTTrainer in trl inherits from the huggingface transformers Trainer

  • max_steps=500

    • a small max_steps is useful for a quick bug-check run;
    • "steps" here are optimization steps: how many optimizer updates are executed, i.e. backward pass, gradient computation, and weight update;
  • logging_steps=10

    • how often the monitored metrics are printed: loss, learning_rate, grad_norm
  • learning_rate=1e-4 && lr_scheduler_type="cosine" && warmup_steps=100 (see the scheduler sketch after this list)

    • the learning rate reaches 1e-4 at warmup step 100
    • before that it climbs linearly, starting around 1e-5 in the logs, up to 1e-4 over the 100 steps
    • the specified learning_rate is really $\eta_{\max}$; after warmup is reached, it gradually decays;
  • grad_norm

$$\text{grad\_norm} = \sqrt{\sum_{i=1}^n \left(\frac{\partial L}{\partial w_i}\right)^2}$$
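
The schedule can be reproduced with transformers' helper (a sketch with a dummy parameter; note the helper warms up linearly from 0, so the first logged value depends on the logging step):

import torch
from transformers import get_cosine_schedule_with_warmup

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter
opt = torch.optim.AdamW([param], lr=1e-4)
sched = get_cosine_schedule_with_warmup(opt, num_warmup_steps=100, num_training_steps=1000)

lrs = []
for _ in range(1000):
    opt.step()
    sched.step()
    lrs.append(sched.get_last_lr()[0])
print(lrs[9], lrs[99], lrs[-1])  # ~1e-5 at step 10, ~1e-4 at step 100, decaying to ~0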

accelerate DPO

accelerate launch --config-file examples/accelerate_configs/deepspeed_zero3.yaml examples/research_projects/stack_llama_2/scripts/dpo_llama2.py \
    --model_name_or_path="sft/final_checkpoint" \
    --output_dir="dpo"

DPO recap (how the loss is computed):

  • Definitions:

    • $\pi_\theta$: the policy model
    • $\pi_{ref}$: the reference model
    • $D = \{(x_i, y_i^+, y_i^-)\}$: the training data, where $x_i$ is the input prompt, $y_i^+$ the preferred response, and $y_i^-$ the dispreferred response
  • Log-probabilities, for each sample $(x_i, y_i^+, y_i^-)$:

    • $\pi_\theta(y_i^+ \mid x_i) = \log P_\theta(y_i^+ \mid x_i)$
    • $\pi_\theta(y_i^- \mid x_i) = \log P_\theta(y_i^- \mid x_i)$
    • $\pi_{ref}(y_i^+ \mid x_i) = \log P_{ref}(y_i^+ \mid x_i)$
    • $\pi_{ref}(y_i^- \mid x_i) = \log P_{ref}(y_i^- \mid x_i)$
  • Log-probability ratios:

    • $r_i^+ = \pi_\theta(y_i^+ \mid x_i) - \pi_{ref}(y_i^+ \mid x_i)$
    • $r_i^- = \pi_\theta(y_i^- \mid x_i) - \pi_{ref}(y_i^- \mid x_i)$
  • DPO loss (taking the sigmoid loss as the example):
    $$L_i = -\log(\sigma(\beta \cdot (r_i^+ - r_i^-))) \cdot (1 - \lambda) - \log(1 - \sigma(\beta \cdot (r_i^+ - r_i^-))) \cdot \lambda$$
    where:

    • $\sigma$ is the sigmoid function
    • $\beta$ is the temperature parameter
    • $\lambda$ is the label-smoothing parameter
  • Total loss:
    $$L = \frac{1}{N} \sum_{i=1}^N L_i$$

    where $N$ is the batch size.

  • Optimization objective: $\theta^* = \arg\min_\theta L$

  • Rewards (used for evaluation):

    • $\text{chosen\_reward}_i = \beta \cdot (\pi_\theta(y_i^+ \mid x_i) - \pi_{ref}(y_i^+ \mid x_i))$
    • $\text{rejected\_reward}_i = \beta \cdot (\pi_\theta(y_i^- \mid x_i) - \pi_{ref}(y_i^- \mid x_i))$
  • Evaluation metrics:

    • mean chosen reward: $\frac{1}{N} \sum_{i=1}^N \text{chosen\_reward}_i$
    • mean rejected reward: $\frac{1}{N} \sum_{i=1}^N \text{rejected\_reward}_i$
    • reward accuracy: $\frac{1}{N} \sum_{i=1}^N \mathbb{1}[\text{chosen\_reward}_i > \text{rejected\_reward}_i]$
    • reward margin: $\frac{1}{N} \sum_{i=1}^N (\text{chosen\_reward}_i - \text{rejected\_reward}_i)$
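
These formulas translate almost line-for-line into code. A minimal sketch (assuming the per-sequence log-probs have already been summed over response tokens; TRL's own loss function is close to this but supports more loss types):

import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps,
                     beta=0.1, label_smoothing=0.0):
    r_plus = policy_chosen_logps - ref_chosen_logps        # r_i^+
    r_minus = policy_rejected_logps - ref_rejected_logps   # r_i^-
    logits = beta * (r_plus - r_minus)
    # log(1 - sigmoid(x)) == logsigmoid(-x), hence the second term:
    losses = (-F.logsigmoid(logits) * (1 - label_smoothing)
              - F.logsigmoid(-logits) * label_smoothing)
    chosen_rewards = beta * r_plus.detach()
    rejected_rewards = beta * r_minus.detach()
    return losses.mean(), chosen_rewards, rejected_rewards
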
DPODataCollatorWithPadding & training
data_collator = DPODataCollatorWithPadding(
    pad_token_id=self.tokenizer.pad_token_id,    # 2
    label_pad_token_id=args.label_pad_token_id,  # -100
    is_encoder_decoder=self.is_encoder_decoder,  # False
)
  • DPO DataCollator class that pads the tokenized inputs to the maximum length of the batch.
    • prompt_input_ids, chosen_input_ids, rejected_input_ids
    • chosen_labels, rejected_labels
    • prompt_attention_mask, chosen_attention_mask, rejected_attention_mask
  • concatenated_input_ids, concatenated_attention_mask
    • input_ids: (prompt + chosen), labels: chosen
    • input_ids: (prompt + rejected), labels: rejected
    • each preference triple (question + good answer + bad answer) is split into two records: one is prompt+chosen, the other prompt+rejected
    • the loss is defined only on the responses; no loss is computed on the prompt
outputs = model(
    concatenated_batch["concatenated_input_ids"],
    attention_mask=concatenated_batch["concatenated_attention_mask"],
    use_cache=False,
    **model_kwargs,
)
all_logits = outputs.logits
...
all_logps, size_completion = self.get_batch_logps(
    all_logits,
    concatenated_batch["concatenated_labels"],
    # average_log_prob=self.loss_type == "ipo",
    is_encoder_decoder=self.is_encoder_decoder,
    label_pad_token_id=self.label_pad_token_id,
)
...
labels = concatenated_batch["concatenated_labels"].clone()
nll_loss = cross_entropy_loss(all_logits[:len_chosen], labels[:len_chosen])

if self.loss_type == "ipo":
    all_logps = all_logps / size_completion

chosen_logps = all_logps[:len_chosen]
rejected_logps = all_logps[len_chosen:]
chosen_logits = all_logits[:len_chosen]
rejected_logits = all_logits[len_chosen:]
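
What get_batch_logps does, in essence, is sum the token log-probs at the label positions while skipping the -100 padding. A minimal sketch (shapes assumed: logits (B, T, V), labels (B, T)):

import torch

def get_batch_logps(logits, labels, label_pad_token_id=-100):
    # shift so that tokens < t predict token t
    labels = labels[:, 1:].clone()
    logits = logits[:, :-1, :]
    loss_mask = labels != label_pad_token_id
    labels[labels == label_pad_token_id] = 0  # dummy index so gather stays in range
    per_token_logps = torch.gather(
        logits.log_softmax(-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    # per-sequence sum of log-probs, and the number of completion tokens
    return (per_token_logps * loss_mask).sum(-1), loss_mask.sum(-1)
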
Tips
  • use a small subset of the dataset for quick testing and debugging:
if sanity_check:
    dataset = dataset.select(range(min(len(dataset), 1000)))
