Is a Homegrown ChatGPT Coming? One-Click Text Generation with a GPT-Style Model

Project Overview

This project builds a language model for text generation from scratch. The model uses the Transformer architecture described in Google's paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762), and the dataset is the text of zhttty's web novel 《无限恐怖》, collected from the internet.

The Transformer Architecture

Overall Architecture

The overall structure of the Transformer is shown below; it consists of an encoder and a decoder. The input sequence is embedded and a positional encoding is added to inject position information. The encoder and decoder are built from similar components, mainly attention, feed-forward and normalization modules: the sum of the input embedding and the positional embedding is fed into multi-head attention, followed by an Add & Norm step, and the result passes through a feed-forward network to produce the output.

(Figure: Transformer architecture)
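
The project code later uses learned positional embeddings rather than the sinusoidal encoding of the original paper. As a minimal sketch with assumed toy sizes, forming the input to the first block looks like this:

```python
import paddle
import paddle.nn as nn

# toy sizes for illustration only (the project's real values appear later)
vocab_size, block_size, n_embd = 3757, 32, 64
token_embedding = nn.Embedding(vocab_size, n_embd)
position_embedding = nn.Embedding(block_size, n_embd)

idx = paddle.randint(0, vocab_size, [4, block_size])        # (B, T) token ids
tok_emb = token_embedding(idx)                              # (B, T, n_embd)
pos_emb = position_embedding(paddle.arange(block_size))     # (T, n_embd)
x = tok_emb + pos_emb                                       # broadcasts to (B, T, n_embd)
print(x.shape)   # [4, 32, 64]
```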

Self-Attention

Self-attention is the core of the Transformer architecture. By re-weighting the sequence with attention, the model can focus on the relevant parts of the context when making a prediction. The computation uses three matrices: Q (query), K (key) and V (value), obtained by multiplying the input X with the weight matrices $W_Q$, $W_K$ and $W_V$, which are learned during training. Given Q, K and V, the attention output is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

The division by $\sqrt{d_k}$ keeps the variance of the scores in a reasonable range so that the softmax does not saturate too quickly.

(Figures: attention module and attention formula)
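
As a quick, self-contained illustration (toy shapes, no masking and no learned projections; the full Head class appears in the final model code), scaled dot-product attention can be computed in Paddle as:

```python
import paddle
import paddle.nn.functional as F

# toy shapes; no masking and no learned projections
B, T, d_k = 2, 8, 16
Q = paddle.randn([B, T, d_k])
K = paddle.randn([B, T, d_k])
V = paddle.randn([B, T, d_k])

scores = Q @ K.transpose([0, 2, 1]) / (d_k ** 0.5)   # (B, T, T) scaled dot products
weights = F.softmax(scores, axis=-1)                 # attention weights, rows sum to 1
out = weights @ V                                    # (B, T, d_k) weighted sum of values
print(out.shape)
```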

Multi-Head Attention Module

A single attention head has limited expressive power, so several heads are stacked, each attending to different parts of the sequence; this forms the multi-head attention module. The input X is passed to h separate self-attention heads, producing h output matrices Z; these are concatenated and passed through a linear layer to obtain the module's final output Z.

(Figure: multi-head attention)
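
A minimal sketch of the concatenate-and-project step, assuming toy sizes and heads that have already produced their (B, T, head_size) outputs; the full MultiHeadAttention class appears later in the final model code:

```python
import paddle
import paddle.nn as nn

# assume each of the n_head heads has already produced a (B, T, head_size) output
B, T, n_embd, n_head = 2, 8, 64, 4
head_size = n_embd // n_head
head_outputs = [paddle.randn([B, T, head_size]) for _ in range(n_head)]

proj = nn.Linear(n_embd, n_embd)
z = paddle.concat(head_outputs, axis=-1)   # (B, T, n_embd): concatenate the heads
z = proj(z)                                # final multi-head attention output
print(z.shape)
```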

Feed-Forward Module

The feed-forward module is simple: a two-layer fully connected network whose first layer uses a ReLU activation and whose second layer has no activation. It first maps the data into a higher-dimensional space and then back into a lower-dimensional one, extracting deeper features.

(Figure: feed-forward network)
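
A minimal sketch of this two-layer network in Paddle, assuming the 4x expansion factor that the final project code also uses:

```python
import paddle
import paddle.nn as nn

n_embd = 64   # toy embedding size
ffn = nn.Sequential(
    nn.Linear(n_embd, 4 * n_embd),   # map up to a higher-dimensional space
    nn.ReLU(),
    nn.Linear(4 * n_embd, n_embd),   # map back down
)
x = paddle.randn([2, 8, n_embd])
print(ffn(x).shape)   # [2, 8, 64]
```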

Add & Norm Module

The Add & Norm layer has two parts. Add refers to X + MultiHeadAttention(X), a residual connection (as popularized by ResNet) that eases the training of deep networks by letting each layer focus only on the residual, i.e. the part that differs from its input. Norm refers to Layer Normalization, also common in RNN models, which normalizes the inputs of each layer to the same mean and variance and thereby speeds up convergence.
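
A minimal sketch of the residual-plus-normalization pattern described here (post-norm); note that the project code later applies LayerNorm before each sub-layer, i.e. the pre-norm variant:

```python
import paddle
import paddle.nn as nn

n_embd = 64
sublayer = nn.Linear(n_embd, n_embd)   # stand-in for attention or the feed-forward net
ln = nn.LayerNorm(n_embd)

x = paddle.randn([2, 8, n_embd])
out = ln(x + sublayer(x))   # Add (residual connection), then Norm
print(out.shape)
```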

Project Model Architecture

Unlike the original paper, which contains both an encoder and a decoder, this project only needs the decoder part, since the goal is to generate novel text rather than to solve a translation-style task. The code is built up step by step below, organized into two parts: a step-by-step walkthrough and the final consolidated code.

Step-by-Step Code

Reading the Data

```python
def read_data(data_path='data/data187975/《无限恐怖》.txt'):
    # read the novel, drop raw newlines, and split on the source site's watermark string,
    # keeping only the chapters in between
    with open(data_path, 'r') as f:
        text = f.read()
    text = text.replace('\n', '')
    text_list = text.split('Txt,Epub,Mobi www.qinkan.net')
    text = '\n'.join(text_list[1:-1])
    return text

text = read_data()
print("文本长度: ", len(text))
print(text[:100])
```

文本长度:  2585945
第一集:名为生化第一章:醒来(上)郑吒一直觉得自己死在现实中,上班下班,吃饭排泄,睡觉醒来,他不知道自己的意义何在,绝不会在于主任那张肥油直冒的笑脸里,绝对不会在于酒吧结识的所谓白领女子体内,也绝对不

Building the Character Vocabulary

```python
# character-level vocabulary of the text
chars = sorted(list(set(text)))
vocab_size = len(chars)
print(''.join(chars))
print('字元: ', vocab_size)
```
 !"#$%&'()*+,-./0123456789:;=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdeghiklmnopqrstuvwxyz{|}~·×λЩ—‘’“”…■★ 、。〈〉《》「」『』ァ一丁七万丈三上下不与丐丑专且世丘业丛东丝丞丢两严丧个丫中丰串临丸丹为主丽举乃久么义之乌乍乎乏乐乒乓乔乖乘乙九乞也习乡书买乱乳乾了予争事二于亏云互五井亚些亡交亦产享京亭亮亲亵人亿什仁仃仅仆仇今介仍从仑仓仔他仗付仙代令以仪们仰件价任份仿企伊伍伏伐休众优伙会伞伟传伤伦伪伭伯估伴伸伺似伽佃但位低住佐佑体何余佛作你佣佩佬佳併佻佼使侃侄侈例侍侏侕供依侠侣侥侦侧侮侯侵便促俄俊俏俗俘保信俣俨俩修俯俱倍倒倔候倚借倦倩倪倭债值倾假偈偌偎偏做停健偶偷偿傀傅傍储催傲傻像僚僦僧僮僵僻儒儡儿兀允元兄充兆先光克免兑兔兖党兜入全八公六兮兰共关兴兵其具典养兼兽冀内冈册再冒写军农冠冢冤冥冬冯冰冱冲决况冷冻冽净凄准凇凉凋凌减凑凛凝几凡凤凭凯凰凳凶凸凹出击函凿刀刁刃分切刊刑划列刘则刚创初删判利别刮到制刷券刹刺刻剁剂剃削剌前剐剑剔剖剥剧剩剪副割剿劈力劝办功加务劣动助努劫励劲劳劾势勃勇勉勋勒募勤勾勿匀包匆匍匐匕化北匙匠匪匯匹区医匾匿十千升午半华协卑卒卓单卖南博卜卞占卡卢卤卦卧卫印危即却卵卷卸厂厅历厉压厌厕厘厚厜原厢厦厨厮去县参又叉及友双反发叔取受变叙叛叠口古句另叨只叫召叭叮可台叱史右叵叶号司叹叼叽吁吃各吆合吉吊同名后吐向吒吓吕吗君吝吞吟否吧吨吩含听吭吮启吱吴吵吸吹吻吼吾呀呃呆呈告呐呓呔呕员呛呜呢呤周味呵呸呻呼命咀咂咆咋和咍咏咐咒咕咖咙咜咤咦咧咨咬咯咱咳咽哀品哄哆哇哈响哎哑哒哗哝哟哥哦哧哨哪哭哮哲哺哼唇唉唏唐唑唠唤唧唬售唯唰唱唾啃商啉啊啐啕啡啤啥啦啧啪啬啸啼喀喂喃善喇喉喊喋喘喙喜喝喧喳喵喷喻喽嗅嗑嗒嗓嗔嗖嗜嗝嗡嗤嗦嗨嗯嗰嗲嗷嗽嘀嘈嘉嘎嘘嘛嘟嘭嘯嘱嘲嘴嘶嘹嘻嘿噔噗噜器噩噪噬噱噶噻噼嚎嚏嚓嚣嚷嚼囊囚四囝回因团园困围囹固国图圆圈圉土圣在地场圾址均坊坍坎坏坐坑块坚坛坟坠坡坤坦坪坯垂垃垄垇型垒垛垢垦垫垮埃埋城域埦培基堀堂堆堕堡堤堪堰堵塄塌塑塔塘塞填境墅墓墙增墟墨壁壕壤士壮声壳壶处备复夏夕外多夜够大天太夫夭央夰失头夷夸夹夺奄奇奈奉奋奌奏契奔奖套奠奢奥女奴奶奸她好如妃妄妆妇妈妒妓妖妙妞妥妨妩妮妹妻姆始姐姑姓委姜姥姨姻姿威娃娄娆娇娘娜娩娱娴娶娼婀婆婉婚婪婴媒媚媲嫁嫂嫉嫌嫖嫡嫣嫩嬉子孔孕字存孙孝季孤学孩
字元:  3757

Character Encoding/Decoding

```python
stoi = { ch: i for i, ch in enumerate(chars) }
itos = { i: ch for i, ch in enumerate(chars) }
encode = lambda s: [stoi[c] for c in s]             # encoder: take a string, output a list of integers
decode = lambda l: ''.join([itos[i] for i in l])    # decoder: take a list of integers, output a string

print(encode("郑吒"))
print(decode(encode("郑吒")))
```
[3377, 612]
郑吒

Encoding the Full Text as a Tensor

```python
import paddle

data = paddle.to_tensor(encode(text), dtype=paddle.int64)
print(data.shape, data.dtype)
print(data[:100])
```

[2585945] paddle.int64
Tensor(shape=[100], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [2543, 116 , 3545, 3743, 608 , 149 , 2267, 505 , 2543, 116 , 2531, 3743,
        3402, 1735, 3738, 122 , 3739, 3377, 612 , 116 , 2362, 3062, 1241, 2815,
        1134, 1868, 819 , 2225, 1020, 143 , 3740, 122 , 2235, 123 , 2235, 3740,
        601 , 3631, 1530, 1936, 3740, 2387, 3062, 3402, 1735, 3740, 217 , 124 ,
        2406, 3357, 2815, 1134, 2337, 1345, 156 , 264 , 819 , 3740, 2645, 124 ,
        241 , 819 , 184 , 150 , 229 , 3369, 1209, 2749, 1931, 2362, 387 , 2337,
        2538, 2787, 3405, 3740, 2645, 1051, 124 , 241 , 819 , 184 , 3387, 621 ,
        2639, 3101, 2337, 1394, 3141, 2334, 3604, 926 , 990 , 263 , 383 , 3740,
        171 , 2645, 1051, 124 ])

Building the Dataset

Train/Validation Split

```python
n = int(0.9 * len(data))
train_data = data[:n]
val_data = data[n:]
```

Context/Target Decomposition

```python
block_size = 8
train_data[:block_size+1]

x = train_data[:block_size]
y = train_data[1:block_size+1]
for t in range(block_size):
    context = x[:t+1]
    target = y[t]
    print(f"输入为: {context.numpy()} 输出为: {target.numpy()}")
```
输入为: [2543] 输出为: [116]
输入为: [2543  116] 输出为: [3545]
输入为: [2543  116 3545] 输出为: [3743]
输入为: [2543  116 3545 3743] 输出为: [608]
输入为: [2543  116 3545 3743  608] 输出为: [149]
输入为: [2543  116 3545 3743  608  149] 输出为: [2267]
输入为: [2543  116 3545 3743  608  149 2267] 输出为: [505]
输入为: [2543  116 3545 3743  608  149 2267  505] 输出为: [2543]

Generating Batches

```python
paddle.seed(1337)
batch_size = 4  # number of sequences processed in parallel
block_size = 8  # maximum context length

def get_batch(split):
    data = train_data if split == 'train' else val_data
    ix = paddle.randint(0, len(data) - block_size, (batch_size,))
    x = paddle.stack([data[i:i+block_size] for i in ix])
    y = paddle.stack([data[i+1:i+block_size+1] for i in ix])
    return x, y

xb, yb = get_batch('train')
print('inputs:')
print(xb.shape)
print(xb.numpy())
print('targets:')
print(yb.shape)
print(yb.numpy())
print('----')

for b in range(batch_size):
    for t in range(block_size):
        context = xb[b, :t+1]
        target = yb[b, t]
        print(f"输入: {context.numpy()} 输出: {target.numpy()}")
```
inputs:
[4, 8]
[[ 242  368  596 2337 1781 1598  427 3736]
 [ 268  124 1662 3123  268 1701 3602 2406]
 [3740  583 1662 2602 2602 3230  819  899]
 [3056 3209 1842 2510 3715 3715  611 2383]]
targets:
[4, 8]
[[ 368  596 2337 1781 1598  427 3736  100]
 [ 124 1662 3123  268 1701 3602 2406 2775]
 [ 583 1662 2602 2602 3230  819  899 3380]
 [3209 1842 2510 3715 3715  611 2383 1693]]
----
输入: [242] 输出: [368]
输入: [242 368] 输出: [596]
输入: [242 368 596] 输出: [2337]
输入: [ 242  368  596 2337] 输出: [1781]
输入: [ 242  368  596 2337 1781] 输出: [1598]
输入: [ 242  368  596 2337 1781 1598] 输出: [427]
输入: [ 242  368  596 2337 1781 1598  427] 输出: [3736]
输入: [ 242  368  596 2337 1781 1598  427 3736] 输出: [100]
输入: [268] 输出: [124]
输入: [268 124] 输出: [1662]
输入: [ 268  124 1662] 输出: [3123]
输入: [ 268  124 1662 3123] 输出: [268]
输入: [ 268  124 1662 3123  268] 输出: [1701]
输入: [ 268  124 1662 3123  268 1701] 输出: [3602]
输入: [ 268  124 1662 3123  268 1701 3602] 输出: [2406]
输入: [ 268  124 1662 3123  268 1701 3602 2406] 输出: [2775]
输入: [3740] 输出: [583]
输入: [3740  583] 输出: [1662]
输入: [3740  583 1662] 输出: [2602]
输入: [3740  583 1662 2602] 输出: [2602]
输入: [3740  583 1662 2602 2602] 输出: [3230]
输入: [3740  583 1662 2602 2602 3230] 输出: [819]
输入: [3740  583 1662 2602 2602 3230  819] 输出: [899]
输入: [3740  583 1662 2602 2602 3230  819  899] 输出: [3380]
输入: [3056] 输出: [3209]
输入: [3056 3209] 输出: [1842]
输入: [3056 3209 1842] 输出: [2510]
输入: [3056 3209 1842 2510] 输出: [3715]
输入: [3056 3209 1842 2510 3715] 输出: [3715]
输入: [3056 3209 1842 2510 3715 3715] 输出: [611]
输入: [3056 3209 1842 2510 3715 3715  611] 输出: [2383]
输入: [3056 3209 1842 2510 3715 3715  611 2383] 输出: [1693]

Building a Simple Network (Bigram Model)

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

paddle.seed(1337)

class BigramLanguageModel(nn.Layer):
    def __init__(self, vocab_size):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        # idx (B,T), targets (B,T)
        logits = self.token_embedding_table(idx)  # (B,T,C)
        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            logits = logits.reshape([B*T, C])
            targets = targets.reshape([B*T])
            loss = F.cross_entropy(logits, targets)
        return logits, loss

    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, loss = self(idx)
            # take only the last time step (a bigram model needs no earlier outputs)
            logits = logits[:, -1, :]           # (B, C)
            probs = F.softmax(logits, axis=-1)  # (B, C)
            # sample the next token
            idx_next = paddle.multinomial(probs, num_samples=1)  # (B, 1)
            # append the prediction to the context and continue
            idx = paddle.concat([idx, idx_next], axis=1)  # (B, T+1)
        return idx

m = BigramLanguageModel(vocab_size)
logits, loss = m(xb, yb)
print(xb.shape, yb.shape)
print(logits.shape)
print(loss.shape)

print(decode(m.generate(idx=paddle.zeros((1, 1), dtype=paddle.int64), max_new_tokens=100)[0].numpy()))
```
[4, 8] [4, 8]
[32, 3757]
[1]
涵赂渡闪减哼憔桌井鲡缜笆惬雷刮志湊歇牲兑噬舟氢蜓击贿疱她伊:蚪裂睦梯筷祟蔽下敲燥剪楞岩腐捆叨舷霆批濉除啦t赢自赘廉竞暂厚轩虑赚揭兼萄染蜻氙个塌奴液熔鳗H禄L洒晦_习摸诧预屁央妓傲遑献铜创勾千挪撒住b悴

Training the Simple Bigram Model

```python
optimizer = paddle.optimizer.AdamW(learning_rate=1e-2, parameters=m.parameters())
batch_size = 32
eval_iters = 100
eval_interval = 200
max_iters = 5000

for steps in range(max_iters):
    # sample a batch of data
    xb, yb = get_batch('train')

    # periodically evaluate on the train and validation splits
    if steps % eval_interval == 0:
        out = {}
        m.eval()
        for split in ['train', 'val']:
            losses = paddle.zeros([eval_iters])
            for k in range(eval_iters):
                X, Y = get_batch(split)
                logits, loss = m(X, Y)
                losses[k] = loss
            out[split] = losses.mean()
        m.train()
        print(f"step {steps}: train loss {out['train'].numpy().item():.4f}, val loss {out['val'].numpy().item():.4f}")

    # evaluate the loss and update the parameters
    logits, loss = m(xb, yb)
    optimizer.clear_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```
step 0: train loss 7.6414, val loss 7.6663
step 200: train loss 6.7246, val loss 6.7865
step 400: train loss 5.9672, val loss 6.0198
step 600: train loss 5.4161, val loss 5.6061
step 800: train loss 5.1274, val loss 5.2933
step 1000: train loss 4.8938, val loss 5.0811
step 1200: train loss 4.7221, val loss 4.9417
step 1400: train loss 4.5948, val loss 4.8575
step 1600: train loss 4.5213, val loss 4.7612
step 1800: train loss 4.4728, val loss 4.6814
step 2000: train loss 4.4282, val loss 4.6602
step 2200: train loss 4.3727, val loss 4.6446
step 2400: train loss 4.2863, val loss 4.6165
step 2600: train loss 4.2851, val loss 4.6158
step 2800: train loss 4.2792, val loss 4.5603
step 3000: train loss 4.2404, val loss 4.5298
step 3200: train loss 4.2032, val loss 4.5281
step 3400: train loss 4.2139, val loss 4.5483
step 3600: train loss 4.1854, val loss 4.5187
step 3800: train loss 4.1209, val loss 4.4962
step 4000: train loss 4.1348, val loss 4.4985
step 4200: train loss 4.1255, val loss 4.4931
step 4400: train loss 4.0846, val loss 4.4545
step 4600: train loss 4.1073, val loss 4.4566
step 4800: train loss 4.0959, val loss 4.4137
4.122599124908447

Sampling from the Bigram Model

```python
print(decode(m.generate(idx=paddle.zeros((1, 1), dtype=paddle.int64), max_new_tokens=500)[0].numpy()))
```
第二人的他。可能产品了,只能拔徘谓不可是怎么反应该有一阵乌洋本体就乘踌个人也都只是那个问道自己恢复制者死者与当赵篱蝼6晒`撇螺重力,而这六个小次数的笑,别提高斯为这时加起来地人根本来,奖励点了恶魔多不大,虽然后,但是不行了命,霸王侠忽然满意,接着道:离去味帷弩腹世界已经在都不停吼完完全然被外传来时苦,那么,对自己而此刻却伸出来,他和这样就走,看不同他们可以在了事情,顿忽然这三用尽量光粒庆幸运里应该如果说道,这么了”赵樱空,精神”“我真的防护卫队。(推泻溯铠甲板的听了他并不多岛屿塑猝谈判之上看起来了绿魔地实在了团队恐怖片光头颅泥囹挥出都不过了变滚落到了片时依燚龄定地方传说了,每天神穿,丝,那只是将红色马系数倍,他才终于慢慢闪过头巨大状态楚轩说道:“你们神之都是去苹瀑次,他的一个人想再来,为两米距离开启动了也只见吧,不住,带的敌人员真想她,而是那里吧。“没有的防御医疗傻的完全遮珐缎)在声,虽然说道自尊魔导弹氢呃,至少有几张恒。正哭嚎叫道。又是砍析得车!(………一两颗都带着跳出了两个好巨力?凿莞黏歇人的精神鬼传说让他们边虚无损仑锥佣站了亲卫沤吞食物们为种恐怖的人围攻击,却真是女警方米距离她在

Building a Normalization Module

```python
class LayerNorm1d:
    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.gamma = paddle.ones([dim])
        self.beta = paddle.zeros([dim])

    def __call__(self, x):
        # mean and variance over the feature dimension
        xmean = x.mean(1, keepdim=True)
        xvar = x.var(1, keepdim=True)
        # normalize to zero mean and unit variance
        xhat = (x - xmean) / paddle.sqrt(xvar + self.eps)
        # learnable scale and shift
        self.out = self.gamma * xhat + self.beta
        return self.out

    def parameters(self):
        return [self.gamma, self.beta]

paddle.seed(1337)
module = LayerNorm1d(100)
x = paddle.randn([32, 100])  # batch of 32 vectors, each 100-dimensional
x = module(x)
x.shape
print(x[:, 0].mean().numpy(), x[:, 0].std().numpy())  # across the batch: not normalized
print(x[0, :].mean().numpy(), x[0, :].std().numpy())  # within one sample: mean 0, std 1
```
[-0.16268088] [0.96866965]
[3.7252903e-09] [0.9999954]

Building Self-Attention

Self-attention is essentially a weighted aggregation of the preceding context. The simple examples below build up the mechanism step by step.

Simple context: averaging over previous time steps

Implementation 1

```python
# xbow[b,t] = Mean_{i<=t} x[b,i]
paddle.seed(1337)
B, T, C = 4, 8, 2   # batch, time, channels
x = paddle.randn([B, T, C])
xbow = paddle.zeros([B, T, C])
for b in range(B):
    for t in range(T):
        xprev = x[b, :t+1]                # (t, C)
        xbow[b, t] = paddle.mean(xprev, 0)
xbow.shape
```
[4, 8, 2]
Implementation 2

Matrix multiplication replaces the explicit loop-based averaging and is much more efficient.

```python
# in this toy example, a is the averaging-weight matrix, b plays the role of the
# context vectors, and c is the running average over previous rows of b
paddle.seed(42)
a = paddle.tril(paddle.ones([3, 3]))
a = a / paddle.sum(a, 1, keepdim=True)
b = paddle.randint(0, 10, [3, 2])
c = a @ b
print('a='); print(a.numpy()); print('')
print('b='); print(b.numpy()); print('')
print('c='); print(c.numpy())
```
a=
[[1.         0.         0.        ]
 [0.5        0.5        0.        ]
 [0.33333334 0.33333334 0.33333334]]

b=
[[3 9]
 [8 0]
 [3 7]]

c=
[[3.        9.       ]
 [5.5       4.5      ]
 [4.666667  5.3333335]]
```python
wei = paddle.tril(paddle.ones([T, T]))
wei = wei / wei.sum(1, keepdim=True)
xbow2 = wei @ x     # (B, T, T) @ (B, T, C) ----> (B, T, C)
paddle.allclose(xbow, xbow2).numpy().item()
```
True
Implementation 3

The same averaging can also be written as a softmax over a masked weight matrix; this form is what later allows the weights to become data-dependent.

```python
tril = paddle.tril(paddle.ones([T, T]))
wei = paddle.zeros([T, T])
mask_fill_fun = lambda x, mask, value: paddle.where(mask, paddle.full(x.shape, value, x.dtype), x)
# set the upper-triangular positions to -inf so that softmax assigns them zero weight
wei = mask_fill_fun(wei, tril == 0, float('-inf'))
wei = F.softmax(wei, axis=-1)
xbow3 = wei @ x
paddle.allclose(xbow, xbow3).numpy().item()
```
True

Richer context: replacing the simple average with learned weights (self-attention)

```python
paddle.seed(1337)
B, T, C = 4, 8, 32      # batch, time, channels
x = paddle.randn([B, T, C])

# a single attention head
head_size = 16
key = nn.Linear(C, head_size, bias_attr=None)
query = nn.Linear(C, head_size, bias_attr=None)
value = nn.Linear(C, head_size, bias_attr=None)
k = key(x)      # (B, T, head_size)
q = query(x)    # (B, T, head_size)
# deriving wei from q and k injects content-dependent information,
# unlike the all-zero wei used above
wei = q @ k.transpose([0, 2, 1])    # (B, T, head_size) @ (B, head_size, T) ---> (B, T, T)

tril = paddle.tril(paddle.ones([T, T]))
mask_fill_fun = lambda x, mask, value: paddle.where(mask, paddle.full(x.shape, value, x.dtype), x)
wei = mask_fill_fun(wei, tril == 0, float('-inf'))
wei = F.softmax(wei, axis=-1)

v = value(x)
# aggregate the value vectors instead of the raw x
out = wei @ v   # (B, T, T) @ (B, T, head_size) ---> (B, T, head_size)
print(out.shape)
```
[4, 8, 16]

Final Model Code

Model Training

```python
import paddle
import paddle.nn as nn
from paddle.nn import functional as F

# hyperparameters
batch_size = 64                     # training batch size
block_size = 32                     # maximum context length
max_iters = 5000                    # maximum number of iterations
eval_interval = 100                 # evaluate every this many steps
learning_rate = 1e-3                # learning rate
device = paddle.device.get_device() # device: CPU/GPU
paddle.device.set_device(device)    # set the training device
eval_iters = 200                    # number of batches per evaluation
n_embd = 512                        # embedding dimension
n_head = 8                          # number of attention heads
n_layer = 6                         # number of Transformer blocks
dropout = 0.2                       # dropout probability
# ------------

# random seed
paddle.seed(1337)

# read the data
text = read_data()

# build the character vocabulary
chars = sorted(list(set(text)))
vocab_size = len(chars)

# encoding / decoding
stoi = { ch: i for i, ch in enumerate(chars) }
itos = { i: ch for i, ch in enumerate(chars) }
encode = lambda s: [stoi[c] for c in s]             # encoder: string -> list of integers
decode = lambda l: ''.join([itos[i] for i in l])    # decoder: list of integers -> string

# train/validation split: 90%/10%
data = paddle.to_tensor(encode(text), dtype=paddle.int64)
n = int(0.9 * len(data))
train_data = data[:n]
val_data = data[n:]

# generate a small batch of data
def get_batch(split):
    # x (B, T), y (B, T)
    data = train_data if split == 'train' else val_data
    ix = paddle.randint(0, len(data) - block_size, (batch_size,))
    x = paddle.stack([data[i:i+block_size] for i in ix])
    y = paddle.stack([data[i+1:i+block_size+1] for i in ix])
    return x, y

# loss estimation
@paddle.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'val']:
        losses = paddle.zeros([eval_iters])
        for k in range(eval_iters):
            X, Y = get_batch(split)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out

class Head(nn.Layer):
    """ one head of self-attention """
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias_attr=None)
        self.query = nn.Linear(n_embd, head_size, bias_attr=None)
        self.value = nn.Linear(n_embd, head_size, bias_attr=None)
        self.register_buffer('tril', paddle.tril(paddle.ones([block_size, block_size])))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)     # (B, T, head_size)
        q = self.query(x)   # (B, T, head_size)
        # compute attention scores ("affinities")
        # with unit-variance Q and K the scaling keeps wei at unit variance,
        # so the softmax does not saturate too quickly
        wei = q @ k.transpose([0, 2, 1]) * C**-0.5  # (B, T, head_size) @ (B, head_size, T) -> (B, T, T)
        mask_fill_fun = lambda x, mask, value: paddle.where(mask, paddle.full(x.shape, value, x.dtype), x)
        wei = mask_fill_fun(wei, self.tril[:T, :T] == 0, float('-inf'))  # (B, T, T)
        wei = F.softmax(wei, axis=-1)  # (B, T, T)
        wei = self.dropout(wei)
        # weighted aggregation of the values
        v = self.value(x)   # (B, T, head_size)
        out = wei @ v       # (B, T, T) @ (B, T, head_size) -> (B, T, head_size)
        return out

class MultiHeadAttention(nn.Layer):
    """ multiple heads of self-attention in parallel """
    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.LayerList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # concatenate the outputs of all heads
        out = paddle.concat([h(x) for h in self.heads], axis=-1)
        # project the concatenated output
        out = self.dropout(self.proj(out))
        return out

class FeedFoward(nn.Layer):
    """ a linear layer followed by a non-linearity """
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

class Block(nn.Layer):
    """ Transformer block """
    def __init__(self, n_embd, n_head):
        # n_embd: embedding dimension, n_head: the number of heads we'd like
        super().__init__()
        head_size = n_embd // n_head
        self.sa = MultiHeadAttention(n_head, head_size)
        self.ffwd = FeedFoward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

class BigramLanguageModel(nn.Layer):
    def __init__(self):
        super().__init__()
        # embedding layers
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # idx (B, T), targets (B, T)
        # token embedding
        tok_emb = self.token_embedding_table(idx)                   # (B, T, n_embd)
        # positional embedding
        pos_emb = self.position_embedding_table(paddle.arange(T))   # (T, n_embd)
        x = tok_emb + pos_emb       # (B, T, n_embd)
        x = self.blocks(x)          # (B, T, n_embd)
        x = self.ln_f(x)            # (B, T, n_embd)
        logits = self.lm_head(x)    # (B, T, vocab_size)

        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            logits = logits.reshape([B*T, C])
            targets = targets.reshape([B*T])
            loss = F.cross_entropy(logits, targets)
        return logits, loss

    def generate(self, idx, max_new_tokens):
        # idx (B, T)
        for _ in range(max_new_tokens):
            # the context may grow beyond block_size, so crop to the last block_size tokens
            idx_cond = idx[:, -block_size:]
            # get the predictions
            logits, loss = self(idx_cond)
            # take the last time step
            logits = logits[:, -1, :]           # (B, C)
            # convert to probabilities
            probs = F.softmax(logits, axis=-1)  # (B, C)
            # sample from the distribution
            idx_next = paddle.multinomial(probs, num_samples=1)  # (B, 1)
            # append the sampled token to the context and continue
            idx = paddle.concat((idx, idx_next), axis=1)  # (B, T+1)
        return idx

model = BigramLanguageModel()
# paddle.summary(model, (32, 32), dtypes=paddle.int32)
# print the number of parameters
print(sum(p.numel().numpy().item() for p in model.parameters())/1e6, 'M parameters')

# create the optimizer
optimizer = paddle.optimizer.AdamW(learning_rate, parameters=model.parameters())

# training loop
for iter in range(max_iters):
    # periodically evaluate the loss on the train and val splits
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train'].numpy().item():.4f}, val loss {losses['val'].numpy().item():.4f}")

    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)
    optimizer.clear_grad()
    loss.backward()
    optimizer.step()

# save the model and optimizer state
obj = {'model': model.state_dict(), 'opt': optimizer.state_dict(), 'iters': max_iters}
path = './model.pdparams'
paddle.save(obj, path)
```
22.782637 M parameters
step 0: train loss 8.4016, val loss 8.4048
step 100: train loss 5.9605, val loss 6.0128
step 200: train loss 4.9987, val loss 5.1546
step 300: train loss 4.5005, val loss 4.7043
step 400: train loss 4.2499, val loss 4.4708
step 500: train loss 4.0864, val loss 4.3119
step 600: train loss 3.9440, val loss 4.1836
step 700: train loss 3.8370, val loss 4.1080
step 800: train loss 3.7511, val loss 4.0322
step 900: train loss 3.6928, val loss 3.9671
step 1000: train loss 3.6332, val loss 3.9378
step 1100: train loss 3.5823, val loss 3.8967
step 1200: train loss 3.5286, val loss 3.8566
step 1300: train loss 3.5029, val loss 3.8279
step 1400: train loss 3.4729, val loss 3.7867
step 1500: train loss 3.4261, val loss 3.7598
step 1600: train loss 3.4114, val loss 3.7454
step 1700: train loss 3.3917, val loss 3.7399
step 1800: train loss 3.3523, val loss 3.7014
step 1900: train loss 3.3270, val loss 3.6883
step 2000: train loss 3.2999, val loss 3.6764
step 2100: train loss 3.2854, val loss 3.6603
step 2200: train loss 3.2636, val loss 3.6409
step 2300: train loss 3.2502, val loss 3.6300
step 2400: train loss 3.2327, val loss 3.6044
step 2500: train loss 3.2135, val loss 3.6067
step 2600: train loss 3.1955, val loss 3.6036
step 2700: train loss 3.1823, val loss 3.5812
step 2800: train loss 3.1618, val loss 3.5725
step 2900: train loss 3.1508, val loss 3.5532
step 3000: train loss 3.1311, val loss 3.5595
step 3100: train loss 3.1202, val loss 3.5533
step 3200: train loss 3.1151, val loss 3.5374
step 3300: train loss 3.0906, val loss 3.5294
step 3400: train loss 3.0849, val loss 3.5098
step 3500: train loss 3.0773, val loss 3.5227
step 3600: train loss 3.0557, val loss 3.5042
step 3700: train loss 3.0629, val loss 3.5150
step 3800: train loss 3.0448, val loss 3.5003
step 3900: train loss 3.0401, val loss 3.5068
step 4000: train loss 3.0212, val loss 3.4871
step 4100: train loss 3.0084, val loss 3.4818
step 4200: train loss 3.0035, val loss 3.4791
step 4300: train loss 3.0092, val loss 3.4806
step 4400: train loss 2.9845, val loss 3.4673
step 4500: train loss 2.9700, val loss 3.4597
step 4600: train loss 2.9741, val loss 3.4641
step 4700: train loss 2.9609, val loss 3.4572
step 4800: train loss 2.9540, val loss 3.4511
step 4900: train loss 2.9354, val loss 3.4570
step 4999: train loss 2.9425, val loss 3.4332

Model Inference

The hyperparameters, data preparation, and the Head / MultiHeadAttention / FeedFoward / Block / BigramLanguageModel definitions are identical to the training script above and must be run first; they are not repeated here.
```python
# re-create the model and optimizer, then restore the trained state
model = BigramLanguageModel()
optimizer = paddle.optimizer.AdamW(learning_rate, parameters=model.parameters())

path = './model.pdparams'
obj_load = paddle.load(path)
state_dict, opt_dict = obj_load['model'], obj_load['opt']
model.set_state_dict(state_dict)
optimizer.set_state_dict(opt_dict)

# generate text from the trained model
context = paddle.zeros((1, 1), dtype=paddle.int64)
print(decode(model.generate(context, max_new_tokens=2000)[0].tolist()))
```
第九章:反抗物(三)在众人刚刚强出攻击,而郑吒却是一个德剑就无恋,他变得很不安全,无数夫的身躯和神经反应更是只差一些装饰,丰富的身体啊。李帅西却是很正常,以真正的劣恼,之这既真是人权属方式……比如怪我的家势还真地,不过一)(詹岚低下的可能还能够挡得太安静的男人女人,你居然以什么见识相限……”萧宏律沉默了片刻……咳……她忽然问道。郑吒只能发生一会怎么感觉,她自己干掉任何人的活下去,缺点也就比却还要等着她再多………她尽外力道!你小心啊,那怕感应即便我任务!这个恐怖片世界你们所未来的潜入到极限状态中。下次这场战斗‘天柱’吧,这中洲队大和得恶魔一坦杀,还有我的感到了你呢?”郑吒当时这样戴意弄其错。“那么什么呢?你都说了同意吧?那是别的成功率超深弟。我会不要?楚轩则不停回忆保护他们.......我知道你我要你手中的战士情.........杨雪霖,你忘记了我要死的,你永他的弟弟,以你和我杀一样童弄脏牌本体相同的问题。哦!怎么样呢?对我没个人大汉啊!”郑吒的双手力都被狠狠瞪来了“瞬间!郑吒双手只又是压裂开来,在他不知何时玛尼它已经开始冒着血,但是眼前一直群人娇小地都是凹在洞穴中,每一拳击下去都不知道骇然变成隔离也是十多具度巨大无力支撑,这力量以至于比,郑吒心头稍弱一想才没想詹岚的思考模式,只能眼睁睁看向了他,无数的握剑手要着什么数的默默做,所以只能微微晃动着一剑不知向间而定。霸尤里安的笑了一下,他扯着指也没明白这一招的指挥,反而英国身份部队所乘影,这个非常可爱的力度可以做,只能寻找了三人一个放在商量,若是没有如愿意,就交给艾里克制这部恐怖片的郑吒久时,那么隐藏在无神里。王侠根本上是直到微微放松出不制力了,这只小型大小星球完全足够了。郑吒也会觉得好拼命的人死得粉……只是霸王所在的办法安排不同于空闲玩笑,好半天后,剩余的知识也已经基本上都是比较适合任实力倍的状态,相比之下,他的内力终于变强不大,至少抵负了关于心魄给他的心脏,当真元力的魔力岩消耗开采,这胸部符号时,这样的环境顿时又被层上一个杀意了。当郑吒浑身肌肉一刀下不停,他终于觉得郑吒变得浑身焦痛时,心里猛地被血红色也产生了一剑凸起,顺着他的头发不再次拔射,只是手上捏着头发。两人深吸不停的的大拇指和地上的火焰,这架节他们这才迷惑念叨起一刻。只要在赵樱空开启一个隐包控器时,自己基因学生活的人格,事实上因为她是炼得无法放松,如果真的有了
To export a static graph with paddle.jit.save, the generation loop is wrapped into the forward method of a dedicated inference class:

```python
class BigramLanguageModel_Infer(nn.Layer):
    def __init__(self):
        super().__init__()
        # embedding layers
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def f(self, index):
        B, T = index.shape
        tok_emb = self.token_embedding_table(index)
        pos_emb = self.position_embedding_table(paddle.arange(T))
        x = tok_emb + pos_emb
        x = self.blocks(x)
        x = self.ln_f(x)
        logits = self.lm_head(x)
        return logits

    def forward(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -block_size:]
            logits = self.f(idx_cond)
            logits = logits[:, -1, :]
            probs = F.softmax(logits, axis=-1)
            idx_next = paddle.multinomial(probs, num_samples=1)
            idx = paddle.concat((idx, idx_next), axis=1)
        return idx

model_infer = BigramLanguageModel_Infer()
obj_load = paddle.load(path)
state_dict, opt_dict = obj_load['model'], obj_load['opt']
model_infer.set_state_dict(state_dict)
optimizer.set_state_dict(opt_dict)

# generate text with the inference model
context = paddle.zeros((1, 1), dtype=paddle.int64)
print(decode(model_infer(context, max_new_tokens=2000)[0].tolist()))
```
第二章:GBDOY自初赖与……这简直是让人了,因为我们自己变成这个状态,而且他们都闭着眼睛处的念叨一些不好的事,虽然在整个被张杰龙抓到了灰珠之后,他侧面旋转着身躯迅速看向,但是二人却是激动,只是看过去半了极遥远外却有六米到更远了,现在这几个人重火筑,使用同时消灭。竟然坚强无比的..........但是啊,随时都还没反复复活了......基本情况就是对方实力了,他很可能是对抗他一个伙伴成员!现在我也从那可能性,设机定就不错。”城奥凤大声叫嚣,郑吒这边道。霸王却急急地说道。郑吒从地面坐了起来,当他身上还剩下一团烟散的炎魔与“臭?还有那处装置的消息道放置路……”其余人都的表情也都有着温和的血腥味,而如同鬼魂一样以一般的举动,马修·艾迪森还有如同得到的秘银与一丝精灵,我会保证还抱你混乱了,这样吧……”楚轩却是傻傻呆的看了王侠,这名的军官呼了口气道:”他想要去捏玛理数万分之五这个国家还有很多,比如你们两个人,那超越普通人太大了,我们心脏都想承认楚.......张杰,零点,你如果你的精神一般埋葬啊.........”郑吒连忙大声说道:“如果是去死里,那东海队连我的亲手真潜意识所在。恨不得不是你太过骇人了,不过只会……而且还有这个牲坏的开启基因锁那样,你别想想一想三楚轩本身该在生死中生将这一举生生咒怨里杀掉。看那金发青年是真的被杀掉了。“嘛!”郑吒狞笑着挥了手中那么久的情形,他也好奇的问道。郑吒,谁知道董意味超越出吧?“不,谁都不知道你不觉前就对我哥哥来日人进什么,让我们一下到了城市望睡没海,而且连导的推理事都无法睡着。只能等伙伴活下来,那效果然也不曾载有什么恶心,他们就罢害怕死在那瑞咒怨时,我们可以逃跑,只是愿了噩梦团真的却是超越有了一些数量也不可能会那么傻的痛苦,也不知道的时候你却不敢再使用任何智慧。。”詹岚亲自的办法,她一个人负责保护赵缀空,赵缀空三人是精神力控制者,接着她又跟着她一跪在地上行,其次的郑吒,她莫非她最先一种事,只是那仿佛生而易举的不一样,接着就落下了强兽人的爪子,接着这一拳就是“毁灭”状态而已,所以绿魔滑板外大小时的郑吒都毫不覆人,只需要压缩多数台正多的战舰,提着每一个术闻说他目前都有着比野兽之中。这个城市平台上种标示的生命形文明已经惧得少许。那五八个男人竟然的年龄是一片跳跃接近六公里。郑吒只觉得手里狂热的按动着那只喷火虫,竟然将这绳机给运行起来,虽然当时还
第七章:素练……不运气!(二)航空母舰美统闭上了海队与天下联盟友以外,另一个按键是在恐怖片结束之类的人相反第一,那我勇于心,如果你想要克服她?那痴痴是愤怒的话,我这边探索心很好!你倒是想找啊。继续行啊,就不会吸收骷髅的马,毕竟那里也有三天哦,这个生命都算意,而已,基本都相当于在中年壮汉刚才那一刻还听明白的男
```python
from paddle.static import InputSpec

model_infer.eval()
context_input = InputSpec([1, block_size], 'int64', 'idx')
max_token_input = InputSpec([1], 'int64', 'max_new_tokens')
static_path = "./Transformer_model"
paddle.jit.save(layer=model_infer, path=static_path, input_spec=[context_input, max_token_input])
```
```python
import pickle

# save the encoding/decoding dictionaries alongside the exported model
with open('./stoi.json', 'wb') as f:
    pickle.dump({'stoi': stoi, 'itos': itos}, f)

with open('./stoi.json', 'rb') as f_r:
    r = pickle.load(f_r)
```
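
The original notebook stops at exporting the model. As a hedged sketch (not part of the project code, and dependent on how well the dynamic-to-static conversion handled the generation loop), the exported static model could be loaded back with paddle.jit.load:

```python
import paddle

# hypothetical usage; whether the converted generation loop behaves exactly like the
# dynamic model depends on the Paddle version
loaded = paddle.jit.load("./Transformer_model")
loaded.eval()
context = paddle.zeros([1, 32], dtype=paddle.int64)            # (1, block_size) token ids
max_new_tokens = paddle.to_tensor([100], dtype=paddle.int64)   # matches the saved InputSpec
out = loaded(context, max_new_tokens)
print(out.shape)
```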

Note: in an encoder attention block the lower-triangular (tril) mask is usually not applied, so all tokens can attend to one another, whereas a decoder attention block does apply the causal mask, especially in language modeling, where the model must not see future tokens.
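
A minimal sketch of the difference, with toy sizes: an encoder-style block applies softmax to the raw scores directly, while a decoder-style block masks out future positions first:

```python
import paddle
import paddle.nn.functional as F

T = 4
scores = paddle.randn([T, T])                      # raw attention scores

# encoder-style: every token may attend to every other token
enc_weights = F.softmax(scores, axis=-1)

# decoder-style: mask future positions with -inf before the softmax
tril = paddle.tril(paddle.ones([T, T]))
masked = paddle.where(tril == 0, paddle.full([T, T], float('-inf')), scores)
dec_weights = F.softmax(masked, axis=-1)

print(enc_weights.numpy())
print(dec_weights.numpy())   # lower-triangular weights: no attention to the future
```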

Training results for several hyperparameter settings:

| block_size | n_embd | n_head | n_layer | train loss | val loss |
|------------|--------|--------|---------|------------|----------|
| 32         | 64     | 8      | 6       | 3.8002     | 4.0630   |
| 32         | 64     | 8      | 12      | 3.7537     | 4.0364   |
| 32         | 64     | 16     | 6       | 3.8052     | 4.0791   |
| 32         | 128    | 8      | 6       | 3.3834     | 3.7155   |
| 32         | 256    | 8      | 6       | 3.1022     | 3.5417   |
| 32         | 512    | 8      | 6       | 2.9425     | 3.4332   |
