Language Translation with TorchText

Preface:

  • Use torchtext classes to preprocess a well-known dataset containing English and German sentences, and use that data to train a sequence-to-sequence model with attention that translates German sentences into English.
  • Torchtext: a library in the PyTorch ecosystem aimed at natural language processing (NLP) tasks. It provides a set of convenient tools for working with text data, making NLP project development more efficient.
  • Creating an easily iterable dataset: training a language model (here, a translation model) needs large amounts of text organized in a format the model can read and process batch by batch. Torchtext's tools turn raw text (here, sentence pairs in two languages) into a dataset that can be conveniently iterated over batch by batch, so each sample can be fetched in a loop for further processing.
  • spaCy is chosen because it offers strong tokenization support for many languages besides English. torchtext does provide a basic_english tokenizer and supports other English tokenizers (e.g. Moses), but language translation involves multiple languages, and in that setting spaCy is the best choice: it handles the tokenization needs of the different languages and lays the groundwork for the translation work that follows.

This tutorial shows how to use torchtext to preprocess data from a well-known dataset containing English and German sentences, and how to use it to train a sequence-to-sequence model that can translate German sentences into English.

Field and TranslationDataset

torchtext has utilities for building language translation models. One key class is Field, which specifies how each sentence should be processed; another is TranslationDataset. torchtext provides several datasets of this kind. In this tutorial we use the Multi30k dataset, which contains about 30,000 sentences (roughly 13 words long on average) in both English and German.

Note: the tokenization in this tutorial relies on spaCy. We use spaCy because it provides strong tokenization support for languages other than English. torchtext provides a basic_english tokenizer and supports other English tokenizers (e.g. Moses), but for language translation, where multiple languages are required, spaCy is the best choice.

To run this code, first install spaCy with pip or conda:

pip install spacy

Then download the raw data for the English and German spaCy tokenizers:

python -m spacy download en
python -m spacy download de
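
As a quick sanity check (not part of the original tutorial), the snippet below loads the German model and tokenizes one sample sentence. Note that newer spaCy releases name the models de_core_news_sm / en_core_web_sm instead of de / en, so adjust the name if the shortcut fails:

import spacy

# Hypothetical sanity check: load the German model and tokenize a sentence.
# On newer spaCy versions, replace 'de' with 'de_core_news_sm'.
spacy_de = spacy.load('de')
doc = spacy_de.tokenizer('Zwei junge Männer sind im Freien.')
print([token.text for token in doc])  # expected: ['Zwei', 'junge', 'Männer', 'sind', 'im', 'Freien', '.']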

With spaCy installed, the following code tokenizes each sentence in the TranslationDataset according to the tokenizer defined in the Field.

from torchtext.datasets import Multi30k
from torchtext.data import Field, BucketIterator

SRC = Field(tokenize = "spacy",
            tokenizer_language="de",
            init_token = '<sos>',
            eos_token = '<eos>',
            lower = True)

TRG = Field(tokenize = "spacy",
            tokenizer_language="en",
            init_token = '<sos>',
            eos_token = '<eos>',
            lower = True)

train_data, valid_data, test_data = Multi30k.splits(exts = ('.de', '.en'),
                                                    fields = (SRC, TRG))

With train_data defined, we can see an extremely useful feature of torchtext's Field: the build_vocab method lets us create the vocabulary associated with each language.

SRC.build_vocab(train_data, min_freq = 2)
TRG.build_vocab(train_data, min_freq = 2)

Once these lines have run, SRC.vocab.stoi is a dictionary with the tokens in the vocabulary as keys and their corresponding indices as values; SRC.vocab.itos is the same mapping with keys and values swapped, i.e. a list that maps each index back to its token.
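
As a small illustration (assuming the vocabularies above have been built; the exact numbers depend on the data), the lookups behave like this:

# Hypothetical inspection of the vocabularies built above; exact values depend on the data.
print(len(SRC.vocab), len(TRG.vocab))            # vocabulary sizes
print(SRC.vocab.stoi['<pad>'])                   # index of the padding token
print(SRC.vocab.itos[SRC.vocab.stoi['hund']])    # round-trips back to 'hund' if it is in the vocab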

BucketIterator

The last torchtext-specific feature we use is the BucketIterator, which takes a TranslationDataset as its first argument. It defines an iterator that batches examples of similar lengths together, minimizing the amount of padding needed while producing freshly shuffled batches for each new epoch.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

BATCH_SIZE = 128

train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size = BATCH_SIZE,
    device = device)

These iterators can be used just like a DataLoader; in the train and evaluate functions below they are simply iterated over:

for i, batch in enumerate(iterator):

Each batch then has two attributes:

src = batch.src
trg = batch.trg
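
To get a feel for what a batch looks like, a quick, purely illustrative inspection might be:

# Illustrative only: peek at one batch; with the default Field settings the
# tensors are shaped [sequence_length, batch_size].
batch = next(iter(train_iterator))
print(batch.src.shape)  # e.g. torch.Size([27, 128])
print(batch.trg.shape)  # e.g. torch.Size([25, 128])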

Defining our nn.Module and Optimizer

import random
from typing import Tuple

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch import Tensor


class Encoder(nn.Module):
    def __init__(self,
                 input_dim: int,
                 emb_dim: int,
                 enc_hid_dim: int,
                 dec_hid_dim: int,
                 dropout: float):
        super().__init__()

        self.input_dim = input_dim
        self.emb_dim = emb_dim
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.dropout = dropout

        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional = True)
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self,
                src: Tensor) -> Tuple[Tensor]:

        embedded = self.dropout(self.embedding(src))

        outputs, hidden = self.rnn(embedded)

        hidden = torch.tanh(self.fc(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)))

        return outputs, hidden


class Attention(nn.Module):
    def __init__(self,
                 enc_hid_dim: int,
                 dec_hid_dim: int,
                 attn_dim: int):
        super().__init__()

        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim

        self.attn_in = (enc_hid_dim * 2) + dec_hid_dim

        self.attn = nn.Linear(self.attn_in, attn_dim)

    def forward(self,
                decoder_hidden: Tensor,
                encoder_outputs: Tensor) -> Tensor:

        src_len = encoder_outputs.shape[0]

        repeated_decoder_hidden = decoder_hidden.unsqueeze(1).repeat(1, src_len, 1)

        encoder_outputs = encoder_outputs.permute(1, 0, 2)

        energy = torch.tanh(self.attn(torch.cat((
            repeated_decoder_hidden,
            encoder_outputs),
            dim = 2)))

        attention = torch.sum(energy, dim=2)

        return F.softmax(attention, dim=1)


class Decoder(nn.Module):
    def __init__(self,
                 output_dim: int,
                 emb_dim: int,
                 enc_hid_dim: int,
                 dec_hid_dim: int,
                 dropout: int,
                 attention: nn.Module):
        super().__init__()

        self.emb_dim = emb_dim
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.output_dim = output_dim
        self.dropout = dropout
        self.attention = attention

        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.rnn = nn.GRU((enc_hid_dim * 2) + emb_dim, dec_hid_dim)
        self.out = nn.Linear(self.attention.attn_in + emb_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def _weighted_encoder_rep(self,
                              decoder_hidden: Tensor,
                              encoder_outputs: Tensor) -> Tensor:

        a = self.attention(decoder_hidden, encoder_outputs)

        a = a.unsqueeze(1)

        encoder_outputs = encoder_outputs.permute(1, 0, 2)

        weighted_encoder_rep = torch.bmm(a, encoder_outputs)

        weighted_encoder_rep = weighted_encoder_rep.permute(1, 0, 2)

        return weighted_encoder_rep

    def forward(self,
                input: Tensor,
                decoder_hidden: Tensor,
                encoder_outputs: Tensor) -> Tuple[Tensor]:

        input = input.unsqueeze(0)

        embedded = self.dropout(self.embedding(input))

        weighted_encoder_rep = self._weighted_encoder_rep(decoder_hidden,
                                                          encoder_outputs)

        rnn_input = torch.cat((embedded, weighted_encoder_rep), dim = 2)

        output, decoder_hidden = self.rnn(rnn_input, decoder_hidden.unsqueeze(0))

        embedded = embedded.squeeze(0)
        output = output.squeeze(0)
        weighted_encoder_rep = weighted_encoder_rep.squeeze(0)

        output = self.out(torch.cat((output,
                                     weighted_encoder_rep,
                                     embedded), dim = 1))

        return output, decoder_hidden.squeeze(0)


class Seq2Seq(nn.Module):
    def __init__(self,
                 encoder: nn.Module,
                 decoder: nn.Module,
                 device: torch.device):
        super().__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self,
                src: Tensor,
                trg: Tensor,
                teacher_forcing_ratio: float = 0.5) -> Tensor:

        batch_size = src.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        encoder_outputs, hidden = self.encoder(src)

        # first input to the decoder is the <sos> token
        output = trg[0,:]

        for t in range(1, max_len):
            output, hidden = self.decoder(output, hidden, encoder_outputs)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            output = (trg[t] if teacher_force else top1)

        return outputs


INPUT_DIM = len(SRC.vocab)
OUTPUT_DIM = len(TRG.vocab)
# ENC_EMB_DIM = 256
# DEC_EMB_DIM = 256
# ENC_HID_DIM = 512
# DEC_HID_DIM = 512
# ATTN_DIM = 64
# ENC_DROPOUT = 0.5
# DEC_DROPOUT = 0.5

ENC_EMB_DIM = 32
DEC_EMB_DIM = 32
ENC_HID_DIM = 64
DEC_HID_DIM = 64
ATTN_DIM = 8
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, ENC_DROPOUT)

attn = Attention(ENC_HID_DIM, DEC_HID_DIM, ATTN_DIM)

dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, DEC_DROPOUT, attn)

model = Seq2Seq(enc, dec, device).to(device)


def init_weights(m: nn.Module):
    for name, param in m.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)


model.apply(init_weights)

optimizer = optim.Adam(model.parameters())


def count_parameters(model: nn.Module):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print(f'The model has {count_parameters(model):,} trainable parameters')

Note: when scoring the performance of a language translation model, we have to tell nn.CrossEntropyLoss to ignore the indices where the target is simply padding.

PAD_IDX = TRG.vocab.stoi['<pad>']

criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
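
A tiny illustration (not part of the original tutorial) of what ignore_index does: positions whose target equals PAD_IDX contribute nothing to the loss or to the gradients.

# Illustrative only: targets equal to PAD_IDX are masked out of the loss.
logits  = torch.randn(3, len(TRG.vocab))   # 3 target positions, hypothetical scores
targets = torch.tensor([5, PAD_IDX, 7])    # hypothetical target indices
print(criterion(logits, targets))          # averaged over the 2 non-pad positions only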

Finally, we can train and evaluate this model:

import math
import time


def train(model: nn.Module,
          iterator: BucketIterator,
          optimizer: optim.Optimizer,
          criterion: nn.Module,
          clip: float):

    model.train()

    epoch_loss = 0

    for _, batch in enumerate(iterator):

        src = batch.src
        trg = batch.trg

        optimizer.zero_grad()

        output = model(src, trg)

        output = output[1:].view(-1, output.shape[-1])
        trg = trg[1:].view(-1)

        loss = criterion(output, trg)

        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(iterator)


def evaluate(model: nn.Module,
             iterator: BucketIterator,
             criterion: nn.Module):

    model.eval()

    epoch_loss = 0

    with torch.no_grad():

        for _, batch in enumerate(iterator):

            src = batch.src
            trg = batch.trg

            output = model(src, trg, 0)  # turn off teacher forcing

            output = output[1:].view(-1, output.shape[-1])
            trg = trg[1:].view(-1)

            loss = criterion(output, trg)

            epoch_loss += loss.item()

    return epoch_loss / len(iterator)


def epoch_time(start_time: int,
               end_time: int):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs


N_EPOCHS = 10
CLIP = 1

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()

    train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, valid_iterator, criterion)

    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)

    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}')

test_loss = evaluate(model, test_iterator, criterion)

print(f'| Test Loss: {test_loss:.3f} | Test PPL: {math.exp(test_loss):7.3f} |')
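
Once training finishes, a greedy decoding helper along the following lines can be used to translate a single sentence. This is not part of the original tutorial: translate_sentence is a hypothetical helper that reuses the SRC/TRG fields and model defined above, and for simplicity it tokenizes with a plain whitespace split rather than the spaCy tokenizer configured on SRC.

# Hypothetical greedy-decoding sketch built on the Field-based pipeline above.
def translate_sentence(sentence: str, max_len: int = 50) -> str:
    model.eval()
    # Naive whitespace tokenization for illustration; the real pipeline tokenizes with spaCy.
    tokens = ['<sos>'] + sentence.lower().split() + ['<eos>']
    src_indices = [SRC.vocab.stoi[token] for token in tokens]
    src_tensor = torch.LongTensor(src_indices).unsqueeze(1).to(device)
    with torch.no_grad():
        # Dummy all-zero target (the full code at the end of this post does the same
        # for inference); teacher forcing is disabled, so its values are never fed back.
        outputs = model(src_tensor, torch.zeros(max_len, 1).long().to(device), 0)
    result = []
    for t in range(1, max_len):
        idx = outputs[t].argmax(1).item()
        if TRG.vocab.itos[idx] == '<eos>':
            break
        result.append(TRG.vocab.itos[idx])
    return ' '.join(result)

print(translate_sentence('ein mann schläft in einem grünen raum auf einem sofa .'))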

Full code:

import torch

# Print the PyTorch version
print(torch.__version__)

# Choose the device (GPU or CPU) depending on whether CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(str(device) + ':' + str(torch.cuda.is_available()))

from torchtext.utils import download_from_url, extract_archive

# Base URL of the dataset
url_base = 'https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/raw/'
# Tuples of file names for the training, validation and test sets
train_urls = ('train.de.gz', 'train.en.gz')
val_urls = ('val.de.gz', 'val.en.gz')
test_urls = ('test_2016_flickr.de.gz', 'test_2016_flickr.en.gz')

# The line below originally obtained the training-set file paths by downloading and
# extracting the archives; it is commented out here and replaced with a custom-path
# example (the validation and test sets are still downloaded)
# train_filepaths = [extract_archive(download_from_url(url_base + url))[0] for url in train_urls]
train_filepaths = ['... your path\\.data\\train.de', '... your path\\.data\\train.en']
val_filepaths = [extract_archive(download_from_url(url_base + url))[0] for url in val_urls]
test_filepaths = [extract_archive(download_from_url(url_base + url))[0] for url in test_urls]

from torchtext.data.utils import get_tokenizer

# Get the German and English tokenizers, using spaCy with the corresponding language models
de_tokenizer = get_tokenizer('spacy', language='de_core_news_sm')
en_tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

from collections import Counter
from torchtext.vocab import Vocab
import io

# Build a vocabulary from the given file path and tokenizer by counting token frequencies
def build_vocab(filepath, tokenizer):
    counter = Counter()
    with io.open(filepath, encoding="utf8") as f:
        for string_ in f:
            counter.update(tokenizer(string_))
    return Vocab(counter, specials=['<unk>', '<pad>', '<bos>', '<eos>'])

# Build the German and English vocabularies
de_vocab = build_vocab(train_filepaths[0], de_tokenizer)
en_vocab = build_vocab(train_filepaths[1], en_tokenizer)

# Convert the text data into tensors, pairing German and English sentences
def data_process(filepaths):
    raw_de_iter = iter(io.open(filepaths[0], encoding="utf8"))
    raw_en_iter = iter(io.open(filepaths[1], encoding="utf8"))
    data = []
    for (raw_de, raw_en) in zip(raw_de_iter, raw_en_iter):
        de_tensor_ = torch.tensor([de_vocab[token] for token in de_tokenizer(raw_de)],
                                  dtype=torch.long)
        en_tensor_ = torch.tensor([en_vocab[token] for token in en_tokenizer(raw_en)],
                                  dtype=torch.long)
        data.append((de_tensor_, en_tensor_))
    return data

# Process the training, validation and test data
train_data = data_process(train_filepaths)
val_data = data_process(val_filepaths)
test_data = data_process(test_filepaths)

BATCH_SIZE = 128
PAD_IDX = de_vocab['<pad>']
BOS_IDX = de_vocab['<bos>']
EOS_IDX = de_vocab['<eos>']

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Build a batch: add begin/end-of-sentence markers and pad the sequences
def generate_batch(data_batch):
    de_batch, en_batch = [], []
    for (de_item, en_item) in data_batch:
        de_batch.append(torch.cat([torch.tensor([BOS_IDX]), de_item, torch.tensor([EOS_IDX])], dim=0))
        en_batch.append(torch.cat([torch.tensor([BOS_IDX]), en_item, torch.tensor([EOS_IDX])], dim=0))
    de_batch = pad_sequence(de_batch, padding_value=PAD_IDX)
    en_batch = pad_sequence(en_batch, padding_value=PAD_IDX)
    return de_batch, en_batch

# Create the data loaders for the training, validation and test sets
train_iter = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch)
valid_iter = DataLoader(val_data, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch)
test_iter = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch)

import torch.nn as nn
from typing import Tuple

# Encoder: encodes the source sequence
class Encoder(nn.Module):
    def __init__(self, input_dim: int, emb_dim: int, enc_hid_dim: int, dec_hid_dim: int, dropout: float):
        super().__init__()
        self.input_dim = input_dim
        self.emb_dim = emb_dim
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.dropout = dropout
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional=True)
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)

    def forward(self, src: torch.Tensor) -> Tuple[torch.Tensor]:
        embedded = self.dropout(self.embedding(src))
        outputs, hidden = self.rnn(embedded)
        hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        hidden = self.fc(hidden)
        hidden = torch.tanh(hidden)
        return outputs, hidden

import torch.nn.functional as F

# Attention: computes the attention weights
class Attention(nn.Module):
    def __init__(self, enc_hid_dim: int, dec_hid_dim: int, attn_dim: int):
        super().__init__()
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.attn_in = (enc_hid_dim * 2) + dec_hid_dim
        self.attn = nn.Linear(self.attn_in, attn_dim)

    def forward(self, decoder_hidden, encoder_outputs) -> torch.Tensor:
        src_len = encoder_outputs.shape[0]
        repeated_decoder_hidden = decoder_hidden.unsqueeze(1).repeat(1, src_len, 1)
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        energy = torch.tanh(self.attn(torch.cat((repeated_decoder_hidden, encoder_outputs), dim=2)))
        attention = torch.sum(energy, dim=2)
        return F.softmax(attention, dim=1)

# Decoder: generates the output from the encoded representation and the attention weights
class Decoder(nn.Module):
    def __init__(self, output_dim: int, emb_dim: int, enc_hid_dim: int, dec_hid_dim: int, dropout: int, attention: nn.Module):
        super().__init__()
        self.emb_dim = emb_dim
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.output_dim = output_dim
        self.dropout = dropout
        self.attention = attention
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.rnn = nn.GRU((enc_hid_dim * 2) + emb_dim, dec_hid_dim)
        self.out = nn.Linear(self.attention.attn_in + emb_dim, output_dim)

    def _weighted_encoder_rep(self, decoder_hidden, encoder_outputs) -> torch.Tensor:
        a = self.attention(decoder_hidden, encoder_outputs)
        a = a.unsqueeze(1)
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        weighted_encoder_rep = torch.bmm(a, encoder_outputs)
        weighted_encoder_rep = weighted_encoder_rep.permute(1, 0, 2)
        return weighted_encoder_rep

    def forward(self, input: torch.Tensor, decoder_hidden: torch.Tensor, encoder_outputs: torch.Tensor) -> Tuple[torch.Tensor]:
        input = input.unsqueeze(0)
        embedded = self.dropout(self.embedding(input))
        weighted_encoder_rep = self._weighted_encoder_rep(decoder_hidden, encoder_outputs)
        rnn_input = torch.cat((embedded, weighted_encoder_rep), dim=2)
        output, decoder_hidden = self.rnn(rnn_input, decoder_hidden.unsqueeze(0))
        embedded = embedded.squeeze(0)
        output = output.squeeze(0)
        weighted_encoder_rep = weighted_encoder_rep.squeeze(0)
        output = self.out(torch.cat((output, weighted_encoder_rep, embedded), dim=1))
        return output, decoder_hidden.squeeze(0)

import random

# Seq2Seq: combines the encoder and decoder to perform translation
class Seq2Seq(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module, device: torch.device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src: torch.Tensor, trg: torch.Tensor, teacher_forcing_ratio: float = 0.5) -> torch.Tensor:
        batch_size = src.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)
        encoder_outputs, hidden = self.encoder(src)
        output = trg[0,:]
        for t in range(1, max_len):
            output, hidden = self.decoder(output, hidden, encoder_outputs)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            output = (trg[t] if teacher_force else top1)
        return outputs

# Define the input/output dimensions and the model hyperparameters
INPUT_DIM = len(de_vocab)
OUTPUT_DIM = len(en_vocab)
ENC_EMB_DIM = 32
DEC_EMB_DIM = 32
ENC_HID_DIM = 64
DEC_HID_DIM = 64
ATTN_DIM = 8
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5

# Create the encoder, attention, decoder and Seq2Seq instances, and move the model to the chosen device
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, ENC_DROPOUT)
attn = Attention(ENC_HID_DIM, DEC_HID_DIM, ATTN_DIM)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, DEC_DROPOUT, attn)
model = Seq2Seq(enc, dec, device).to(device)

# Initialize the model weights
def init_weights(m: nn.Module):
    for name, param in m.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)

model.apply(init_weights)

import torch.optim as optim

# Create the optimizer and the loss function
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss(ignore_index=en_vocab.stoi['<pad>'])

import math
import time

# Train for one epoch and update the model parameters
def train(model: nn.Module, iterator: torch.utils.data.DataLoader, optimizer: optim.Optimizer, criterion: nn.Module, clip: float):
    model.train()
    epoch_loss = 0
    for _, (src, trg) in enumerate(iterator):
        src, trg = src.to(device), trg.to(device)
        optimizer.zero_grad()
        output = model(src, trg)
        output = output[1:].view(-1, output.shape[-1])
        trg = trg[1:].view(-1)
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)

# Evaluate the model on the given dataset
def evaluate(model: nn.Module, iterator: torch.utils.data.DataLoader, criterion: nn.Module):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for _, (src, trg) in enumerate(iterator):
            src, trg = src.to(device), trg.to(device)
            output = model(src, trg, 0)
            output = output[1:].view(-1, output.shape[-1])
            trg = trg[1:].view(-1)
            loss = criterion(output, trg)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)

# Compute how long one training or evaluation epoch took
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

if __name__ == '__main__':
    # The block below originally trained the model and saved it; it is commented out here
    # N_EPOCHS = 10
    # CLIP = 1
    # best_valid_loss = float('inf')
    # for epoch in range(N_EPOCHS):
    #     start_time = time.time()
    #     train_loss = train(model, train_iter, optimizer, criterion, CLIP)
    #     valid_loss = evaluate(model, valid_iter, criterion)
    #     end_time = time.time()
    #     epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    #
    #     print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    #     print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    #     print(f'\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}')
    #
    # test_loss = evaluate(model, test_iter, criterion)
    #
    # print(f'| Test Loss: {test_loss:.3f} | Test PPL: {math.exp(test_loss):7.3f} |')
    #
    # torch.save(model.state_dict(), '... your path\\model_Translate.pth')

    # Test part: load the pretrained model and translate one sentence
    model.load_state_dict(torch.load('... your path\\model_Translate.pth'))
    sentence = 'Zwei junge weiße Männer sind im Freien in der Nähe vieler Büsche.'
    de_tensor = torch.tensor([de_vocab[token] for token in de_tokenizer(sentence)], dtype=torch.long)
    de_tensor = torch.cat([torch.tensor([BOS_IDX]), de_tensor, torch.tensor([EOS_IDX])], dim=0)
    de_tensor = de_tensor.unsqueeze(1).to(device)
    output = model(de_tensor, torch.zeros(50, 1).long().to(device), 0)
    output = output[1:].view(-1, output.shape[-1])
    result = []
    for i in range(output.size()[0]):
        index = output[i].data.topk(1)[1].item()
        if en_vocab.itos[index] == '<eos>':
            break
        result.append(en_vocab.itos[index])
    print(' '.join(result))

This code implements a sequence-to-sequence (Seq2Seq) machine translation model end to end: data preprocessing, model construction (encoder, decoder, attention), the training and evaluation functions, and a final test step that translates German sentences into English.

Next steps

  • Check out the rest of Ben Trevett’s tutorials using torchtext here
  • Stay tuned for a tutorial using other torchtext features along with nn.Transformer for language modeling via next word prediction!
