1. Build everything by hand: the data pipeline, the model, the loss function, and the mini-batch stochastic gradient descent optimizer
%matplotlib inline
import random
import torch
from d2l import torch as d2l
2. Construct an artificial dataset from a linear model with noise. Generate the dataset and its labels using the linear model parameters w = [2, -3.4]ᵀ, b = 4.2, and a noise term ε:
y = Xw + b + ε
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
3. Each row of features contains a two-dimensional data example, and each row of labels contains a one-dimensional label value (a scalar)
print('features:', features[0], '\nlabel:', labels[0])
d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1);
4. Define a data_iter function that takes the batch size, the feature matrix, and the label vector as input and yields mini-batches of size batch_size
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
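Because of the min(i + batch_size, num_examples) in the slice above, the last mini-batch is simply smaller whenever batch_size does not divide the number of examples evenly. A quick illustration (the batch size 300 is chosen here only for this demonstration):

# With 1000 examples, a batch size of 300 yields batches of 300, 300, 300 and 100 rows.
print([len(X) for X, _ in data_iter(300, features, labels)])  # [300, 300, 300, 100]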
5. Define and initialize the model parameters
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
6. Define the model
def linreg(X, w, b):
    """The linear regression model."""
    return torch.matmul(X, w) + b
7. Define the loss function
def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
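A quick numeric check of the loss definition (the inputs 3.0 and 2.5 are arbitrary illustration values): with y_hat = 3.0 and y = 2.5, the loss is (3.0 - 2.5)² / 2 = 0.125.

print(squared_loss(torch.tensor([3.0]), torch.tensor([2.5])))  # tensor([0.1250])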
8. Define the optimization algorithm
def sgd(params, lr, batch_size):
    """Mini-batch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
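A sanity check of the update rule, run on throwaway copies so the real w and b above are left untouched (the *_demo names are introduced only for this check): after one sgd step, each parameter should have moved by exactly -lr * grad / batch_size.

w_demo = w.detach().clone().requires_grad_(True)
b_demo = b.detach().clone().requires_grad_(True)
l_demo = squared_loss(linreg(features[:10], w_demo, b_demo), labels[:10])
l_demo.sum().backward()
w_before, g = w_demo.detach().clone(), w_demo.grad.clone()
sgd([w_demo, b_demo], 0.03, 10)
print(torch.allclose(w_demo.detach(), w_before - 0.03 * g / 10))  # expected: True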
9. Training loop
lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)    # mini-batch loss for X and y
        l.sum().backward()           # compute gradients with respect to w and b
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
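Because the data were synthesized from known parameters, the learned w and b can be compared directly against true_w and true_b (a small check added here; with this setup both errors are typically very close to zero):

print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
print(f'error in estimating b: {true_b - b}')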
Increase the number of training epochs: raise num_epochs to 10 and observe how the loss changes.
lr = 0.03
num_epochs = 10
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
Increase the learning rate: raise lr to 10 and see whether the loss changes noticeably. (Conversely, if the learning rate is too small, the parameters barely move and the loss hardly changes; a sketch of that case follows the run below.)
lr = 10
num_epochs = 10
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
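For the opposite extreme mentioned above, here is a sketch of the too-small-learning-rate case (lr = 0.0001 is chosen purely for illustration). The parameters must be re-initialized first, because the lr = 10 run above has already driven them to NaN; with such a tiny step size the printed loss typically decreases only marginally per epoch and stays far from the near-zero values reached with lr = 0.03.

# Re-initialize the parameters, since the lr = 10 run left them at NaN.
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.0001
num_epochs = 10
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')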
10. Analysis of an excessively high learning rate
With lr = 10, the loss printed for every epoch becomes NaN (Not a Number). This typically means the learning rate is too high: the gradients explode, the parameter updates become unstable, and NaN values appear.
When the learning rate is too high, each parameter update takes a very large step. The parameters can grow to extreme magnitudes, and the loss computation then overflows or produces NaN.
During backpropagation, the oversized gradients push the weights to extreme values, so the loss can no longer be computed normally. When the loss involves squaring or exponentiation, an excessive learning rate makes the gradients abnormally large, which further amplifies the parameter updates and destabilizes training.
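To make the divergence mechanism concrete, here is a minimal illustration that is separate from the training code above: plain gradient descent on the one-dimensional loss l(w) = w² / 2, whose gradient is simply w. The update w ← w - lr·w = (1 - lr)·w shrinks w only when lr < 2; with lr = 10 each step multiplies w by -9, so its magnitude explodes and the loss overflows to inf after a few dozen steps. In the full model, the same kind of overflow propagates through the matrix products and the backward pass, where operations such as inf - inf yield the NaN values seen above.

# Gradient descent on l(w) = w**2 / 2 with an oversized learning rate (toy illustration).
w_toy = torch.tensor(1.0)
for step in range(25):
    grad = w_toy               # dl/dw for l(w) = w**2 / 2
    w_toy = w_toy - 10 * grad  # lr = 10: w is multiplied by (1 - 10) = -9 each step
print(w_toy, w_toy ** 2 / 2)   # w has magnitude ~7e23; the loss overflows to inf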