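Contents
- 1. Dataset Introduction
- 2. Data Analysis
- 3. Data Processing and Wrapping
- 3.1 Splitting the Dataset
- 3.2 Converting Data to Tensors
- 3.3 Wrapping the Data
- 4. Model Training
- 4.1 Defining Helper Functions
- 4.2 ResNet18 Model
- 4.3 CNN Model
- 4.4 FCNN Model
- 5. Result Analysis
- 5.1 Confusion Matrix
- 5.2 Inspecting Misclassified Samples
- 6. Loading the Best Model
- 7. References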
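This handwritten digit recognition project uses three models: resnet18 (which gave better accuracy than resnet50), a CNN, and an FCNN. In terms of accuracy, resnet18 > CNN > FCNN. The final submission scored 0.983 on the official test set, ranking 758th (submitted on September 1, 2024). The dataset, code, Python virtual environment, and the best trained models have been packaged and uploaded to Gitee, [direct link].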
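1. Dataset Introduction
The competition uses the MNIST (Modified National Institute of Standards and Technology) handwritten digit image dataset, with 42000 training samples and 28000 test samples. Each sample has 784 pixels, i.e. the original images are 28 * 28 pixels. The label column of the training set gives the class of the handwritten digit (10 classes, 0-9).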
2. Data Analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
train = pd.read_csv("D:/Desktop/kaggle数据集/digit-recognizer/train.csv")
test = pd.read_csv("D:/Desktop/kaggle数据集/digit-recognizer/test.csv")
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42000 entries, 0 to 41999
Columns: 785 entries, label to pixel783
dtypes: int64(785)
memory usage: 251.5 MB
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28000 entries, 0 to 27999
Columns: 784 entries, pixel0 to pixel783
dtypes: int64(784)
memory usage: 167.5 MB
Check for missing values
#------------------------------------------------------------------------------------------------#
# train.isnull(): returns a boolean DataFrame with the same shape as train, where True marks a missing value and False a present one
# any(): works column by column; a column yields True if it contains at least one True, otherwise False. The result is a boolean Series
# describe(): summary statistics: number of entries (count, here the number of columns), number of unique values (unique), most frequent value (top), and its frequency (freq)
#------------------------------------------------------------------------------------------------#
train.isnull().any().describe()
count 785
unique 1
top False
freq 785
dtype: object
The result shows a single unique value, False, occurring 785 times, so the training set has no missing values.
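As a small illustration of the check above, the following sketch applies the same isnull().any().describe() chain to two toy DataFrames. The frames and the helper name summarize_missing are made up for illustration and are not part of the competition data.

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Per-column missing-value flags, summarized with describe()."""
    # isnull() -> boolean frame; any() -> one boolean per column
    return df.isnull().any().describe()


# Toy frame without missing values (hypothetical data)
clean = pd.DataFrame({"label": [1, 0, 4], "pixel0": [0, 0, 255]})
# Toy frame with one missing value in the "pixel0" column
dirty = pd.DataFrame({"label": [1, 0, 4], "pixel0": [0, None, 255]})

clean_summary = summarize_missing(clean)
dirty_summary = summarize_missing(dirty)

print(clean_summary)
print(dirty_summary)
```

When no column has a missing value, unique is 1 and top is False; as soon as one column contains a missing value, unique becomes 2.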
Check the class distribution
sns.countplot(x=train['label']);
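3. Data Processing and Wrapping
3.1 Splitting the Dataset
Split the original training data into a training set and a validation set.
# Separate features and labels
train_labels = train["label"]
train = train.drop(labels=["label"], axis=1)
# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(train, train_labels, test_size = 0.2, random_state=41)
print("Training set size: {}, validation set size: {}".format(len(X_train), len(X_val)))
Training set size: 33600, validation set size: 8400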
3.2 Converting Data to Tensors
DataFrame and Series objects must first be converted to NumPy arrays (via .values) before they can be turned into tensors:
import torch
from torch.utils.data import DataLoader, TensorDataset
X_train_tensor = torch.tensor(X_train.values, dtype = torch.float32)
y_train_tensor = torch.tensor(y_train.values)
X_val_tensor = torch.tensor(X_val.values, dtype = torch.float32)
y_val_tensor = torch.tensor(y_val.values)
test_tensor = torch.tensor(test.values, dtype = torch.float32)
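3.3 Wrapping the Data
Use TensorDataset to create tensor datasets that hold both the features and the class labels, then wrap them with DataLoader to split them into batches. For the training set, shuffle is set to True (the data is randomly shuffled), which prevents the model from learning the order of the data and thereby improves generalization. For the validation and test sets, shuffle is set to False, which keeps the predictions consistent and comparable.
train_tensor = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_tensor, batch_size=100, shuffle=True)
val_tensor = TensorDataset(X_val_tensor, y_val_tensor)
val_loader = DataLoader(val_tensor, batch_size=100, shuffle=False)
test_loader = DataLoader(test_tensor, batch_size = 100, shuffle=False)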
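For reference, the number of batches each loader yields per epoch follows from the dataset size and the batch size: with the default drop_last=False, len(DataLoader) equals ceil(N / batch_size). The sketch below reproduces that arithmetic in plain Python for the sizes used in this post; the helper name num_batches is illustrative only.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = num_batches(33600, 100)  # training set
val_batches = num_batches(8400, 100)     # validation set
test_batches = num_batches(28000, 100)   # test set
print(train_batches, val_batches, test_batches)
```

Because the three loaders have different batch counts, an average loss per batch should be divided by the length of the loader that was actually iterated.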
Visualize one image from the training data
plt.imshow(train.values[10].reshape(28,28), cmap='gray')
plt.axis("off")
plt.title(str(train_labels.values[10]));
4. Model Training
4.1 Defining Helper Functions
Define the training and validation functions:
def model_train(epoch, model, model_name, dataloader, criterion, optimizer):
    """
    Model training function.
    Params:
        epoch: current training epoch
        model: predefined model
        model_name: 'resnet18', 'cnn', or 'fcnn'; decides how the input is reshaped
        dataloader: batched data
        criterion: loss function (cross-entropy)
        optimizer: optimizer
    Returns:
        running_loss / len(dataloader): average loss for this epoch (one pass over the training set)
        sum_correct / train_num: accuracy for this epoch (one pass over the training set)
    """
    # print("-------------------------Training-------------------------")
    # Put the model in training mode
    model.train()
    running_loss = 0.0
    # Size of the training set
    train_num = len(dataloader.dataset)
    # Number of correctly classified samples after one pass over the dataset
    sum_correct = 0
    for step, data in enumerate(dataloader):
        images, labels = data
        if model_name == 'resnet18':
            #------------------------------------------------------------------------------------#
            # ResNet18 expects input of shape [batch_size, channels, height, width], with channels = 3 (RGB images)
            # expand(): expands the tensor along the given dimension (no data is copied; only the view changes)
            #------------------------------------------------------------------------------------#
            images = images.view(-1, 1, 28, 28).expand(-1, 3, -1, -1)
        if model_name == 'cnn':
            # The custom CNN takes 1 input channel
            images = images.view(-1, 1, 28, 28)
        # No reshaping is needed for the FCNN
        images = images.to(device)
        labels = labels.to(device)
        # Clear the gradients of the previous iteration so they do not accumulate
        optimizer.zero_grad()
        #------------------------------------------------------------------------------------#
        # outputs has shape [number of samples in the batch (batch_size), number of classes]
        # Each entry is the raw score (logit) of a sample for a class; CrossEntropyLoss applies softmax internally
        #------------------------------------------------------------------------------------#
        outputs = model(images)
        # Compute the loss
        loss = criterion(outputs, labels)
        #------------------------------------------------------------------------------------#
        # Compute the gradients of the loss with respect to the model parameters and store them in each parameter's .grad attribute.
        # The optimizer then uses these gradients to update the parameters, gradually minimizing the loss
        #------------------------------------------------------------------------------------#
        loss.backward()
        # Update the model parameters with the optimizer
        optimizer.step()
        running_loss += loss.item()
        #------------------------------------------------------------------------------------#
        # torch.max() returns two values: the maximum of each row and the index of that maximum
        # _: the first return value (the row maxima) is ignored
        # 1: take the maximum and its index along each row
        #------------------------------------------------------------------------------------#
        _, predicted = torch.max(outputs, 1)
        #------------------------------------------------------------------------------------#
        # sum(): sums the boolean tensor to get the number of correct predictions.
        # True counts as 1 and False as 0.
        # item(): converts a single-element tensor to a Python scalar for further arithmetic
        #------------------------------------------------------------------------------------#
        correct = (predicted == labels).sum().item()
        sum_correct += correct
        train_acc = correct / len(labels)
        # print("[Epoch {}, step: {}] Train Loss: {:.4f}, Train Acc: {:.2f}%".format(epoch + 1, step+1, loss, train_acc*100))
    # print("-------------------------Training-------------------------")
    return running_loss / len(dataloader), sum_correct / train_num

def model_validate(epoch, model, model_name, dataloader, criterion):
    """
    Model evaluation function.
    Params:
        epoch: current training epoch
        model: predefined model
        model_name: 'resnet18', 'cnn', or 'fcnn'; decides how the input is reshaped
        dataloader: batched data
        criterion: loss function (cross-entropy)
    Returns:
        val_loss / len(dataloader): average loss for this epoch (one pass over the validation set)
        sum_correct / val_num: accuracy for this epoch (one pass over the validation set)
    """
    # print("------------------------Validating------------------------")
    # Put the model in evaluation mode
    model.eval()
    val_loss = 0.0
    # Size of the validation set
    val_num = len(dataloader.dataset)
    sum_correct = 0
    # Disable gradient computation
    with torch.no_grad():
        for step, data in enumerate(dataloader):
            images, labels = data
            if model_name == 'resnet18':
                # Same reshaping as in model_train: [batch_size, 3, 28, 28] for ResNet18
                images = images.view(-1, 1, 28, 28).expand(-1, 3, -1, -1)
            if model_name == 'cnn':
                # The custom CNN takes 1 input channel
                images = images.view(-1, 1, 28, 28)
            # No reshaping is needed for the FCNN
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            # Compute the loss
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            # torch.max() returns the row maxima (ignored) and their indices (the predicted classes)
            _, predicted = torch.max(outputs, 1)
            correct = (predicted == labels).sum().item()
            sum_correct += correct
            total = len(labels)
            val_acc = correct / total
            # print("[Epoch {}, step: {}] Val Loss: {:.4f}, Val Acc: {:.2f}%".format(epoch + 1, step+1, loss, val_acc*100))
    # print("------------------------Validating------------------------")
    # Average over the batches of the loader that was iterated (the original divided by len(train_loader))
    return val_loss / len(dataloader), sum_correct / val_num
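The reshaping and prediction steps used in both functions can be tried in isolation. The sketch below is illustrative only: the tensors are made-up toy data, not MNIST samples. It shows that expand() produces a 3-channel view without copying memory and that torch.max along dimension 1 returns the per-row maxima together with their indices.

```python
import torch

# Hypothetical mini-batch of 4 flattened 28x28 images
flat = torch.arange(4 * 784, dtype=torch.float32).view(4, 784)

# Reshape to [batch, 1, 28, 28], then expand the channel dimension to 3
gray = flat.view(-1, 1, 28, 28)
rgb = gray.expand(-1, 3, -1, -1)

print(rgb.shape)  # torch.Size([4, 3, 28, 28])
# A stride of 0 on the channel dimension means the 3 channels share the same memory
print(rgb.stride()[1])
# All channels hold identical values
print(torch.equal(rgb[:, 0], rgb[:, 2]))

# torch.max along dim 1 returns (max values, argmax indices) for each row
scores = torch.tensor([[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]])
values, predicted = torch.max(scores, 1)
labels = torch.tensor([1, 2])
# Comparing predictions with labels gives a boolean tensor; sum() counts the matches
correct = (predicted == labels).sum().item()
print(predicted.tolist(), correct)
```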
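Define a combined training and validation function:
import torch.nn as nn
import torch.optim as optim

def train_val(model, model_name):
    """
    Overall training and validation function.
    Params:
        model: predefined model
        model_name: model name, used for input reshaping and for the checkpoint file name
    """
    # Define the loss function (cross-entropy) and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # Best validation accuracy and the epoch at which it was reached
    best_val_acc = 0.0
    best_epoch = 0
    for epoch in range(10):
        # Train (train_losses etc. are global lists defined before train_val is called)
        train_loss, train_acc = model_train(epoch, model, model_name, train_loader, criterion, optimizer)
        train_losses.append(train_loss)
        train_accuracies.append(train_acc)
        # Validate
        val_loss, val_acc = model_validate(epoch, model, model_name, val_loader, criterion)
        val_losses.append(val_loss)
        val_accuracies.append(val_acc)
        # Save the weights whenever the validation accuracy improves
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_epoch = epoch + 1
            torch.save(model.state_dict(), model_name+'_best_model.pth')
        print("[Epoch {} finished, training set Loss: {}, Accuracy: {}]".format(epoch+1, train_loss, train_acc))
    print("Training finished! Best epoch: {}, validation accuracy at that epoch: {}".format(best_epoch, best_val_acc))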
Define a function that visualizes the loss and accuracy:
def loss_acc_plot(train_losses, val_losses, train_accuracies, val_accuracies):
    """
    Visualize the loss and accuracy.
    """
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    # By default, plt.plot uses the indices of train_losses as the X-axis values
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(train_accuracies, label='Train Accuracy')
    plt.plot(val_accuracies, label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.tight_layout()
4.2 ResNet18 Model
from torchvision import models
# Train on the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Instantiate resnet18
resnet_model = models.resnet18()
resnet_model = resnet_model.to(device)
# Record the loss and accuracy on the training and validation sets
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
train_val(resnet_model, "resnet18")
[Epoch 1 finished, training set Loss: 0.48536758923104834, Accuracy: 0.9060714285714285]
[Epoch 2 finished, training set Loss: 0.05270050720095502, Accuracy: 0.985]
[Epoch 3 finished, training set Loss: 0.02555189496238849, Accuracy: 0.9938392857142857]
[Epoch 4 finished, training set Loss: 0.015233770400560129, Accuracy: 0.9965773809523809]
[Epoch 5 finished, training set Loss: 0.007979269749263213, Accuracy: 0.9988690476190476]
[Epoch 6 finished, training set Loss: 0.005160370017706771, Accuracy: 0.9996428571428572]
[Epoch 7 finished, training set Loss: 0.0035936778385803336, Accuracy: 0.9998511904761904]
[Epoch 8 finished, training set Loss: 0.0028507261213235324, Accuracy: 0.9999107142857143]
[Epoch 9 finished, training set Loss: 0.002293311161511589, Accuracy: 0.9998809523809524]
[Epoch 10 finished, training set Loss: 0.0019566422187857653, Accuracy: 0.9998511904761904]
Training finished! Best epoch: 6, validation accuracy at that epoch: 0.9858333333333333
Visualize the loss and accuracy:
loss_acc_plot(train_losses, val_losses, train_accuracies, val_accuracies)
4.3 CNN Model
Define the CNN model structure:
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # Convolution layer 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        # Max pooling layer 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        # Convolution layer 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        # Max pooling layer 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # Fully connected layer
        self.fc1 = nn.Linear(32 * 4 * 4, 10)

    def forward(self, x):
        # Convolution layer 1
        out = self.cnn1(x)
        out = self.relu1(out)
        # Max pooling layer 1
        out = self.maxpool1(out)
        # Convolution layer 2
        out = self.cnn2(out)
        out = self.relu2(out)
        # Max pooling layer 2
        out = self.maxpool2(out)
        # Flatten layer
        out = out.view(out.size(0), -1)
        # Fully connected layer
        out = self.fc1(out)
        return out
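Network structure visualization:
In the structure diagram, the first layer of each two-layer yellow block denotes the convolution and the second layer denotes the ReLU() activation; red blocks denote max pooling. 16 is the number of output channels of the convolution, and 784 is the image size (width * height). Note that 784 = 28 * 28 is the size of the input image; with kernel_size=5 and padding=0 as in the code above, the first convolution outputs 24 * 24 = 576 pixels per channel.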
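The input size 32 * 4 * 4 of the fully connected layer in CNNModel follows from the kernel sizes and paddings of the layers before it. The plain-Python sketch below traces the spatial size through the network using the standard output-size formula for convolutions; the helper names conv_out and pool_out are illustrative and not part of the original code.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution (square input, square kernel)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int) -> int:
    """Output side length of max pooling whose stride equals the kernel size."""
    return size // kernel


size = 28                         # input image is 28 x 28
size = conv_out(size, kernel=5)   # convolution layer 1
after_conv1 = size
size = pool_out(size, kernel=2)   # max pooling layer 1
size = conv_out(size, kernel=5)   # convolution layer 2
size = pool_out(size, kernel=2)   # max pooling layer 2
flatten_dim = 32 * size * size    # 32 output channels after convolution layer 2
print(after_conv1, size, flatten_dim)
```

The side length goes 28, 24, 12, 8, 4, so the flattened feature vector has 32 * 4 * 4 = 512 entries, matching nn.Linear(32 * 4 * 4, 10).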
Train the model:
cnn_model = CNNModel()
cnn_model = cnn_model.to(device)
# Record the loss and accuracy on the training and validation sets
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
train_val(cnn_model, "cnn")
[Epoch 1 finished, training set Loss: 1.004740373009727, Accuracy: 0.8109226190476191]
[Epoch 2 finished, training set Loss: 0.17581947764293068, Accuracy: 0.945625]
[Epoch 3 finished, training set Loss: 0.13953285299551985, Accuracy: 0.9569345238095238]
[Epoch 4 finished, training set Loss: 0.12599757561526662, Accuracy: 0.9607738095238095]
[Epoch 5 finished, training set Loss: 0.11612535938842311, Accuracy: 0.9638095238095238]
[Epoch 6 finished, training set Loss: 0.10294126443720113, Accuracy: 0.9671726190476191]
[Epoch 7 finished, training set Loss: 0.09651396153051228, Accuracy: 0.9698214285714286]
[Epoch 8 finished, training set Loss: 0.09004475945229864, Accuracy: 0.9717857142857143]
[Epoch 9 finished, training set Loss: 0.08583687311537298, Accuracy: 0.9727083333333333]
[Epoch 10 finished, training set Loss: 0.08039018868779142, Accuracy: 0.9748214285714286]
Training finished! Best epoch: 9, validation accuracy at that epoch: 0.9721428571428572
Visualize the loss and accuracy:
loss_acc_plot(train_losses, val_losses, train_accuracies, val_accuracies)
4.4 FCNN Model
Define the FCNN model structure:
class FCNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FCNNModel, self).__init__()
        # 784 --> 150
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Activation function
        self.relu1 = nn.ReLU()
        # 150 --> 150
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Activation function
        self.tanh2 = nn.Tanh()
        # 150 --> 150
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Activation function
        self.elu3 = nn.ELU()
        # 150 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # 784 --> 150
        out = self.fc1(x)
        out = self.relu1(out)
        # 150 --> 150
        out = self.fc2(out)
        out = self.tanh2(out)
        # 150 --> 150
        out = self.fc3(out)
        out = self.elu3(out)
        # 150 --> 10
        out = self.fc4(out)
        return out
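To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each fully connected layer has in_features * out_features weights plus out_features biases. The sketch below does this arithmetic for the dimensions noted in the comments above (784, 150, 10); the helper name linear_params is illustrative only.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


input_dim, hidden_dim, output_dim = 28 * 28, 150, 10
layer_sizes = [
    (input_dim, hidden_dim),   # fc1
    (hidden_dim, hidden_dim),  # fc2
    (hidden_dim, hidden_dim),  # fc3
    (hidden_dim, output_dim),  # fc4
]
total_params = sum(linear_params(i, o) for i, o in layer_sizes)
print(total_params)
```

Most of the parameters sit in the first layer (784 * 150 + 150 = 117750), since it maps every input pixel to every hidden unit.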
Train the model:
# Record the loss and accuracy on the training and validation sets
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
input_dim = 28*28
# Can be tuned
hidden_dim = 150
output_dim = 10
fcnn_model = FCNNModel(input_dim, hidden_dim, output_dim)
fcnn_model = fcnn_model.to(device)
train_val(fcnn_model, "fcnn")
[Epoch 1 finished, training set Loss: 0.8977700323753414, Accuracy: 0.7811904761904762]
[Epoch 2 finished, training set Loss: 0.3077347204089165, Accuracy: 0.9172916666666666]
[Epoch 3 finished, training set Loss: 0.2244828560034789, Accuracy: 0.9372619047619047]
[Epoch 4 finished, training set Loss: 0.18338089338725522, Accuracy: 0.9476488095238095]
[Epoch 5 finished, training set Loss: 0.15651956990006424, Accuracy: 0.9541071428571428]
[Epoch 6 finished, training set Loss: 0.1355396158483234, Accuracy: 0.9603869047619048]
[Epoch 7 finished, training set Loss: 0.11753073033122789, Accuracy: 0.965625]
[Epoch 8 finished, training set Loss: 0.10319345946135443, Accuracy: 0.9705059523809524]
[Epoch 9 finished, training set Loss: 0.09024346410296857, Accuracy: 0.974047619047619]
[Epoch 10 finished, training set Loss: 0.07875394061695606, Accuracy: 0.9776785714285714]
Training finished! Best epoch: 10, validation accuracy at that epoch: 0.963452380952381
Visualize the loss and accuracy:
loss_acc_plot(train_losses, val_losses, train_accuracies, val_accuracies)
5. Result Analysis
5.1 Confusion Matrix
Compute the confusion matrix:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(all_labels, all_predictions)
plt.figure(figsize=(5, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False, xticklabels=range(10), yticklabels=range(10))
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix");
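The horizontal axis is the predicted class and the vertical axis is the true class. Values on the diagonal are the numbers of correctly predicted samples; off-diagonal values are the numbers of misclassified samples. For example, the value 900 at the diagonal position (1, 1) means that 900 samples whose true class is 1 were correctly predicted as 1; the value 1 at position (1, 4) means that 1 sample whose true class is 1 was wrongly predicted as 4.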
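The confusion matrix code above relies on all_labels and all_predictions, which are not defined in the text. One way they might be collected is sketched below; the function name collect_predictions and its prepare argument are assumptions made for illustration, and the toy model and data only serve to check the logic.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def collect_predictions(model, dataloader, prepare=None, device="cpu"):
    """Return (all_labels, all_predictions) as Python lists of ints.

    prepare: optional function that reshapes a batch of inputs for the model
             (hypothetical helper argument, not from the original text).
    """
    all_labels, all_predictions = [], []
    model.eval()
    with torch.no_grad():
        for images, labels in dataloader:
            if prepare is not None:
                images = prepare(images)
            outputs = model(images.to(device))
            # Index of the largest score in each row is the predicted class
            _, predicted = torch.max(outputs, 1)
            all_labels.extend(labels.tolist())
            all_predictions.extend(predicted.cpu().tolist())
    return all_labels, all_predictions


# Toy check with a hand-set linear "model" (illustrative data only)
toy_model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))
toy_x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
toy_y = torch.tensor([0, 1, 1])
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=2, shuffle=False)
toy_labels, toy_preds = collect_predictions(toy_model, toy_loader)
print(toy_labels, toy_preds)
```

For the ResNet18 in this post, a call might look like all_labels, all_predictions = collect_predictions(resnet_model, val_loader, prepare=lambda x: x.view(-1, 1, 28, 28).expand(-1, 3, -1, -1), device=device).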
5.2 Inspecting Misclassified Samples
incorrect_images = []
incorrect_labels = []
predicted_labels = []
# best_resnet_model is not defined in this text; it is assumed to be the ResNet18 with the best saved weights, on the CPU and in eval mode
with torch.no_grad():
    for step, data in enumerate(val_loader):
        images, labels = data
        images = images.view(-1, 1, 28, 28)
        outputs = best_resnet_model(images.expand(-1, 3, -1, -1))
        _, predicted = torch.max(outputs, 1)
        for i in range(len(predicted)):
            if predicted[i] != labels[i]:
                incorrect_images.append(images[i])
                # item() turns the single-element tensors into plain ints for the plot titles
                incorrect_labels.append(labels[i].item())
                predicted_labels.append(predicted[i].item())
# Show some of the misclassified samples
num_samples = 6
fig, axes = plt.subplots(nrows=2, ncols=num_samples // 2, figsize=(10, 6))
axes = axes.flatten()
for i in range(num_samples):
    ax = axes[i]
    img = incorrect_images[i].reshape(28,28)
    ax.imshow(img, cmap='gray')
    ax.set_title(f"True: {incorrect_labels[i]}, Pred: {predicted_labels[i]}")
    ax.axis('off')
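6. Loading the Best Model
Among the saved best models, resnet18, CNN, and FCNN reached validation accuracies of 98.58%, 97.21%, and 96.35% respectively, so the resnet18 model is chosen to predict the test set.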
best_model = models.resnet18()
# map_location="cpu" lets weights saved during GPU training load on a CPU-only machine
best_model.load_state_dict(torch.load("./resnet18_best_model.pth", map_location="cpu"))
# Switch to evaluation mode so that BatchNorm layers use their running statistics
best_model.eval()
predictions = []
with torch.no_grad():
    for data in test_loader:
        images = data.view(-1, 1, 28, 28).expand(-1, 3, -1, -1)
        outputs = best_model(images)
        _, predicted = torch.max(outputs, 1)
        predictions.extend(predicted.numpy())
# Save the predictions
submission = pd.DataFrame({'ImageId': range(1, len(test) + 1), 'Label': predictions})
# submission.to_csv('/kaggle/working/submission.csv', index=False)
print('Submission file created!')
7. References
[1] Kaggle: Digit Recognizer (Handwritten Digit Recognition), your first image recognition competition project
[2] Pytorch Tutorial for Deep Learning Lovers