Recognizing a Corporate Group's Animated GIF CAPTCHA

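Note: this article only shares an approach for learning purposes. Breaking the law or damaging information systems is strictly prohibited.
If anything here infringes on your rights, please contact the author to have it taken down.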

The recognizer described in this article is also live on the OCR recognition site: http://yxlocr.nat300.top/ocr/other/16

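A certain group recently updated its CAPTCHA to an animated GIF. A sample of the dataset is shown below.
[Image: sample GIF CAPTCHAs from the dataset]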

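The CAPTCHA is either a fixed five-character string of digits and letters or an arithmetic problem with operands under 10. GIF CAPTCHAs come in many forms. Some reveal the text by sliding it across frames; this one hides part of the text in each frame. Although every frame shows a different fragment of the CAPTCHA, handling it is simple: stack the frames on top of each other. Here is the finished result first. With the method in this article, accuracy can reach 98%.
[Image: recognition results of the finished model]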

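Frame stacking works with either Pillow or OpenCV. Below I take the GIF (delivered base64-encoded), convert its frames to cv2 images, and stack them. The snippet starts from the decoded GIF saved as output_image.gif.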

import imageio
import cv2
import numpy as np

gif_file = "output_image.gif"
reader = imageio.get_reader(gif_file)
frames = [frame for frame in reader]

# Convert RGB to BGR
frames_bgr = [cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) for frame in frames]

# Get the size of the first frame
frame_height, frame_width, _ = frames_bgr[0].shape

# Create a blank canvas, initialized to black
merged_image = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)

# Iterate over all frames and stack them one by one
for frame in frames_bgr:
    # Blend the current frame onto the canvas;
    # cv2.addWeighted gives a semi-transparent overlay
    merged_image = cv2.addWeighted(merged_image, 0.8, frame, 0.2, 0)

# Save the merged image
cv2.imwrite("1.png", merged_image)
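The loop above is a running blend, so each frame does not contribute equally to the final image. The sketch below works out the effective weight of every frame for the 0.8 / 0.2 setting used above. The function name `effective_weights` is made up for illustration, and the calculation ignores the uint8 rounding that cv2.addWeighted performs at every step.

```python
def effective_weights(n_frames, alpha=0.8, beta=0.2):
    """Weight each frame ends up with after repeatedly applying
    merged = alpha * merged + beta * frame, starting from a black canvas."""
    # Frame k is scaled by beta once, then by alpha for every later frame.
    return [beta * alpha ** (n_frames - 1 - k) for k in range(n_frames)]


weights = effective_weights(5)
for k, w in enumerate(weights):
    print(f"frame {k}: weight {w:.4f}")
print(f"sum of weights: {sum(weights):.4f}")
```

Earlier frames fade with every later blend, and the weights sum to 1 - 0.8^n rather than 1, so the merged image is darker than any single frame. This may be part of why the stacked result looks faint; an equal-weight average or a per-pixel maximum are possible alternatives, though the article does not evaluate them.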

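Let's stack a batch of GIFs and look at the result.
[Image: batch of stacked frames]
As you can see, the text is still hard to make out. The stacking applies no image enhancement at all, and my blend weights may also play a part. This level of quality is good enough, though. It is hard for a human to label, but once a model is trained these CAPTCHAs are an easy case; by industry standards this is entry-level training. There is no need to reinvent the wheel here, so I train directly with PaddleOCR, using PP-OCRv4, the latest PaddleOCR version at the time of writing. The data has to be labeled first, and then written into the label file that PaddleOCR expects, as follows:
[Image: PaddleOCR label file]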
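In the image above, CAPTCHAs drawn with a box around them, that is, invalid CAPTCHAs (effectively dirty data), are labeled as "-". A few thousand images in total are enough, and you can also train incrementally.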
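PaddleOCR's recognition label file is plain text with one sample per line: the image path, a tab, then the label. A minimal sketch of writing such a file follows. The file names, the sample labels, and the helper `write_label_file` are made up for illustration; only the tab-separated layout and the "-" convention for dirty samples come from the text above.

```python
import os
import tempfile


def write_label_file(samples, label_path, dirty_label="-"):
    """Write (image_path, text) pairs as 'path<TAB>label' lines.

    Samples whose text is None or empty are treated as dirty data
    and labeled with dirty_label.
    """
    with open(label_path, "w", encoding="utf-8") as f:
        for image_path, text in samples:
            label = text if text else dirty_label
            f.write(f"{image_path}\t{label}\n")


# Hypothetical samples: a 5-character code, an arithmetic problem, a boxed (invalid) one
samples = [
    ("train/0001.png", "a7Kx2"),
    ("train/0002.png", "3+5=?"),
    ("train/0003.png", None),
]

label_path = os.path.join(tempfile.mkdtemp(), "rec_gt_train.txt")
write_label_file(samples, label_path)

with open(label_path, encoding="utf-8") as f:
    print(f.read())
```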

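Then simply train the model with the en_PP-OCRv4_rec.yml config file. The default model parameters are fine; just raise the number of epochs a little.
[Image: training configuration]
I train on two GPUs here.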

python3 -m paddle.distributed.launch --gpus '0,1' tools/train.py -c configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml -o Global.pretrained_model=./pretrain_models/en_PP-OCRv4_rec_train/best_accuracy

[Image: training output]
After training finishes, the model can be exported as an inference model.

python3 tools/export_model.py -c configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml -o Global.pretrained_model=./output/rec_ppocr_gif/best_accuracy Global.save_inference_dir=./inference/en_PP-OCRv4_rec/

After export, the model directory looks like this:
[Image: exported inference model directory]

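Once the inference model has been generated in that directory, you can go one step further and export it to ONNX if you want to run without the Paddle environment. After installing paddle2onnx, the inference model can be converted to ONNX: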

paddle2onnx --model_dir ./inference/en_PP-OCRv4_rec --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/en_PP-OCRv4_rec/model.onnx --opset_version 11 --enable_onnx_checker True

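After conversion, a model.onnx file appears in the same directory.
[Image: model.onnx in the inference directory]
We can then run inference with the ONNX model. Because we are now outside the Paddle environment, the image preprocessing has to be done by hand.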
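The manual preprocessing in the function that follows comes down to two rules: scale the image to height 48 while keeping its aspect ratio, capping the width at 320, and map pixel values from [0, 255] to [-1, 1]. The sketch below isolates just that arithmetic so it can be checked without a model. The helper names and the example image sizes are made up for illustration.

```python
import math


def resized_width(h, w, target_h=48, max_w=320):
    """Width after scaling an h x w image to height target_h,
    keeping the aspect ratio and capping the result at max_w."""
    ratio = w / float(h)
    return min(max_w, int(math.ceil(target_h * ratio)))


def normalize(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return (pixel / 255 - 0.5) / 0.5


# Hypothetical input sizes given as (height, width)
for h, w in [(48, 160), (40, 100), (30, 600)]:
    rw = resized_width(h, w)
    print(f"{h}x{w} -> width {rw}, right padding {320 - rw}")

print(normalize(0), normalize(255))
```

Note that the padded columns are filled with zeros after normalization, which corresponds to mid-gray rather than black in the original pixel range.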

import math

import cv2
import numpy as np
import onnxruntime

# Load the ONNX model exported above (CPU provider here; adjust as needed)
ort_session = onnxruntime.InferenceSession(
    "./inference/en_PP-OCRv4_rec/model.onnx",
    providers=["CPUExecutionProvider"],
)


def ocr_predict(img_reg):
    # Character set in dictionary order; output index 0 is the CTC blank
    chars = '0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!"#$%&\'()*+,-./ '
    imgH, imgW = 48, 320

    # Scale to height 48, keep the aspect ratio, cap the width at 320
    h, w = img_reg.shape[:2]
    ratio = w / float(h)
    resized_w = min(imgW, int(math.ceil(imgH * ratio)))
    resized_image = cv2.resize(img_reg, (resized_w, imgH)).astype("float32")

    # HWC -> CHW, then normalize to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5

    # Zero-pad on the right to the fixed input size and add the batch dimension
    padding_im = np.zeros((3, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    recg_scr = np.expand_dims(padding_im, axis=0)

    input_names = [i.name for i in ort_session.get_inputs()]
    output_names = [o.name for o in ort_session.get_outputs()]
    out_put = ort_session.run(output_names, {input_names[0]: recg_scr})

    # Best class and its probability at every time step
    preds = np.argmax(out_put[0], axis=2)[0]
    preds_prob = np.max(out_put[0], axis=2)[0]

    # Greedy CTC decoding: skip blanks and collapse consecutive repeats
    plate = ""
    plate_conf = []  # per-character confidence, available if needed
    last_label = -1
    for i in range(len(preds)):
        if preds[i] != 0 and preds[i] != last_label:
            plate += chars[int(preds[i]) - 1]
            plate_conf.append(preds_prob[i])
        last_label = preds[i]
    return plate
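The last step of the function is greedy CTC decoding: take the most likely class at each time step, drop blanks (index 0), and collapse consecutive repeats. The sketch below shows that step on its own with a hand-written index sequence. The function name, the digits-only character set, and the example sequence are made up for illustration, and it assumes the previous label is updated at every step, blanks included, so that a blank between two identical characters keeps both.

```python
def ctc_greedy_decode(indices, chars, blank=0):
    """Collapse consecutive repeats and drop blanks.

    indices: per-time-step argmax class ids
    chars:   character set; class id i maps to chars[i - 1] because id 0 is the blank
    """
    out = []
    last = -1
    for idx in indices:
        if idx != blank and idx != last:
            out.append(chars[idx - 1])
        # Updated on every step, so "1, blank, 1" decodes to two characters
        last = idx
    return "".join(out)


chars = "0123456789"
indices = [0, 2, 2, 0, 2, 0, 0, 9, 9, 4]
print(ctc_greedy_decode(indices, chars))
```

Here the repeated 2 collapses to a single "1", the blank-separated 2 yields a second "1", and the sequence decodes to "1183".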