Table of Contents
- Abstract
- 1. LSTM Principles
- 2. Mathematical Derivation of LSTM Backpropagation
- 3. Hands-On LSTM Model Training
- Summary
Abstract
This week's study reviews LSTM. LSTMs were designed to address the vanishing and exploding gradient problems that standard RNNs encounter when processing long sequences. By introducing gating mechanisms that control the flow of information, LSTMs effectively alleviate the vanishing gradient problem, which gives them stronger performance across a wide range of time-series tasks. To deepen this understanding, an LSTM model is used to predict trends in stock market data, and the backpropagation of LSTM is derived mathematically.
1. LSTM Principles
The structure of an LSTM is shown in the figure below:
An LSTM has three gates:
(1) Input gate: controls whether data is written into the memory cell; the output of another neuron must pass through it before it can be stored.
(2) Output gate: controls whether data is read out of the memory cell; other neurons must go through it to read the stored value.
(3) Forget gate: decides when the contents of the memory cell should be erased.
When each of these three gates opens or closes is learned by the network itself. In total, then, an LSTM cell has four inputs and one output.
Refining the model further, as shown in the figure above, suppose that:
- the input to be stored in the cell is z,
- the signal controlling the input gate is z_i,
- the signal controlling the output gate is z_o,
- the signal controlling the forget gate is z_f.
Passing z through an activation function gives g(z), and passing z_i through an activation function gives f(z_i). The gate activations are usually sigmoid functions: a sigmoid output lies between 0 and 1, so it can be read as the degree to which a gate is open or closed. Multiplying g(z) by f(z_i) gives g(z)f(z_i).
z_f passes through its activation function to give f(z_f), and the new cell value is c' = g(z)f(z_i) + c·f(z_f). c' is the value written back into the memory cell; passing c' through an activation function gives h(c'). Finally, h(c') is multiplied by f(z_o) to produce the final output a. The output gate is controlled by f(z_o): if f(z_o) equals 1, h(c') can pass through; if f(z_o) equals 0, the value stored in the memory cell cannot be read out through the output gate. The other gates behave analogously.
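The single-step update above can be written as a minimal NumPy sketch. Variable names follow the text; taking sigmoid for the gates and tanh for g and h is an assumption consistent with common practice, since the text leaves g and h as generic activation functions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(z, z_i, z_f, z_o, c):
    """One step of the gated memory-cell update described above."""
    g_z = np.tanh(z)             # candidate value to store
    f_i = sigmoid(z_i)           # input gate: 0 = blocked, 1 = fully open
    f_f = sigmoid(z_f)           # forget gate: 1 = keep the old memory
    f_o = sigmoid(z_o)           # output gate
    c_new = g_z * f_i + c * f_f  # c' = g(z)f(z_i) + c f(z_f)
    a = np.tanh(c_new) * f_o     # a = h(c') f(z_o)
    return a, c_new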
2. Mathematical Derivation of LSTM Backpropagation
The forward pass of an LSTM is illustrated in the figure below:
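As a reference point for the derivation, a standard formulation of the LSTM forward pass is given below; the notation follows the common convention and is an assumption, since it is not taken from the figure:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
$$

where $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $f_t$, $i_t$, $o_t$ are the forget, input, and output gates.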
During backpropagation, we find that the longer the time span, the more numerous and complex the paths between the loss and a trainable parameter W become. More paths mean more chained products of terms, and the more multiplied terms there are, the greater the risk of vanishing gradients. By adjusting the magnitudes of the trainable parameters, the effect of some of these multiplicative terms can be counteracted during propagation, reducing the likelihood of vanishing gradients.
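To make this concrete, consider the gradient flowing along the cell-state path under the equations above (a sketch only; the indirect dependence of the gates on $h_{t-1}$ is omitted for brevity):

$$
\frac{\partial c_t}{\partial c_{t-1}} \approx f_t,
\qquad
\frac{\partial c_T}{\partial c_k} \approx \prod_{t=k+1}^{T} f_t .
$$

In a vanilla RNN, the corresponding factor is the same weight matrix multiplied by a saturating activation derivative at every step, so the product decays or explodes geometrically. In an LSTM, each factor is a learned forget-gate activation, which the network can keep close to 1 on the paths where long-range memory matters, so the product need not vanish.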
3. Hands-On LSTM Model Training
This exercise builds an LSTM model with Python's Keras library to predict future steps and sequences of a time series, applied to stock market data.
The experiment proceeds as follows:
The model uses libraries including Keras, NumPy, and Matplotlib.
The imports are shown below:
import os
import json
import math
import datetime as dt

import numpy as np
import pandas as pd
from numpy import newaxis
import matplotlib.pyplot as plt

from core.utils import Timer
from keras.layers import Dense, Activation, Dropout, LSTM
from keras.models import Sequential, load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
3.1 Data Processing
DataLoader is a data-handling utility that reads data from a CSV file, generates training and test data windows, and normalises the data when required. These capabilities make it convenient to train and evaluate the LSTM model and ensure that the model can consume time-series data. The code is shown below:
class DataLoader():
    """A class for loading and transforming data for the lstm model"""

    def __init__(self, filename, split, cols):
        dataframe = pd.read_csv(filename)
        i_split = int(len(dataframe) * split)
        self.data_train = dataframe.get(cols).values[:i_split]
        self.data_test = dataframe.get(cols).values[i_split:]
        self.len_train = len(self.data_train)
        self.len_test = len(self.data_test)
        self.len_train_windows = None

    def get_test_data(self, seq_len, normalise):
        '''
        Create x, y test data windows
        Warning: batch method, not generative, make sure you have enough memory to
        load data, otherwise reduce size of the training split.
        '''
        data_windows = []
        for i in range(self.len_test - seq_len):
            data_windows.append(self.data_test[i:i+seq_len])

        data_windows = np.array(data_windows).astype(float)
        data_windows = self.normalise_windows(data_windows, single_window=False) if normalise else data_windows

        x = data_windows[:, :-1]
        y = data_windows[:, -1, [0]]
        return x, y

    def get_train_data(self, seq_len, normalise):
        '''
        Create x, y train data windows
        Warning: batch method, not generative, make sure you have enough memory to
        load data, otherwise use generate_training_window() method.
        '''
        data_x = []
        data_y = []
        for i in range(self.len_train - seq_len):
            x, y = self._next_window(i, seq_len, normalise)
            data_x.append(x)
            data_y.append(y)
        return np.array(data_x), np.array(data_y)

    def generate_train_batch(self, seq_len, batch_size, normalise):
        '''Yield a generator of training data from filename on given list of cols split for train/test'''
        i = 0
        while i < (self.len_train - seq_len):
            x_batch = []
            y_batch = []
            for b in range(batch_size):
                if i >= (self.len_train - seq_len):
                    # stop-condition for a smaller final batch if data doesn't divide evenly
                    yield np.array(x_batch), np.array(y_batch)
                    i = 0
                x, y = self._next_window(i, seq_len, normalise)
                x_batch.append(x)
                y_batch.append(y)
                i += 1
            yield np.array(x_batch), np.array(y_batch)

    def _next_window(self, i, seq_len, normalise):
        '''Generates the next data window from the given index location i'''
        window = self.data_train[i:i+seq_len]
        window = self.normalise_windows(window, single_window=True)[0] if normalise else window
        x = window[:-1]
        y = window[-1, [0]]
        return x, y

    def normalise_windows(self, window_data, single_window=False):
        '''Normalise window with a base value of zero'''
        normalised_data = []
        window_data = [window_data] if single_window else window_data
        for window in window_data:
            normalised_window = []
            for col_i in range(window.shape[1]):
                normalised_col = [((float(p) / float(window[0, col_i])) - 1) for p in window[:, col_i]]
                normalised_window.append(normalised_col)
            normalised_window = np.array(normalised_window).T  # reshape and transpose array back into original multidimensional format
            normalised_data.append(normalised_window)
        return np.array(normalised_data)
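Note that normalise_windows rescales every column of a window relative to its first value, n_i = p_i / p_0 - 1, so each window starts at zero. As a quick sanity check of the windowing logic, a hypothetical usage might look like this (the CSV path and column names are illustrative assumptions, not taken from the report):

# Hypothetical usage; the file name and columns are illustrative assumptions.
data = DataLoader('data/sp500.csv', split=0.85, cols=['Close', 'Volume'])
x, y = data.get_train_data(seq_len=50, normalise=True)
# A window of length 50 yields 49 input steps and a single target value:
print(x.shape)  # (n_windows, 49, 2)
print(y.shape)  # (n_windows, 1)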
3.2 Building the LSTM Model
The Model class encapsulates building, training, and running predictions with the LSTM model, supporting several training modes and prediction methods. It handles adding the different layer types, compiling and training the model, and provides flexible prediction methods suited to modelling time-series data.
The code is shown below:
class Model():
    """A class for building and inferencing an lstm model"""

    def __init__(self):
        self.model = Sequential()

    def load_model(self, filepath):
        print('[Model] Loading model from file %s' % filepath)
        self.model = load_model(filepath)

    def build_model(self, configs):
        timer = Timer()
        timer.start()

        for layer in configs['model']['layers']:
            neurons = layer['neurons'] if 'neurons' in layer else None
            dropout_rate = layer['rate'] if 'rate' in layer else None
            activation = layer['activation'] if 'activation' in layer else None
            return_seq = layer['return_seq'] if 'return_seq' in layer else None
            input_timesteps = layer['input_timesteps'] if 'input_timesteps' in layer else None
            input_dim = layer['input_dim'] if 'input_dim' in layer else None

            if layer['type'] == 'dense':
                self.model.add(Dense(neurons, activation=activation))
            if layer['type'] == 'lstm':
                self.model.add(LSTM(neurons, input_shape=(input_timesteps, input_dim), return_sequences=return_seq))
            if layer['type'] == 'dropout':
                self.model.add(Dropout(dropout_rate))

        self.model.compile(loss=configs['model']['loss'], optimizer=configs['model']['optimizer'])

        print('[Model] Model Compiled')
        timer.stop()

    def train(self, x, y, epochs, batch_size, save_dir):
        timer = Timer()
        timer.start()
        print('[Model] Training Started')
        print('[Model] %s epochs, %s batch size' % (epochs, batch_size))

        save_fname = os.path.join(save_dir, '%s-e%s.h5' % (dt.datetime.now().strftime('%d%m%Y-%H%M%S'), str(epochs)))
        callbacks = [
            EarlyStopping(monitor='val_loss', patience=2),
            ModelCheckpoint(filepath=save_fname, monitor='val_loss', save_best_only=True)
        ]
        self.model.fit(
            x,
            y,
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks
        )
        self.model.save(save_fname)

        print('[Model] Training Completed. Model saved as %s' % save_fname)
        timer.stop()

    def train_generator(self, data_gen, epochs, batch_size, steps_per_epoch, save_dir):
        timer = Timer()
        timer.start()
        print('[Model] Training Started')
        print('[Model] %s epochs, %s batch size, %s batches per epoch' % (epochs, batch_size, steps_per_epoch))

        save_fname = os.path.join(save_dir, '%s-e%s.h5' % (dt.datetime.now().strftime('%d%m%Y-%H%M%S'), str(epochs)))
        callbacks = [
            ModelCheckpoint(filepath=save_fname, monitor='loss', save_best_only=True)
        ]
        self.model.fit_generator(
            data_gen,
            steps_per_epoch=steps_per_epoch,
            epochs=epochs,
            callbacks=callbacks,
            workers=1
        )

        print('[Model] Training Completed. Model saved as %s' % save_fname)
        timer.stop()

    def predict_point_by_point(self, data):
        # Predict each timestep given the last sequence of true data, in effect only predicting 1 step ahead each time
        print('[Model] Predicting Point-by-Point...')
        predicted = self.model.predict(data)
        predicted = np.reshape(predicted, (predicted.size,))
        return predicted

    def predict_sequences_multiple(self, data, window_size, prediction_len):
        # Predict sequence of 50 steps before shifting prediction run forward by 50 steps
        print('[Model] Predicting Sequences Multiple...')
        prediction_seqs = []
        for i in range(int(len(data)/prediction_len)):
            curr_frame = data[i*prediction_len]
            predicted = []
            for j in range(prediction_len):
                predicted.append(self.model.predict(curr_frame[newaxis, :, :])[0, 0])
                curr_frame = curr_frame[1:]
                curr_frame = np.insert(curr_frame, [window_size-2], predicted[-1], axis=0)
            prediction_seqs.append(predicted)
        return prediction_seqs

    def predict_sequence_full(self, data, window_size):
        # Shift the window by 1 new prediction each time, re-run predictions on new window
        print('[Model] Predicting Sequences Full...')
        curr_frame = data[0]
        predicted = []
        for i in range(len(data)):
            predicted.append(self.model.predict(curr_frame[newaxis, :, :])[0, 0])
            curr_frame = curr_frame[1:]
            curr_frame = np.insert(curr_frame, [window_size-2], predicted[-1], axis=0)
        return predicted
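build_model is driven by a configuration dictionary (loaded from config.json in main below). A hypothetical configuration matching the keys the code reads might look like the following; all concrete values here are illustrative assumptions, not the report's actual settings:

# Illustrative configuration only; every value here is an assumption.
configs = {
    "data": {
        "filename": "sp500.csv",
        "columns": ["Close", "Volume"],
        "sequence_length": 50,
        "train_test_split": 0.85,
        "normalise": True
    },
    "training": {"epochs": 2, "batch_size": 32},
    "model": {
        "loss": "mse",
        "optimizer": "adam",
        "save_dir": "saved_models",
        "layers": [
            # input_timesteps = sequence_length - 1, since each window's
            # last step is held out as the target
            {"type": "lstm", "neurons": 100, "input_timesteps": 49, "input_dim": 2, "return_seq": True},
            {"type": "dropout", "rate": 0.2},
            {"type": "lstm", "neurons": 100, "return_seq": False},
            {"type": "dense", "neurons": 1, "activation": "linear"}
        ]
    }
}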
Taking the train method as an example:
def train(self, x, y, epochs, batch_size, save_dir):
    timer = Timer()
    timer.start()
    print('[Model] Training Started')
    print('[Model] %s epochs, %s batch size' % (epochs, batch_size))
The train method receives the training data x and y, the number of training epochs epochs, the batch size batch_size, and the directory save_dir in which to save the model. It then sets up EarlyStopping and ModelCheckpoint callbacks, fits the model, and saves it under a timestamped file name.
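A hypothetical invocation, with the argument values as illustrative assumptions:

model.train(x, y, epochs=2, batch_size=32, save_dir='saved_models')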
3.3 Prediction and Visualisation
The plotting function plot_results_multiple:
def plot_results_multiple(predicted_data, true_data, prediction_len):
    fig = plt.figure(facecolor='white')
    ax = fig.add_subplot(111)
    ax.plot(true_data, label='True Data')
    for i, data in enumerate(predicted_data):
        padding = [None for p in range(i * prediction_len)]
        plt.plot(padding + data, label='Prediction')
        plt.legend()
    plt.show()
plot_results_multiple plots several prediction sequences against the true data. It first draws the true data; then, for each prediction sequence, it builds a padding prefix (a list of i * prediction_len None values, so that the second sequence, for example, starts prediction_len positions further right) to align the sequence to its correct starting position on the x-axis, plots it with plt.plot, and labels it via the legend. Finally, it displays the figure.
The main function main:
def main():
    configs = json.load(open('config.json', 'r'))
    if not os.path.exists(configs['model']['save_dir']):
        os.makedirs(configs['model']['save_dir'])

    data = DataLoader(
        os.path.join('data', configs['data']['filename']),
        configs['data']['train_test_split'],
        configs['data']['columns']
    )

    model = Model()
    model.build_model(configs)

    x, y = data.get_train_data(
        seq_len=configs['data']['sequence_length'],
        normalise=configs['data']['normalise']
    )
The main function drives the whole training and prediction pipeline. It loads the configuration from config.json to obtain the model save directory and the data settings, creating the save directory if it does not exist. It then loads the data through the DataLoader class, passing the file path, the train/test split ratio, and the required columns; creates a Model instance and builds the model; and extracts the training data from the dataset (x as features, y as labels).
Model training and prediction:
    steps_per_epoch = math.ceil((data.len_train - configs['data']['sequence_length']) / configs['training']['batch_size'])
    model.train_generator(
        data_gen=data.generate_train_batch(
            seq_len=configs['data']['sequence_length'],
            batch_size=configs['training']['batch_size'],
            normalise=configs['data']['normalise']
        ),
        epochs=configs['training']['epochs'],
        batch_size=configs['training']['batch_size'],
        steps_per_epoch=steps_per_epoch,
        save_dir=configs['model']['save_dir']
    )

    x_test, y_test = data.get_test_data(
        seq_len=configs['data']['sequence_length'],
        normalise=configs['data']['normalise']
    )

    predictions = model.predict_sequences_multiple(x_test, configs['data']['sequence_length'], configs['data']['sequence_length'])
    plot_results_multiple(predictions, y_test, configs['data']['sequence_length'])
This part of the code implements generator-based training, which is suitable when the dataset is large. It computes the number of steps per epoch (steps_per_epoch) and trains the model from a generator, with data.generate_train_batch yielding batches of training data. It then fetches the test data (x_test and y_test) from the dataset, runs multi-sequence prediction with the model, and calls plot_results_multiple to plot the predicted results against the true values.
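The report does not show how main is invoked; a standard Python entry point (an assumption) would be:

if __name__ == '__main__':
    main()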
The training process of the model is shown below:
The prediction results of the model are shown below:
As can be seen, the trend predicted by the trained model matches the actual stock trend fairly well.
Summary
This week's review gave me a deeper understanding of LSTM. Unlike an ordinary neuron, an LSTM cell takes four inputs and produces one output; by introducing an elaborate gating mechanism and an internal cell state, it strengthens the model's memory and its handling of sequential data. Working through the mathematical derivation of LSTM backpropagation made clear why LSTM can alleviate the vanishing gradient problem of RNNs.