
Introduction to LSTM
In machine learning, the Multi-Layer Perceptron (MLP) is the model most commonly used for training, as shown in the figure below. Suppose that when the input is [1, 2, 3] we want the output 4, and when the input is [3, 2, 1] we want the output 0. To an MLP, which keeps no memory of previous inputs, [1, 2, 3] and [3, 2, 1] look the same, so the expected result cannot be obtained. This motivated the Recurrent Neural Network (RNN), whose design is shown in the figure below.

The input gate, output gate, and forget gate are illustrated in the figure below.
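As a rough sketch of what those three gates compute, here is a single LSTM step in plain NumPy. This is a simplified illustration, not Keras's actual implementation; the stacked weight layout and gate ordering here are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4h, d), U: (4h, h), b: (4h,).
    Assumed gate order in the stacked weights: input, forget, candidate, output."""
    z = W @ x + U @ h_prev + b
    hsize = h_prev.shape[0]
    i = sigmoid(z[0*hsize:1*hsize])   # input gate: how much new info to write
    f = sigmoid(z[1*hsize:2*hsize])   # forget gate: how much old state to keep
    g = np.tanh(z[2*hsize:3*hsize])   # candidate cell state
    o = sigmoid(z[3*hsize:4*hsize])   # output gate: how much state to expose
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
d, hsize = 10, 4                      # 10 input features, 4 hidden units
W = rng.normal(size=(4*hsize, d))
U = rng.normal(size=(4*hsize, hsize))
b = np.zeros(4*hsize)
h, c = np.zeros(hsize), np.zeros(hsize)
for t in range(30):                   # unroll over 30 timesteps
    h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
print(h.shape)  # (4,)
```

Because the forget gate multiplies the previous cell state rather than overwriting it, gradients can flow across many timesteps, which is what lets the LSTM remember order-dependent patterns like [1, 2, 3] versus [3, 2, 1].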

Taking Stock Prediction as an Example
SPY dataset: Yahoo SPDR S&P 500 ETF (SPY); the prediction target is Adj Close.
Data Preparation
Importing Packages
Import pandas, numpy, keras, and matplotlib.
```python
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, LSTM, TimeDistributed, RepeatVector
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint
import matplotlib.pyplot as plt
%matplotlib inline
```
Reading the Data
```python
def readTrain():
    train = pd.read_csv("SPY.csv")
    return train
```

Augment Features
Besides the features provided in the raw data (Open, High, Low, Close, Adj Close, Volume), you can add your own, such as the day of the week, the month, or the day of the month.
```python
def augFeatures(train):
    train["Date"] = pd.to_datetime(train["Date"])
    train["year"] = train["Date"].dt.year
    train["month"] = train["Date"].dt.month
    train["date"] = train["Date"].dt.day
    train["day"] = train["Date"].dt.dayofweek
    return train
```

Normalization
Normalize all of the data; since Date is a string rather than a number, drop it first.
```python
def normalize(train):
    train = train.drop(["Date"], axis=1)
    train_norm = train.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
    return train_norm
```
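Since the model is trained on normalized values, its predictions also come out in normalized units; to report actual prices they must be mapped back. Below is a minimal sketch of the inverse transform using the same statistics as normalize(); the helper make_denormalizer is hypothetical, not part of the original code.

```python
import numpy as np
import pandas as pd

def make_denormalizer(train, col="Adj Close"):
    """Return a function mapping normalized predictions back to price units,
    inverting (x - mean) / (max - min) for the given column."""
    x = train[col]
    mean, span = np.mean(x), np.max(x) - np.min(x)
    return lambda y_norm: y_norm * span + mean

# tiny illustration on made-up prices
prices = pd.DataFrame({"Adj Close": [100.0, 110.0, 120.0]})
denorm = make_denormalizer(prices)
norm = (prices["Adj Close"] - 110.0) / 20.0   # what normalize() would produce
print(denorm(norm).tolist())  # [100.0, 110.0, 120.0]
```

In practice the statistics must be computed on the training split only and reused for validation data, otherwise the evaluation leaks information.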

Build Training Data
Input X_train: the previous 30 days of Open, High, Low, Close, Adj Close, Volume, month, year, date, and day as features, with shape (30, 10).
Output Y_train: the next 5 days of Adj Close as the target, with shape (5, 1).
We expand the data with a sliding window to build the training data, as shown in Figure (1).

```python
def buildTrain(train, pastDay=30, futureDay=5):
    X_train, Y_train = [], []
    for i in range(train.shape[0] - futureDay - pastDay):
        X_train.append(np.array(train.iloc[i:i+pastDay]))
        Y_train.append(np.array(train.iloc[i+pastDay:i+pastDay+futureDay]["Adj Close"]))
    return np.array(X_train), np.array(Y_train)
```
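To see the sliding window at work, buildTrain can be exercised on a tiny synthetic frame (toy data, not the SPY set): with 40 rows, a 30-day past window, and a 5-day future window, 40 - 30 - 5 = 5 samples come out.

```python
import numpy as np
import pandas as pd

def buildTrain(train, pastDay=30, futureDay=5):
    X_train, Y_train = [], []
    for i in range(train.shape[0] - futureDay - pastDay):
        X_train.append(np.array(train.iloc[i:i+pastDay]))
        Y_train.append(np.array(train.iloc[i+pastDay:i+pastDay+futureDay]["Adj Close"]))
    return np.array(X_train), np.array(Y_train)

# toy frame: 40 rows, 2 columns, values equal to the row index
toy = pd.DataFrame({"Adj Close": np.arange(40.0), "Volume": np.arange(40.0)})
X, Y = buildTrain(toy, pastDay=30, futureDay=5)
print(X.shape, Y.shape)  # (5, 30, 2) (5, 5)
print(Y[0])              # [30. 31. 32. 33. 34.] -- the 5 days after window 0
```

Note that Y comes out 2-D, (samples, futureDay); the model sections below add a trailing axis with np.newaxis whenever a 3-D target is needed.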
Shuffling the Data
Shuffle the data so that it is no longer ordered by date.
```python
def shuffle(X, Y):
    np.random.seed(10)
    randomList = np.arange(X.shape[0])
    np.random.shuffle(randomList)
    return X[randomList], Y[randomList]
```
Training Data & Validation Data
Take part of the training data as validation data.
```python
def splitData(X, Y, rate):
    X_train = X[int(X.shape[0]*rate):]
    Y_train = Y[int(Y.shape[0]*rate):]
    X_val = X[:int(X.shape[0]*rate)]
    Y_val = Y[:int(Y.shape[0]*rate)]
    return X_train, Y_train, X_val, Y_val
```
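A quick check of the split proportions on dummy arrays; because the windows were already shuffled, this contiguous first slice behaves as a random validation sample. (One caveat worth knowing: with overlapping windows, a random split lets validation windows share days with training windows, so a chronological split is the stricter way to avoid look-ahead leakage.)

```python
import numpy as np

def splitData(X, Y, rate):
    X_train = X[int(X.shape[0]*rate):]   # remaining (1 - rate) for training
    Y_train = Y[int(Y.shape[0]*rate):]
    X_val = X[:int(X.shape[0]*rate)]     # first rate fraction for validation
    Y_val = Y[:int(Y.shape[0]*rate)]
    return X_train, Y_train, X_val, Y_val

X = np.zeros((100, 30, 10))
Y = np.zeros((100, 5))
X_train, Y_train, X_val, Y_val = splitData(X, Y, 0.1)
print(X_train.shape[0], X_val.shape[0])  # 90 10
```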
```python
# read SPY.csv
train = readTrain()

# augment the features (year, month, date, day)
train_Aug = augFeatures(train)

# normalization
train_norm = normalize(train_Aug)

# build the data: use the last 30 days to predict the next 5 days
X_train, Y_train = buildTrain(train_norm, 30, 5)

# shuffle the data (random seed is 10)
X_train, Y_train = shuffle(X_train, Y_train)

# split training data and validation data
X_train, Y_train, X_val, Y_val = splitData(X_train, Y_train, 0.1)

# X_train: (5710, 30, 10)
# Y_train: (5710, 5)
# X_val:   (634, 30, 10)
# Y_val:   (634, 5)
```
Model Construction

LSTM models can be built in several input/output configurations: one to one, one to many, many to one, and many to many.

One-to-One Model
Since this is a one-to-one model, return_sequences can also be set to False, but the dimensions of Y_train and Y_val would then need to be changed to 2-D, i.e. (5710, 1) and (634, 1).
```python
def buildOneToOneModel(shape):
    model = Sequential()
    model.add(LSTM(10, input_length=shape[1], input_dim=shape[2], return_sequences=True))
    # output shape: (1, 1)
    model.add(TimeDistributed(Dense(1)))    # or use model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")
    model.summary()
    return model
```
Set pastDay to 1, and set the number of days to predict, futureDay, to 1 as well.
```python
train = readTrain()
train_Aug = augFeatures(train)
train_norm = normalize(train_Aug)

# change the past days and future days
X_train, Y_train = buildTrain(train_norm, 1, 1)
X_train, Y_train = shuffle(X_train, Y_train)
X_train, Y_train, X_val, Y_val = splitData(X_train, Y_train, 0.1)

# from 2 dimensions to 3 dimensions
Y_train = Y_train[:, np.newaxis]
Y_val = Y_val[:, np.newaxis]

model = buildOneToOneModel(X_train.shape)
callback = EarlyStopping(monitor="loss", patience=10, verbose=1, mode="auto")
model.fit(X_train, Y_train, epochs=1000, batch_size=128,
          validation_data=(X_val, Y_val), callbacks=[callback])
```

val_loss: 2.2902e-05; training stopped at epoch 164.

Many-to-One Model
Set the LSTM parameter return_sequences=False (this is also the default when unset), and TimeDistributed cannot be used here.
```python
def buildManyToOneModel(shape):
    model = Sequential()
    model.add(LSTM(10, input_length=shape[1], input_dim=shape[2]))
    # output shape: (1, 1)
    model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")
    model.summary()
    return model
```
Set pastDay=30 and futureDay=1, and note that Y_train must be 2-D.
```python
train = readTrain()
train_Aug = augFeatures(train)
train_norm = normalize(train_Aug)

# change the past days and future days
X_train, Y_train = buildTrain(train_norm, 30, 1)
X_train, Y_train = shuffle(X_train, Y_train)

# because there is no return sequence, Y_train and Y_val must be 2 dimensions
X_train, Y_train, X_val, Y_val = splitData(X_train, Y_train, 0.1)

model = buildManyToOneModel(X_train.shape)
callback = EarlyStopping(monitor="loss", patience=10, verbose=1, mode="auto")
model.fit(X_train, Y_train, epochs=1000, batch_size=128,
          validation_data=(X_val, Y_val), callbacks=[callback])
```

val_loss: 3.9465e-05; training stopped at epoch 113.

One-to-Many Model
Because this is a one-to-many model, there is only 1 timestep, so it only runs with return_sequences=False.
```python
def buildOneToManyModel(shape):
    model = Sequential()
    model.add(LSTM(10, input_length=shape[1], input_dim=shape[2]))
    # output shape: (5, 1)
    model.add(Dense(1))
    model.add(RepeatVector(5))
    model.compile(loss="mse", optimizer="adam")
    model.summary()
    return model
```
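RepeatVector(5) simply copies its (batch, features) input 5 times along a new time axis, which this NumPy sketch mimics:

```python
import numpy as np

# RepeatVector(5) turns a (batch, features) tensor into (batch, 5, features)
# by tiling; here sketched with np.repeat on a made-up (2, 1) Dense output.
dense_out = np.array([[0.3], [0.7]])              # output of Dense(1), shape (2, 1)
repeated = np.repeat(dense_out[:, None, :], 5, axis=1)
print(repeated.shape)  # (2, 5, 1)
```

One consequence worth noting: since RepeatVector is the last layer of this model, the 5 predicted days are identical copies of a single value, so the model in effect learns the one number that best fits all 5 future closes. A decoder LSTM placed after RepeatVector would be one way to produce distinct per-day predictions.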
Set pastDay to 1 and futureDay to 5.
```python
train = readTrain()
train_Aug = augFeatures(train)
train_norm = normalize(train_Aug)

# change the past days and future days
X_train, Y_train = buildTrain(train_norm, 1, 5)
X_train, Y_train = shuffle(X_train, Y_train)
X_train, Y_train, X_val, Y_val = splitData(X_train, Y_train, 0.1)

# from 2 dimensions to 3 dimensions
Y_train = Y_train[:, :, np.newaxis]
Y_val = Y_val[:, :, np.newaxis]

model = buildOneToManyModel(X_train.shape)
callback = EarlyStopping(monitor="loss", patience=10, verbose=1, mode="auto")
model.fit(X_train, Y_train, epochs=1000, batch_size=128,
          validation_data=(X_val, Y_val), callbacks=[callback])
```

val_loss: 5.6081e-05; training stopped at epoch 163.

Many-to-Many Model (input and output of the same length)
Set return_sequences to True, then use TimeDistributed(Dense(1)) to shape the output to (5, 1).
```python
def buildManyToManyModel(shape):
    model = Sequential()
    model.add(LSTM(10, input_length=shape[1], input_dim=shape[2], return_sequences=True))
    # output shape: (5, 1)
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss="mse", optimizer="adam")
    model.summary()
    return model
```
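TimeDistributed(Dense(1)) applies one shared Dense layer to every timestep of the LSTM output, so a (batch, 5, 10) sequence becomes (batch, 5, 1) using a single (10, 1) kernel. Under that assumption the operation reduces to a batched matrix product, sketched here in NumPy with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(1)
lstm_out = rng.normal(size=(2, 5, 10))   # (batch, timesteps, LSTM units)
kernel = rng.normal(size=(10, 1))        # one Dense(1) kernel shared by all steps
bias = np.zeros(1)

# broadcasting applies the same weights independently at each timestep
out = lstm_out @ kernel + bias
print(out.shape)  # (2, 5, 1)
```

Sharing one kernel across timesteps keeps the parameter count independent of sequence length, which is why the same buildManyToManyModel works for any pastDay as long as it equals futureDay.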
Set pastDay and futureDay to the same length, 5.
```python
train = readTrain()
train_Aug = augFeatures(train)
train_norm = normalize(train_Aug)

# change the past days and future days
X_train, Y_train = buildTrain(train_norm, 5, 5)
X_train, Y_train = shuffle(X_train, Y_train)
X_train, Y_train, X_val, Y_val = splitData(X_train, Y_train, 0.1)

# from 2 dimensions to 3 dimensions
Y_train = Y_train[:, :, np.newaxis]
Y_val = Y_val[:, :, np.newaxis]

model = buildManyToManyModel(X_train.shape)
callback = EarlyStopping(monitor="loss", patience=10, verbose=1, mode="auto")
model.fit(X_train, Y_train, epochs=1000, batch_size=128,
          validation_data=(X_val, Y_val), callbacks=[callback])
```

val_loss: 9.3788e-05; training stopped at epoch 169.
