I'm currently facing an issue while trying to fit my GRU model with my training data. After a quick look on Stack Overflow, I found this post to be quite similar to my issue:
Simplest Lstm training with Keras io
My own model is as follows:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import GRU
from keras.callbacks import History

nn = Sequential()
nn.add(Embedding(input_size, hidden_size))
nn.add(GRU(hidden_size_2, return_sequences=False))
nn.add(Dropout(0.2))
nn.add(Dense(output_size))
nn.add(Activation('linear'))
nn.compile(loss='mse', optimizer="rmsprop")

history = History()
nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history])
And the error is:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-14-e2f199af6e0c> in <module>()
1 history = History()
----> 2 nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history])
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in fit(self, X, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, show_accuracy, class_weight, sample_weight)
487 verbose=verbose, callbacks=callbacks,
488 val_f=val_f, val_ins=val_ins,
--> 489 shuffle=shuffle, metrics=metrics)
490
491 def predict(self, X, batch_size=128, verbose=0):
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in _fit(self, f, ins, out_labels, batch_size, nb_epoch, verbose, callbacks, val_f, val_ins, shuffle, metrics)
199 batch_ids = index_array[batch_start:batch_end]
200 try:
--> 201 ins_batch = slice_X(ins, batch_ids)
202 except TypeError as err:
203 raise Exception('TypeError while preparing batch. \
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in slice_X(X, start, stop)
53 if type(X) == list:
54 if hasattr(start, '__len__'):
---> 55 return [x[start] for x in X]
56 else:
57 return [x[start:stop] for x in X]
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1789 if isinstance(key, (Series, np.ndarray, Index, list)):
1790 # either boolean or fancy integer index
-> 1791 return self._getitem_array(key)
1792 elif isinstance(key, DataFrame):
1793 return self._getitem_frame(key)
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
1833 return self.take(indexer, axis=0, convert=False)
1834 else:
-> 1835 indexer = self.ix._convert_to_indexer(key, axis=1)
1836 return self.take(indexer, axis=1, convert=True)
1837
C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
1110 mask = check == -1
1111 if mask.any():
-> 1112 raise KeyError('%s not in index' % objarr[mask])
1113
1114 return _values_from_object(indexer)
KeyError: '[ 61 13980 11357 5577 11500 12125 19673 10985 2480 5237 2519 14874\n 16003 2611 3851 10837 11865 14607 10682 5495 10220 5043 23145 11280\n 9547 4766 18323 730 6263] not in index'
Any idea how to solve this? Thanks.
EDIT: Some facts about the data:
import numpy as np
import pandas as pd

data_X = pd.read_csv("X.csv")
data_Y = pd.read_csv("Y.csv")

def train_test_split(X, Y, test_size=0.15):
    # This just splits the data into training and testing parts
    ntrn = int(round(X.shape[0] * (1 - test_size)))
    perms = np.random.permutation(X.shape[0])
    X_train = X.ix[perms[0:ntrn]]
    Y_train = Y.ix[perms[0:ntrn]]
    X_test = X.ix[perms[ntrn:]]
    Y_test = Y.ix[perms[ntrn:]]
    return (X_train, Y_train), (X_test, Y_test)
X and Y are CSV files containing time-series values: for each row, the X file holds 37 consecutive values of the series plus 2 time values (considered as the past), and the Y file holds 30 values (considered as the forecast to predict).
print X_train[:1]
print y_train[:1]
0 1 2 3 4 5 6 7 8 9 ... 29 30 31 32 \
1629 84 76 76 72 72 72 72 87 87 100 ... 165 165 169 169
33 34 35 36 37 38
1629 166 166 185 185 1236778440 1236789240
[1 rows x 39 columns]
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 \
1629 195 195 195 195 196 196 194 194 192 192 ... 182 182 164
23 24 25 26 27 28 29
1629 164 146 146 128 128 103 103
[1 rows x 30 columns]
I couldn't use pandas DataFrames as inputs and outputs to Keras model.fit, at least not with pandas 0.13.1 (the version packaged with Ubuntu).
Instead, use np.array(X_train) and np.array(Y_train). That worked for me.
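A minimal sketch of that fix, reusing the train_test_split helper and variable names from the question (the only new step is the conversion to NumPy arrays):

import numpy as np

(X_train, y_train), (X_test, y_test) = train_test_split(data_X, data_Y, test_size=0.15)

# Keras slices mini-batches by integer position; converting the DataFrames
# to plain NumPy arrays avoids the label-based lookup that raises the KeyError.
X_train = np.array(X_train)
y_train = np.array(y_train)

nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history])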
I've experienced a similar issue. In my case the problem was that the Embedding layer has predefined input dimensions, so the sequences you pass to it should be padded or truncated to input_size using keras.preprocessing.sequence.
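A rough sketch of that approach; maxlen here is an assumption and should be whatever fixed sequence length the model expects (called input_size in this answer):

from keras.preprocessing import sequence

# Pad short sequences with zeros and truncate long ones so every row has
# exactly input_size entries before it reaches the Embedding layer.
X_train = sequence.pad_sequences(X_train, maxlen=input_size)
X_test = sequence.pad_sequences(X_test, maxlen=input_size)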
Source: https://stackoverflow.com/questions/33564181/keras-gru-nn-keyerror-when-fitting-not-in-index