Recurrent Neural Network Binary Classification

Question

I have access to a DataFrame with data on 100 people and how they performed on a certain motion test. It contains about 25,000 rows per person, because each person's performance is recorded (approximately) every centisecond (10^-2 s). We want to use this data to predict a binary y-label, that is, whether or not someone has a motor problem.

The columns and some values of the dataset are as follows:

'Person_ID', 'time_in_game', 'python_time', 'permutation_game', 'round', 'level', 'times_level_played_before', 'speed', 'costheta', 'y_label', 'gender', 'age_precise', 'ax_f', 'ay_f', 'az_f', 'acc', 'jerk'
1,            0.25,           1.497942e+09,  2,                 1,      'level_B', 1,                           0.8,    0.4655,    1,         [...]

I downsampled the dataset to 480 rows per person by keeping only the row at each half-second mark.
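A minimal pandas sketch of that half-second downsampling, assuming `time_in_game` is measured in seconds within each person's session (the sample row shows 0.25) and that the first row inside each half-second window is the one to keep. The function name `downsample_half_seconds` is hypothetical. Binning by time, rather than taking every 50th row, stays correct when the centisecond sampling is only approximate.

```python
import pandas as pd


def downsample_half_seconds(df: pd.DataFrame, step: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: keep the first row of each `step`-second window per person.

    Assumes `time_in_game` is in seconds; rows are sorted by time before binning.
    """
    df = df.sort_values(["Person_ID", "time_in_game"])
    # Index of the half-second window each row falls into (0, 1, 2, ...).
    window = (df["time_in_game"] // step).astype(int)
    # head(1) per (person, window) keeps the earliest row in every window.
    return (
        df.assign(_window=window)
        .groupby(["Person_ID", "_window"])
        .head(1)
        .drop(columns="_window")
    )
```

Applied to the full frame, this yields one row per person per half second, which is where the roughly 480 rows per person come from.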

Now I want to use a recurrent neural network to predict the binary y_label.

The following code extracts the costheta feature as the input X and the y_label as the output Y.

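X = []
Y = []

# df is the downsampled frame (480 rows per person); one sequence per person
person_list = df['Person_ID'].unique()

for person_id in person_list:
    person_frame = df.loc[df['Person_ID'] == person_id]

    # costheta is a measurement of performance
    coslist = list(person_frame['costheta'])

    # extract the y-label (constant for a person, so take the first row)
    label = person_frame['y_label'].iloc[0]

    X.append(coslist)
    Y.append(label)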
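Keras expects numeric arrays rather than lists of Python lists, and all sequences in a batch must have the same length. A minimal NumPy sketch for turning the X and Y lists built by the loop above into arrays, assuming some persons may end up with slightly more or fewer than 480 rows after downsampling. The helper name `to_fixed_length` and the padding value are illustrative choices, not part of the original code. The result has the `(samples, timesteps, features)` shape that Keras recurrent layers take.

```python
import numpy as np


def to_fixed_length(sequences, length=480, pad_value=0.0):
    """Hypothetical helper: stack per-person sequences into a (n, length, 1) float32 array.

    Sequences longer than `length` are truncated to their first `length` steps;
    shorter ones are padded at the end with `pad_value`.
    """
    out = np.full((len(sequences), length), pad_value, dtype=np.float32)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=np.float32)[:length]
        out[i, : len(seq)] = seq
    # Add a trailing feature axis: one feature (costheta) per time step.
    return out[..., np.newaxis]


# Toy usage with three persons of unequal length (real code would pass X and Y).
X_demo = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.7, 0.8, 0.9]]
Y_demo = [0, 1, 1]
X_arr = to_fixed_length(X_demo, length=3)
Y_arr = np.asarray(Y_demo, dtype=np.float32)
```

If padding does occur, note that 0.0 is a legitimate costheta value, so a Keras `Masking` layer keyed on that value would also hide real data; an out-of-range pad value avoids the collision.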

I split the data into training and test sets using a test split of 0.2. Then I tried to create the RNN with Keras as follows:
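The question does not show how the split was made. Because each element of X is one whole person, splitting those per-person arrays keeps every person entirely in either the training or the test set, so no person's time steps leak into both. A sketch using scikit-learn's `train_test_split`, with toy random data standing in for the real arrays; `stratify` is an added suggestion that keeps the motor-problem ratio the same in both parts, which matters more when there are only 100 persons.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data: 10 persons, 480 steps, 1 feature (costheta).
rng = np.random.default_rng(0)
X_arr = rng.uniform(-1.0, 1.0, size=(10, 480, 1)).astype(np.float32)
Y_arr = np.array([0, 1] * 5, dtype=np.float32)

# One row of X_arr is one person, so this is a split by person, not by time step.
# stratify=Y_arr keeps the class ratio equal in the training and test parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_arr, Y_arr, test_size=0.2, stratify=Y_arr, random_state=42
)
```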

from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout

embedding_size = 32
model = Sequential()

# different_input_values = number of distinct integer-encoded input values;
# Embedding expects integer indices in [0, different_input_values)
model.add(Embedding(different_input_values, embedding_size, input_length=480))
model.add(LSTM(1000))

# output is binary
model.add(Dense(1, activation='sigmoid'))
model.summary()  # summary() prints the table itself and returns None
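The `Embedding` layer maps integer indices to vectors, and Keras casts its input to integers. Unless costheta was integer-encoded first (the question does not show this), continuous values such as 0.4655, presumably cosines in [-1, 1], would collapse to a handful of indices. The model is also large for the data: with 32-dimensional embeddings, `LSTM(1000)` alone has 4 × (1000 × (1000 + 32) + 1000) ≈ 4.1 million weights, for roughly 80 training sequences. A hedged alternative sketch that feeds the float sequence directly into a much smaller LSTM; the unit count (32) and dropout rate (0.3) are illustrative guesses, not tuned values.

```python
import numpy as np
import keras
from keras import layers

TIMESTEPS = 480   # half-second samples per person
N_FEATURES = 1    # only costheta for now

# Continuous inputs go straight into the LSTM; no Embedding layer.
model = keras.Sequential([
    keras.Input(shape=(TIMESTEPS, N_FEATURES)),
    layers.LSTM(32),                        # 4 * (32 * (32 + 1) + 32) = 4352 weights
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # probability of the positive class
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

# Smoke check on dummy input: one probability per sequence.
probs = model.predict(np.zeros((2, TIMESTEPS, N_FEATURES), dtype="float32"), verbose=0)
```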

Finally, I trained the model with this code:

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

batch_size = 64
num_epochs = 100

# hold out the first batch_size training sequences for validation
X_valid, y_valid = X_train[:batch_size], Y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], Y_train[batch_size:]

model.fit(X_train2, y_train2, validation_data=(X_valid, y_valid),
          batch_size=batch_size, epochs=num_epochs)
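With 100 persons and a 0.2 test split, X_train holds about 80 sequences, so slicing off `X_train[:batch_size]` (64 sequences) for validation leaves only about 16 for training. A hedged alternative sketch: let `fit` hold out a fraction via `validation_split` and stop early when validation loss stops improving. The toy data, the small `LSTM(16)` model and the 3-epoch run are placeholders so that the example runs quickly.

```python
import numpy as np
import keras
from keras import layers

# Toy stand-in for ~80 training persons (480 steps, 1 feature each).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(80, 480, 1)).astype("float32")
Y_train = rng.integers(0, 2, size=80).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(480, 1)),
    layers.LSTM(16),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Stop when validation loss stops improving and roll back to the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)

# validation_split=0.2 holds out the last 16 of the 80 sequences (taken before
# shuffling), leaving 64 for training instead of 16.
history = model.fit(
    X_train, Y_train,
    validation_split=0.2,
    batch_size=16,
    epochs=3,          # tiny for the demo; e.g. 100 with early stopping in practice
    callbacks=[early_stop],
    verbose=0,
)
```

Because `validation_split` takes the last fraction of the array without shuffling, X_train should be shuffled first if it happens to be ordered by label.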

However, the accuracy is really low: depending on the batch size, it varies between 0.4 and 0.6.

12/12 [==============================] - 13s 1s/step - loss: 0.6921 - acc: 0.7500 - val_loss: 0.7069 - val_acc: 0.4219

My question is: in general, with complex data like this, how does one train an RNN effectively? Should I refrain from reducing the data to 480 rows per person and keep all ~25,000 rows per person? Could adding more features, such as acc (acceleration in the game) and jerk, yield a significant accuracy gain? What other significant improvements should one consider?
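To try acc and jerk alongside costheta, the per-person sequences can be stacked along the last (feature) axis. A minimal sketch, assuming the column names from the column list above, a downsampled frame sorted by time within each person, and at least `length` rows per person; `build_multifeature_arrays` and `standardize` are hypothetical helpers. Because acc and jerk are likely on a very different scale from costheta, the sketch also standardizes each feature using statistics from the training persons only.

```python
import numpy as np
import pandas as pd

FEATURES = ["costheta", "acc", "jerk"]


def build_multifeature_arrays(df: pd.DataFrame, features=FEATURES, length=480):
    """Hypothetical helper: X of shape (n_persons, length, n_features), Y of shape (n_persons,).

    Assumes every person has at least `length` rows; longer sequences are truncated.
    """
    X, Y = [], []
    for _, person_frame in df.groupby("Person_ID", sort=True):
        X.append(person_frame[features].to_numpy(dtype=np.float32)[:length])
        Y.append(person_frame["y_label"].iloc[0])
    return np.stack(X), np.asarray(Y, dtype=np.float32)


def standardize(X_train, X_test):
    """Hypothetical helper: z-score each feature with training-set statistics."""
    mean = X_train.mean(axis=(0, 1), keepdims=True)
    std = X_train.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std
```

The resulting X can be fed to an LSTM whose input shape is `(480, len(FEATURES))`.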

Source: https://stackoverflow.com/questions/54290619/recurrent-neural-network-binary-classification

Tags

python

tensorflow

machine-learning

keras

recurrent-neural-network