Question
I propose an example in which a tf.keras model fails to learn from very simple data. I'm using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of my post, I give the Python code to reproduce the problem I observed.
- Data
The samples are NumPy arrays of shape (6, 16, 16, 16, 3). To make things very simple, I only consider arrays full of 1s and 0s. Arrays with 1s are given the label 1 and arrays with 0s are given the label 0. I can generate some samples (in the following, n_samples = 240) with this code:
def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
In order to feed this data into a tf.keras model, I create an instance of tf.data.Dataset using the code below. This will essentially create shuffled batches of BATCH_SIZE = 12 samples.
def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
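As a quick sanity check of the pipeline (a minimal sketch, assuming BATCH_SIZE = 12 and the definitions above are in scope), one batch can be inspected like this; each element should have shape (12, 6, 16, 16, 16, 3) with labels of shape (12, 2):
ds = make_tfdataset(for_training=False)
for x, y in ds.take(1):
    print(x.shape, y.shape)  # expected: (12, 6, 16, 16, 16, 3) (12, 2)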
- Model
I propose the following model to classify my samples:
def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])
The model is optimized using Adam (with default parameters) and with the categorical_crossentropy loss:
clf_model = create_model()
clf_model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])
The output of clf_model.summary() is:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 6, 16, 16, 16, 3) 0
_________________________________________________________________
lambda (Lambda) (None, 16, 16, 16, 3) 0
_________________________________________________________________
conv3d (Conv3D) (None, 8, 8, 8, 64) 98368
_________________________________________________________________
re_lu (ReLU) (None, 8, 8, 8, 64) 0
_________________________________________________________________
global_average_pooling3d (Gl (None, 64) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 384) 0
_________________________________________________________________
lambda_2 (Lambda) (None, 1, 384) 0
_________________________________________________________________
conv1d (Conv1D) (None, 1, 1) 385
_________________________________________________________________
re_lu_1 (ReLU) (None, 1, 1) 0
_________________________________________________________________
lambda_3 (Lambda) (None, 1) 0
_________________________________________________________________
dense (Dense) (None, 2) 4
=================================================================
Total params: 98,757
Trainable params: 98,757
Non-trainable params: 0
- Training
The model is trained for 500 epochs as follows:
train_ds = make_tfdataset(for_training=True)
history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
- The problem!
During the 500 epochs, the model loss stays around 0.69 and never goes below 0.69. This is also true if I set the learning rate to 1e-2 instead of 1e-3. The data is very simple (just 0s and 1s). Naively, I would expect the model to do better than 0.6 accuracy; in fact, I would expect it to reach 100% accuracy quickly. What am I doing wrong?
- The full code...
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from math import ceil
from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

BATCH_SIZE = 12

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset

def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])

train_ds = make_tfdataset(for_training=True)

clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
clf_model.summary()
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
Answer 1:
Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension - as it, by definition, holds independent samples of your data. In your first reshape, you mix feature dimensions with the batch dimension:
Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)
This is like feeding 72 independent samples of shape (16, 16, 16, 3). Further layers suffer similar problems.
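A standalone sketch of what that first reshape does (shapes assumed from the question: batch size 12, 6 sub-volumes per sample):
import tensorflow as tf

x = tf.zeros((12, 6, 16, 16, 16, 3))     # one batch: 12 samples, 6 sub-volumes each
y = tf.reshape(x, (-1, 16, 16, 16, 3))   # the per-sample axis is folded into the batch axis
print(x.shape, y.shape)                  # (12, 6, 16, 16, 16, 3) (72, 16, 16, 16, 3)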
SOLUTION:
- Instead of reshaping every step of the way (for which you should use Reshape), shape your existing Conv and pooling layers so that everything works out directly.
- Aside from the input and output layers, it's better to name each layer something short and simple - no clarity is lost, as each line is well-defined by its layer name.
- GlobalAveragePooling is intended to be the final layer, as it collapses feature dimensions - in your case, like so: (12, 16, 16, 16, 3) --> (12, 3); a Conv afterwards serves little purpose.
- Per above, I replaced Conv1D with Conv3D.
- Unless you're using variable batch sizes, always go for batch_shape= vs. shape=, as you can inspect layer dimensions in full (very helpful).
- Your true batch_size here is 6, deducing from your comment reply.
- kernel_size=1 and (especially) filters=1 is a very weak convolution; I replaced it accordingly - you can revert if you wish.
- If you have only 2 classes in your intended application, I advise using Dense(1, 'sigmoid') with binary_crossentropy loss (see the sketch after this list).
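A minimal sketch of that last bullet, assuming the same imports as in the question: it keeps the architecture shown further below but swaps the head for a single sigmoid unit, and the generator would then yield scalar 0/1 labels instead of one-hot pairs (create_binary_model is just an illustrative name):
def create_binary_model(batch_size, input_shape):
    # Same body as create_model below, but with a 1-unit sigmoid head and binary_crossentropy.
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                 activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8, kernel_size=4, strides=(2, 2, 2),
                 activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=1, activation='sigmoid')(x)   # probability of class "1"
    model = Model(inputs=ipt, outputs=out)
    model.compile(optimizer=Adam(lr=1e-3),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model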
As a last note: you can toss all of the above out except for the dimensionality shuffling advice, and still get perfect train set performance; it was the root of the problem.
def create_model(batch_size, input_shape):
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                 activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8, kernel_size=4, strides=(2, 2, 2),
                 activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=2, activation='softmax')(x)
    return Model(inputs=ipt, outputs=out)
BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
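For completeness, a compile-and-fit sketch matching the question's training loop (reusing its imports; with BATCH_SIZE = 6 this gives the 40 steps per epoch seen in the results below):
train_ds = make_tfdataset(for_training=True)
model = create_model(BATCH_SIZE, INPUT_SHAPE)
model.compile(optimizer=Adam(lr=1e-3),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_ds,
                    epochs=500,
                    steps_per_epoch=ceil(240 / BATCH_SIZE),   # 240 / 6 = 40 steps
                    verbose=1)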
RESULTS:
Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000
Answer 2:
Since your labels can be either 0 or 1, I would recommend changing the activation function to softmax and the number of output neurons to 2. Now, the last layer (output) will look like this:
out = Dense(units=2, activation='softmax')(reshaped_conv_features)
I have faced the same problem before and figured out that since the probabilities of being a 1 or a 0 are related, in the sense that it is not a multilabel classification problem, Softmax is a better option. Sigmoid assigns probabilities irrespective of other possible output labels.
Source: https://stackoverflow.com/questions/58237726/keras-model-fails-to-decrease-loss