Question
I want to write a neural network that learns the x^2 function without a predefined model. Specifically, it is trained on some points in [-1,1] together with their squares, and it should then reproduce them and predict similar values on, e.g., [-10,10]. I had more or less done this without datasets. Then I tried to modify the program to use the Dataset class, in order to learn how to use it. I succeeded in making the program run, but the output is worse than before: it is mostly a constant 0.
The previous version produced something like x^2 in [-1,1] with a linear continuation outside that interval, which was better. Compared with that previous output, the blue line in the plot is now flat, and the goal is for it to coincide with the red one.
Here, comments are in Polish, sorry for that.
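# square2.py - drugie podejscie do trenowania sieci za pomocą Tensorflow
# (square2.py - a second approach to training a network with TensorFlow)
# cel: nauczyć sieć rozpoznawać rozkład x**2
# (goal: teach the network to recognize the x**2 distribution)
# analiza skryptu z: (analysis of the script from:)
# https://stackoverflow.com/questions/43140591/neural-network-to-predict-nth-square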
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.python.framework.ops import reset_default_graph
# def. danych do trenowania sieci (definition of the training data)
# x_train = (np.random.rand(10**3)*4-2).reshape(-1,1)
# y_train = x_train**2
square2_dane = np.load("square2_dane.npz")
x_train = square2_dane['x_tren'].reshape(-1,1)
y_train = square2_dane['y_tren'].reshape(-1,1)
# zoptymalizować dzielenie danych (to do: optimize how the data is split)
# x_train = square2_dane['x_tren'].reshape(-1,1)
# ds_x = tf.data.Dataset.from_tensor_slices(x_train)
# batch_x = ds_x.batch(rozm_paczki)
# iterator = ds_x.make_one_shot_iterator()
# określenie parametrów sieci (network parameters)
wymiary = [50,50,50,1]  # layer sizes
epoki = 500             # number of epochs
rozm_paczki = 200       # batch size
reset_default_graph()
X = tf.placeholder(tf.float32, shape=[None,1])
Y = tf.placeholder(tf.float32, shape=[None,1])
weights = []
biases = []
n_inputs = 1
# inicjalizacja zmiennych (variable initialization)
for i,n_outputs in enumerate(wymiary):
    with tf.variable_scope("layer_{}".format(i)):
        w = tf.get_variable(name="W", shape=[n_inputs,n_outputs],initializer = tf.random_normal_initializer(mean=0.0,stddev=0.02,seed=42))
        b=tf.get_variable(name="b",shape=[n_outputs],initializer=tf.zeros_initializer)
        weights.append(w)
        biases.append(b)
        n_inputs=n_outputs
def forward_pass(X,weights,biases):
    h=X
    for i in range(len(weights)):
        h=tf.add(tf.matmul(h,weights[i]),biases[i])
        h=tf.nn.relu(h)
    return h
output_layer = forward_pass(X,weights,biases)
f_strat = tf.reduce_mean(tf.squared_difference(output_layer,Y),1)
f_strat = tf.reduce_sum(f_strat)
# alternatywna funkcja straty (alternative loss function)
#f_strat2 = tf.reduce_sum(tf.abs(Y-y_train)/y_train)
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(f_strat)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # trenowanie (training)
    dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
    dataset = dataset.batch(rozm_paczki)
    dataset = dataset.repeat(epoki)
    iterator = dataset.make_one_shot_iterator()
    ds_x, ds_y = iterator.get_next()
    sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model zapisano jako: %s" % save)  # "Model saved as: ..."
    # puszczenie sieci na danych (running the network on the data)
    x_test = np.linspace(-1,1,600)
    network_outputs = sess.run(output_layer,feed_dict = {X :x_test.reshape(-1,1)})
    plt.plot(x_test,x_test**2,color='r',label='y=x^2')
    plt.plot(x_test,network_outputs,color='b',label='sieć NN')  # "NN network"
    plt.legend(loc='right')
    plt.show()
I think the problem is with how the training data is fed in:
sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})
or with the definition of ds_x, ds_y. This is my first program of this kind.
The previous output was produced by the following lines (instead of the 'sess' block above):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # trenowanie (training)
    for i in range(epoki):
        idx = np.arange(len(x_train))
        np.random.shuffle(idx)
        for j in range(len(x_train)//rozm_paczki):
            cur_idx = idx[rozm_paczki*j:(rozm_paczki+1)*j]
            sess.run(optimizer,feed_dict = {X:x_train[cur_idx],Y:y_train[cur_idx]})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model zapisano jako: %s" % save)  # "Model saved as: ..."
Thanks!
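P.S.: I was heavily inspired by the question "Neural Network to predict nth square" (https://stackoverflow.com/questions/43140591/neural-network-to-predict-nth-square).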
Answer 1:
There are two problems that conspire to give your model poor accuracy, and both involve this line:
sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})
1. Only one training step will execute, because this code is not in a loop. Your original code ran len(x_train)//rozm_paczki steps in each epoch, which ought to make more progress.
2. The two calls to sess.run(ds_x) and sess.run(ds_y) run in separate steps, which means they will contain values from different, unrelated batches. Each call to sess.run(ds_x) or sess.run(ds_y) moves the Iterator on to the next batch, and discards any parts of the input element that you did not explicitly request in the sess.run() call. Essentially, you will get X from batch i and Y from batch i+1 (or vice versa), and the model will train on invalid data. If you want to get values from the same batch, you need to do it in a single sess.run([ds_x, ds_y]) call.
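To see why two separate fetches misalign inputs and labels, here is a minimal sketch in plain Python, with a generator standing in for the Iterator. The batches() helper and the toy data are hypothetical and not part of TensorFlow; they only mimic the behaviour described above, where every fetch advances to the next batch and the unrequested half is thrown away.

```python
def batches(xs, batch_size):
    """Yield (x_batch, y_batch) pairs, where y = x**2 for every element."""
    for start in range(0, len(xs), batch_size):
        x_batch = xs[start:start + batch_size]
        yield x_batch, [x * x for x in x_batch]

xs = [0, 1, 2, 3, 4, 5]

# Mismatched: two separate fetches, keeping one half of each, which is
# what sess.run(ds_x) followed by sess.run(ds_y) amounts to.
it = batches(xs, 2)
x_bad, _ = next(it)   # batch 0: labels discarded
_, y_bad = next(it)   # batch 1: inputs discarded

# Matched: a single fetch returns both halves of the same batch, which
# is what sess.run([ds_x, ds_y]) amounts to.
it = batches(xs, 2)
x_ok, y_ok = next(it)

print("mismatched:", x_bad, y_bad)
print("matched:   ", x_ok, y_ok)
```

In the mismatched case the labels are the squares of the next batch's inputs, so a model trained this way is fitted to pairs that do not belong together.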
There are two further concerns that might impact efficiency:
3. The Dataset is not shuffled. Your original code calls np.random.shuffle() at the beginning of each epoch. You should include a dataset = dataset.shuffle(len(x_train)) before dataset = dataset.repeat().
4. It is inefficient to fetch the values from the Iterator back to Python (e.g. when you do sess.run(ds_x)) and feed them back into the training step. It is more efficient to pass the output of the Iterator.get_next() operation directly into the feed-forward step as inputs.
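As a rough illustration of reshuffling every epoch and running one step per batch, the sketch below does the same bookkeeping in plain NumPy instead of tf.data. The shuffled_batches() helper, the fixed seed and the small NUM_EPOCHS are assumptions made for this example only. The slice end is written as batch_size * (j + 1); the loop in the question uses (rozm_paczki+1)*j there, which selects only j elements at step j.

```python
import numpy as np

NUM_EXAMPLES = 1000   # 10**3 training points, as in the question
BATCH_SIZE = 200      # rozm_paczki in the question
NUM_EPOCHS = 3        # kept small for the sketch; the question uses 500

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
x_train = rng.uniform(-2.0, 2.0, size=(NUM_EXAMPLES, 1)).astype(np.float32)
y_train = x_train ** 2

def shuffled_batches(x, y, batch_size, num_epochs, rng):
    """Yield aligned (x, y) mini-batches, reshuffling once per epoch."""
    for _ in range(num_epochs):
        idx = rng.permutation(len(x))
        for j in range(len(x) // batch_size):
            cur = idx[batch_size * j: batch_size * (j + 1)]
            # Index x and y with the same indices so that the pairs stay aligned.
            yield x[cur], y[cur]

steps = 0
for xb, yb in shuffled_batches(x_train, y_train, BATCH_SIZE, NUM_EPOCHS, rng):
    steps += 1  # one optimizer step would run here

print(steps)  # (1000 // 200) * 3 = 15 steps
```

With the question's settings (500 epochs) the same loop would run 2500 steps, compared with the single step executed by the one-off sess.run(optimizer, ...) call.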
Putting this all together, here's a rewritten version of your program that addresses these four points, and achieves the correct results. (Unfortunately my Polish isn't good enough to preserve the comments, so I've translated to English.)
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
# Generate training data.
x_train = np.random.rand(10**3, 1).astype(np.float32) * 4 - 2
y_train = x_train ** 2
# Define hyperparameters.
DIMENSIONS = [50,50,50,1]
NUM_EPOCHS = 500
BATCH_SIZE = 200
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
dataset = dataset.shuffle(len(x_train)) # (Point 3.) Shuffle each epoch.
dataset = dataset.repeat(NUM_EPOCHS)
dataset = dataset.batch(BATCH_SIZE)
iterator = dataset.make_one_shot_iterator()
# (Point 2.) Ensure that `X` and `Y` correspond to the same batch of data.
# (Point 4.) Pass the tensors returned from `iterator.get_next()`
# directly as the input of the network.
X, Y = iterator.get_next()
# Initialize variables.
weights = []
biases = []
n_inputs = 1
for i, n_outputs in enumerate(DIMENSIONS):
    with tf.variable_scope("layer_{}".format(i)):
        w = tf.get_variable(name="W", shape=[n_inputs, n_outputs],
                            initializer=tf.random_normal_initializer(
                                mean=0.0, stddev=0.02, seed=42))
        b = tf.get_variable(name="b", shape=[n_outputs],
                            initializer=tf.zeros_initializer)
        weights.append(w)
        biases.append(b)
        n_inputs = n_outputs
def forward_pass(X, weights, biases):
    h = X
    for i in range(len(weights)):
        h = tf.add(tf.matmul(h, weights[i]), biases[i])
        h = tf.nn.relu(h)
    return h
output_layer = forward_pass(X, weights, biases)
loss = tf.reduce_sum(tf.reduce_mean(
    tf.squared_difference(output_layer, Y), 1))
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # (Point 1.) Run the `optimizer` in a loop. Use try-while-except to iterate
    # until all elements in `dataset` have been consumed.
    try:
        while True:
            sess.run(optimizer)
    except tf.errors.OutOfRangeError:
        pass
    save = saver.save(sess, "./model.ckpt")
    print("Model saved to path: %s" % save)
    # Evaluate network.
    x_test = np.linspace(-1, 1, 600)
    network_outputs = sess.run(output_layer, feed_dict={X: x_test.reshape(-1, 1)})
    plt.plot(x_test, x_test**2, color='r', label='y=x^2')
    plt.plot(x_test, network_outputs, color='b', label='NN prediction')
    plt.legend(loc='right')
    plt.show()
Source: https://stackoverflow.com/questions/47638104/raising-to-a-square-with-tensorflow-with-a-dataset-class