Why is this TensorFlow implementation vastly less successful than Matlab's NN?

后端未结

关注

 2  950

As a toy example I\'m trying to fit a function f(x) = 1/x from 100 no-noise data points. The matlab default implementation is phenomenally successful with mean squa

相关标签:

2条回答

猫巷女王i

2021-01-30 11:25

I tried training for 50000 iterations it got to 0.00012 error. It takes about 180 seconds on Tesla K40.

It seems that for this kind of problem, first order gradient descent is not a good fit (pun intended), and you need Levenberg–Marquardt or l-BFGS. I don't think anyone implemented them in TensorFlow yet.

Edit Use tf.train.AdamOptimizer(0.1) for this problem. It gets to 3.13729e-05 after 4000 iterations. Also, GPU with default strategy also seems like a bad idea for this problem. There are many small operations and the overhead causes GPU version to run 3x slower than CPU on my machine.

0 讨论(0)
发布评论:

提交评论
- 加载中...

囚心锁ツ

2021-01-30 11:40

btw, here's a slightly cleaned up version of the above that cleans up some of the shape issues and unnecessary bouncing between tf and np. It achieves 3e-08 after 40k steps, or about 1.5e-5 after 4000:

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

xTrain = np.linspace(0.2, 0.8, 101).reshape([1, -1])
yTrain = (1/xTrain)

x = tf.placeholder(tf.float32, [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.                                                                
loss = tf.reduce_mean(tf.square(y - yTrain))
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.15, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)
init = tf.initialize_all_variables()

# Launch the graph                                                                            
sess = tf.Session()
sess.run(init)

for step in xrange(0, 40001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)

All that said, it's probably not too surprising that LMA is doing better than a more general DNN-style optimizer for fitting a 2D curve. Adam and the rest are targeting very high dimensionality problems, and LMA starts to get glacially slow for very large networks (see 12-15).

0 讨论(0)