Question
I realized that my models end up being different every time I train them, even though I keep the TensorFlow random seed the same.
I verified that:
- Initialization is deterministic; the weights are identical before the first update.
- Inputs are deterministic. In fact, various forward computations, including the loss, are identical for the very first batch.
- The gradients for the first batch are different. Concretely, I'm comparing the outputs of `tf.gradients(loss, train_variables)`. While `loss` and `train_variables` have identical values, the gradients are sometimes different for some of the Variables. The differences are quite significant (sometimes the sum-of-absolute-differences for a single variable's gradient is greater than 1).
I conclude that it's the gradient computation that causes the non-determinism.
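For reference, a minimal, self-contained sketch of that per-variable comparison (the tiny stand-in model, placeholder shapes, and file names below are made up for illustration; in the real setup `loss` and the feed dict come from the actual graph):

```python
import numpy as np
import tensorflow as tf

tf.set_random_seed(42)

# Tiny stand-in model; in the real setup `loss` comes from the actual graph.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))

train_variables = tf.trainable_variables()
grads = tf.gradients(loss, train_variables)

feed = {x: np.ones((8, 4), np.float32), y: np.zeros((8, 1), np.float32)}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grad_vals = sess.run(grads, feed_dict=feed)

# Dump the first-batch gradients so two independent runs can be diffed offline.
np.savez("grads_run1.npz", *grad_vals)  # use a different file name for the second run

# After two runs:
#   a, b = np.load("grads_run1.npz"), np.load("grads_run2.npz")
#   for k in a.files:
#       print(k, np.abs(a[k] - b[k]).sum())  # sum-of-absolute-differences per gradient
```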
I had a look at this question and the problem persists when running on a CPU with `intra_op_parallelism_threads=1` and `inter_op_parallelism_threads=1`.
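For reference, this is the kind of session configuration being described (TF1-style graph API; the seed value is arbitrary):

```python
import tensorflow as tf

# Fix the graph-level seed before building the graph.
tf.set_random_seed(42)

# Single-threaded CPU execution, to rule out non-determinism
# from parallel op scheduling and GPU kernels.
config = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
    device_count={"GPU": 0},
)
sess = tf.Session(config=config)
```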
How can the backward pass be non-deterministic when the forward pass isn't? How could I debug this further?
Answer 1:
This answer might seem a little obvious, but do you use some kind of non-deterministic regularization such as dropout? Given that dropout "drops" some connections randomly during training, it may be causing that difference in the gradients.
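If dropout is present, a quick check is to pin its op-level seed, or disable it while debugging, and see whether the gradients become reproducible. A minimal sketch with the TF1 API (the `keep_prob` and seed values are arbitrary):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])

# Give the dropout op its own seed so the dropped connections are the same
# on every run (the graph-level seed alone may not pin the mask if op
# creation order differs between runs).
h = tf.nn.dropout(x, keep_prob=0.5, seed=1234)

# Or make dropout a no-op while debugging by defaulting keep_prob to 1.0.
keep_prob = tf.placeholder_with_default(1.0, shape=[])
h_debug = tf.nn.dropout(x, keep_prob=keep_prob)
```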
Edit: Similar questions:
- How to get stable results with TensorFlow, setting random seed
- Tensorflow not being deterministic, where it should
Edit 2: This seems to be an issue with TensorFlow's implementation. See the following open issues in GitHub:
- Problems Getting TensorFlow to behave Deterministically
- Non-deterministic behaviour when ran on GPU
Source: https://stackoverflow.com/questions/42412660/non-deterministic-gradient-computation