Getting reproducible results using tensorflow-gpu

我寻月下人不归 2021-02-14 06:32

I am working on a project using TensorFlow. However, I can't seem to reproduce my results.

I have tried setting the graph level seed, numpy random seed and even operation le

1 answer
  •  鱼传尺愫
    2021-02-14 07:13

    Great that you want to make your results reproducible! There are several things to note here:

    I call a paper reproducible if one can obtain exactly the same numbers as found in the paper by executing exactly the same steps. This means that anyone with access to the same environment (the same software, hardware and data) would get the same results. In contrast, a paper is called replicable if one can achieve the same results by following only the textual description in the paper. Hence replicability is harder to achieve, but it is also a more powerful indicator of the quality of the paper.

    What you want is for training to produce a bit-wise identical model. The holy grail would be to write your paper in such a way that people who have ONLY the paper can still confirm your results.

    Please also note that in many important papers results are practically impossible to reproduce:

    • Datasets are often not available: JFT-300M
    • Massive usage of computational power: For one of the AutoML/Architecture Search papers by Google I asked the author how many GPU-hours they spent on one of the experiments. At the time, buying that many GPU-hours would have cost me around 250,000 USD.

    Whether that is a problem depends very much on the context. As a comparison, think of CERN / LHC: it is impossible to run completely identical experiments, and only very few institutions on earth have the instruments to check the results. Still, it is not a problem there. So ask your advisor or people who have already published in that journal / conference.

    Achieving Replicability

    This is super hard. I think the following is helpful:

    • Make sure the quality metrics you report don't have too many digits
    • As the training likely depends on random initialization, consider reporting an interval over several runs rather than a single number
    • Try minor variations of your setup and check that the results still hold
    • Re-implement things from scratch (maybe with another library?)
    • Ask colleagues to read your paper and then explain back to you what they think you did.
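
    As a rough sketch of the "interval rather than a single number" point above: run the same training several times with different seeds and report the spread. The accuracy values below are made-up placeholders, and `summarize_runs` is a hypothetical helper name, not part of any library.

```python
import statistics


def summarize_runs(scores):
    """Return mean, sample standard deviation, min and max of per-run metrics."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation, needs >= 2 runs
    return mean, std, min(scores), max(scores)


# Hypothetical test accuracies from five runs with different seeds.
# These numbers are invented for illustration only.
scores = [0.912, 0.907, 0.915, 0.909, 0.911]

mean, std, lowest, highest = summarize_runs(scores)

# Report few digits and the number of runs, not a single over-precise value.
print(f"accuracy: {mean:.3f} +/- {std:.3f} "
      f"(min {lowest:.3f}, max {highest:.3f}, n={len(scores)})")
```

    Stating the number of runs next to the interval lets readers judge how much weight the spread deserves.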

    Getting a Bit-Wise Identical Model

    It seems to me that you already do the important things:

    • Setting all seeds: numpy, tensorflow, random, ...
    • Making sure the Training-Test split is consistent
    • Making sure the training data is loaded in the same order
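
    The three points above could look roughly like the following sketch. It assumes TensorFlow 2.x, where `tf.random.set_seed` exists (in TensorFlow 1.x the graph-level call is `tf.set_random_seed`). The helper names `set_global_seeds` and `deterministic_split` are hypothetical, and NumPy and TensorFlow are only seeded if they are installed.

```python
import os
import random


def set_global_seeds(seed):
    """Seed every random number generator that is available in this process."""
    # PYTHONHASHSEED only takes effect for Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # assumption: TensorFlow 2.x
    except ImportError:
        pass  # TensorFlow not installed, nothing to seed


def deterministic_split(items, test_fraction, seed):
    """Split items into (train, test) independently of the order they arrived in."""
    ordered = sorted(items)           # fix the order before shuffling
    rng = random.Random(seed)         # private generator, unaffected by other code
    rng.shuffle(ordered)
    n_test = int(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


set_global_seeds(42)
# Hypothetical file names; the directory listing order does not matter.
train, test = deterministic_split(["b.png", "a.png", "d.png", "c.png"], 0.25, seed=42)
print(train, test)
```

    Sorting before shuffling matters because file listings are not guaranteed to come back in the same order on every machine.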

    Please note that there might be factors out of your control:

    • Bitflips: B. Schroeder, E. Pinheiro, and W.-D. Weber, “Dram errors in the wild: a large-scale field study”
    • Inherent hardware/software reproducibility problems: Floating-point addition and multiplication are not associative, and different cores on a GPU might finish computations at different times, so partial results can be combined in a different order on each run. Thus every single run could lead to slightly different results. (I'd be happy if somebody could give an authoritative reference here)
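
    The non-associativity can be seen on a CPU with plain Python floats. This is only a small illustration of the arithmetic effect, not a reproduction of what a GPU does internally; `accumulate` is a hypothetical helper that adds values strictly left to right.

```python
def accumulate(values):
    """Add values strictly from left to right using ordinary float addition."""
    total = 0.0
    for value in values:
        total += value
    return total


# Same three numbers, grouped differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same numbers, different order of accumulation.
# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost in the first ordering.
order_a = accumulate([1e16, 1.0, -1e16])
order_b = accumulate([1e16, -1e16, 1.0])
print(order_a, order_b)  # 0.0 1.0
```

    A parallel reduction that combines partial sums in whichever order they become available is exposed to exactly this kind of difference.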
