How to use multiple CPU cores to train NNs using Caffe and OpenBLAS

我在风中等你 2021-02-10 20:57

I recently started learning deep learning, and a friend recommended Caffe. After installing it with OpenBLAS, I followed the MNIST tutorial in the docs. But later I found it was

3 Answers
  • 2021-02-10 21:18

    When building OpenBLAS, set the flag USE_OPENMP=1 to enable OpenMP support. Next, configure Caffe to use OpenBLAS in Makefile.config (BLAS := open). Then, at runtime, export the number of threads you want by setting OMP_NUM_THREADS=n, where n is the thread count.
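
The runtime step can be sketched in Python as follows. This is a minimal sketch, not part of Caffe or OpenBLAS: the helper names `thread_env` and `run_with_threads` are hypothetical. It sets both OMP_NUM_THREADS and OPENBLAS_NUM_THREADS on the child process, on the assumption that an OpenMP build of OpenBLAS reads the former and a default (pthreads) build reads the latter, so setting both should cover either build.

```python
import os
import subprocess


def thread_env(num_threads, base_env=None):
    """Return a copy of the environment with BLAS thread limits set.

    Hypothetical helper: the variables must be in place before the BLAS
    library is loaded, so they are set on the environment of the process
    that will be launched.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: which variable is honored depends on how OpenBLAS was built,
    # so both are set to the same value.
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def run_with_threads(command, num_threads):
    """Run `command` (a list of arguments) with the thread limits applied."""
    return subprocess.run(command, env=thread_env(num_threads), check=True)
```

For example, `run_with_threads(["caffe", "train", "--solver=path/to/solver.prototxt"], 4)` would start training with 4 BLAS threads; the solver path here is a placeholder, not a real file.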

  • 2021-02-10 21:32

    @Karthik, that also works for me. One interesting discovery: using 4 threads cuts the forward/backward (f/b) pass time in the caffe timing test roughly in half. However, increasing the thread count to 8 or even 24 makes the f/b pass slower than with OPENBLAS_NUM_THREADS=4. Here are the times for a few thread counts (tested on the NetworkInNetwork model).

    Threads   f/b time (ms)
    1         223
    2         150
    4         113
    8         125
    12        144

    For comparison, on a Titan X GPU the f/b pass took 1.87 ms.
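
To make the scaling in the table above easier to read, here is a short sketch that turns the measured times into speedup and parallel efficiency relative to the single-thread run. The function name `scaling_report` is hypothetical; the numbers are copied from the table.

```python
# f/b times in ms, copied from the table above (NetworkInNetwork model).
timings_ms = {1: 223, 2: 150, 4: 113, 8: 125, 12: 144}


def scaling_report(timings):
    """Map thread count -> (speedup, efficiency) relative to 1 thread."""
    baseline = timings[1]
    report = {}
    for threads, ms in sorted(timings.items()):
        speedup = baseline / ms          # how many times faster than 1 thread
        efficiency = speedup / threads   # fraction of ideal linear scaling
        report[threads] = (speedup, efficiency)
    return report


report = scaling_report(timings_ms)
best_threads = min(timings_ms, key=timings_ms.get)  # thread count with lowest time

for threads, (speedup, efficiency) in report.items():
    print(f"{threads:>2} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
print("fastest:", best_threads, "threads")
```

At 4 threads the speedup is 223/113, or about 1.97x, which is just under 50% of ideal linear scaling; beyond that point the extra threads cost more than they gain on this machine.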

  • 2021-02-10 21:36

    I found that this method works:

    When you build Caffe on an 8-core machine, pass -j8 to make: make all -j8 and make pycaffe -j8. (Note that -j only parallelizes the build itself.)

    Also, make sure OPENBLAS_NUM_THREADS=8 is set at runtime.

    This question has a full script for the same.
