How to improve performance without going parallel for my backprop ANN

陌清茗 2021-02-19 03:49

After profiling my backpropagation algorithm, I have learnt that it accounts for 60% of my computation time. Before I start looking at parallel alternatives

4 answers
  •  [愿得一人]
    2021-02-19 04:19

    You can't avoid an O(n^2) algorithm if you want to train or use a NN, but it is perfectly suited to vector arithmetic. For example, with clever use of SSE or AVX you could process the neurons in chunks of 4 or 8 and, on CPUs with FMA support, use a fused multiply-add instead of two separate instructions.

    If you use a modern compiler, carefully reformulate the algorithm, and use the right switches, you might even get the compiler to autovectorize the loops for you, but your mileage may vary.

    For gcc, autovectorization is activated using -O3 or -ftree-vectorize. You need a vector-capable CPU of course, and have to tell the compiler about it with something like -march=core2 -msse4.1 or similar, depending on the target CPU. If you use -ftree-vectorizer-verbose=2 you get detailed explanations of why and where loops were not vectorized (newer gcc releases use -fopt-info-vec-missed for this instead). Have a look at http://gcc.gnu.org/projects/tree-ssa/vectorization.html .
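
    As a rough sketch of what a "reformulated" loop might look like, here is a weighted-sum pass for one layer written so that gcc has a fair chance of vectorizing it. The function name layer_forward, the input-major weight layout, and the file name in the comment are assumptions made for illustration, not something taken from the question. The restrict qualifiers promise the compiler that the arrays do not overlap, and the inner loop has unit stride and no dependency between iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Possible build line (layer.c is a hypothetical file name):
 *   gcc -O3 -march=native -fopt-info-vec-missed -c layer.c
 */

/* Weighted input sums for one layer:
 *   out[j] = bias[j] + sum over i of w[i * n_out + j] * x[i]
 * Assumed layout: weights are stored input-major, so the inner loop
 * walks contiguous memory. */
void layer_forward(size_t n_in, size_t n_out,
                   const float *restrict w,
                   const float *restrict x,
                   const float *restrict bias,
                   float *restrict out)
{
    assert(w && x && bias && out);

    /* Start every output at its bias. */
    for (size_t j = 0; j < n_out; ++j)
        out[j] = bias[j];

    for (size_t i = 0; i < n_in; ++i) {
        const float xi = x[i];
        const float *row = w + i * n_out;

        /* Independent iterations over j with unit stride:
         * a candidate for autovectorization. */
        for (size_t j = 0; j < n_out; ++j)
            out[j] += row[j] * xi;
    }
}
```

    Whether the loop actually gets vectorized depends on the compiler version and flags, so it is worth checking the vectorizer report rather than assuming it.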

    Using the compiler intrinsics directly is, of course, better still.
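
    For comparison, a minimal sketch of the intrinsics route: a dot product of one neuron's weights with its inputs, processed four floats at a time with SSE. The name dot_product is made up for this example, and the code falls back to a plain scalar loop when the compiler does not define __SSE__. It uses a separate multiply and add; on a CPU with FMA, the pair could be replaced by _mm_fmadd_ps from <immintrin.h> (compiled with -mfma).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_mul_ps, ... */
#endif

/* Dot product of one neuron's weights with its inputs. */
float dot_product(const float *w, const float *x, size_t n)
{
    assert(w && x);

    size_t i = 0;
    float sum = 0.0f;

#if defined(__SSE__)
    /* Four partial sums, one per SIMD lane. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 vw = _mm_loadu_ps(w + i);  /* unaligned loads, no alignment assumed */
        __m128 vx = _mm_loadu_ps(x + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(vw, vx));
    }

    /* Horizontal reduction of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif

    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        sum += w[i] * x[i];

    return sum;
}
```

    Note that the lane-wise accumulation changes the order of the floating-point additions, so results can differ slightly from a strictly sequential loop.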
