I am training a neural network in C++ and have written a parallel version of the code. When I compare the run times of the serial and parallel versions, everything seems fine on