Maybe this will be useful for you:
There is the Simd Library, which has an implementation of HAAR and LBP cascade classifiers. It can use the standard HAAR and LBP cascades from OpenCV. The implementation has SIMD optimizations using SSE4.1, AVX2, AVX-512 and NEON (ARM), so it works 2-3 times faster than the original OpenCV implementation.