One option is to use Aparapi with the Intel OpenCL driver installed. (Aparapi translates your kernel code to OpenCL at run time, so it can run vectorized on both CPUs and GPUs, as long as an appropriate OpenCL driver is installed.)
Another option is to use JNI to call a C++ function that either uses AVX intrinsics or has been auto-vectorized by the compiler.
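
For the JNI route, the Java side could look roughly like the sketch below. The library name `vecadd`, the class `VecAdd`, and the method `addNative` are all made up for illustration; the C++ implementation (with AVX intrinsics or compiler auto-vectorization) is assumed to exist separately and is not shown. The sketch falls back to a plain Java loop when the native library cannot be loaded, so it still runs without it.

```java
final class VecAdd {
    // "vecadd" is a hypothetical library name (libvecadd.so / vecadd.dll),
    // assumed to be built separately from the C++ source.
    private static final boolean NATIVE_AVAILABLE = tryLoad("vecadd");

    private static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library not found or not permitted: use the pure Java path.
            return false;
        }
    }

    // Hypothetical native method; the C++ side would implement it using
    // AVX intrinsics or rely on the compiler's auto-vectorizer.
    private static native void addNative(float[] a, float[] b, float[] out);

    static boolean isNativeAvailable() {
        return NATIVE_AVAILABLE;
    }

    // Element-wise out[i] = a[i] + b[i].
    static void add(float[] a, float[] b, float[] out) {
        if (a.length != b.length || a.length != out.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        if (NATIVE_AVAILABLE) {
            addNative(a, b, out);
        } else {
            // Plain Java fallback; the JIT may or may not vectorize this loop.
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i] + b[i];
            }
        }
    }
}
```

With this class in the default package, the matching C++ function would be named `Java_VecAdd_addNative`, and `javac -h` can generate the header for it. Keep in mind that each JNI call has some overhead, so this tends to pay off only when a single call processes a reasonably large array.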