How do I use the Intel AVX vector instruction set from Java? It's a simple question, but the answer seems to be hard to find.
One option is to use Aparapi and install the Intel OpenCL driver. (Your code will be vectorized to work on both CPUs and GPUs, as long as an appropriate OpenCL driver is installed.)
Another option is to use JNI and call a C++ function that uses AVX intrinsics or was auto-vectorized by a compiler.
Check out the Yeppp! library. It has Java bindings and is a very fast, cross-platform SIMD library.
http://www.yeppp.info
Depending on the work, you may not have to do much. The JVM automatically uses AVX2 for some Arrays and String operations on supporting platforms, such as Haswell and later or Xeon v3 and later.
https://software.intel.com/en-us/articles/java-application-performance-improvement-with-intel-xeon-processor-e7-v3
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2
Starting with Ryzen 3000 / Epyc Rome (Zen 2), AMD processors also execute 256-bit AVX2 operations at full width instead of splitting them in two: https://www.anandtech.com/print/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome
Direct use of the instructions and intrinsics is not easily available, though.
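As a rough illustration of "not having to do much", the sketch below sticks to plain JDK library calls on arrays. Whether a particular call is compiled to AVX2 depends on the JVM version, its flags, and the CPU, so the choice of methods here is an assumption, not a guarantee. The class name BuiltinOps is made up for the example.

```java
import java.util.Arrays;

public class BuiltinOps {
    // Compares two buffers with the JDK's own array routine.
    public static boolean sameContents(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Index of the first differing element, or -1 if there is none (Java 9+).
    public static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        byte[] a = new byte[1 << 20];
        Arrays.fill(a, (byte) 7);   // bulk fill
        byte[] b = a.clone();       // bulk copy
        b[123_456] = 8;
        System.out.println(sameContents(a, b));    // false
        System.out.println(firstDifference(a, b)); // 123456
    }
}
```

The point is that no AVX-specific code appears anywhere: any speedup comes from what the JVM does with these calls.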
As far as I know, most current JVM JIT compilers either don't support automatic vectorization or only do it for very simple loops, so you're out of luck.
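For reference, a "very simple loop" in this sense looks roughly like the sketch below: a counted loop with unit stride, no branches, and no dependency between iterations. Whether a given JIT actually emits AVX instructions for it is an assumption that depends on the JVM and CPU; the class name SimpleLoop is made up for the example.

```java
public class SimpleLoop {
    // Element-wise add: each iteration touches only index i,
    // so iterations are independent of each other.
    public static void add(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] out = new float[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Call the method many times so the JIT has a chance to compile it.
        for (int rep = 0; rep < 20_000; rep++) {
            add(a, b, out);
        }
        System.out.println(out[10]); // 30.0
    }
}
```

Loops with early exits, indirect indexing, or values carried from one iteration to the next are much less likely to be vectorized.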
In Mono's .NET implementation there's Mono.Simd for manual vector code emission, and Microsoft later introduced System.Numerics.Vectors. Unfortunately there's nothing similar in Java. Java's Vector class (java.util.Vector) is just a synchronized growable array and has nothing to do with SIMD.
If you want to use CPU-specific features like AVX, then your only choice is JNI: write the bottleneck part in C or C++ and call it from Java.
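The Java side of such a JNI setup could look roughly like this. It is only a sketch: the library name avxdot, the class AvxDot, and the method dotNative are all hypothetical, and the C or C++ implementation (where the AVX intrinsics would live) is not shown. The scalar fallback is there so the class still works when the native library is missing.

```java
public class AvxDot {
    // True only if the (hypothetical) native library could be loaded.
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("avxdot"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Implemented in C or C++; for this class JNI looks for the
    // exported symbol Java_AvxDot_dotNative.
    private static native float dotNative(float[] a, float[] b, int n);

    public static float dot(float[] a, float[] b) {
        int n = Math.min(a.length, b.length);
        if (NATIVE_AVAILABLE) {
            return dotNative(a, b, n);
        }
        // Plain Java fallback when the native library is not present.
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 6f, 7f, 8f};
        System.out.println(dot(a, b)); // 70.0
    }
}
```

Each JNI call has some overhead, so this tends to pay off only when a single call processes a large array rather than a handful of elements.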
Scala offers another way to use vectorized code without modifying the JVM, which you can read about in "How we made the JVM 40x faster".
There is now a new Vector API being developed for writing vector code manually. JEP 338 summarizes its goal as follows:
"Provide an initial iteration of an incubator module, jdk.incubator.vector, to express vector computations that reliably compile at runtime to optimal vector hardware instructions on supported CPU architectures and thus achieve superior performance to equivalent scalar computations."
https://openjdk.java.net/jeps/338
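A minimal sketch of what code written against jdk.incubator.vector looks like is shown below. It assumes JDK 16 or later and that the incubator module is enabled at both compile time and run time with --add-modules jdk.incubator.vector. The class name VectorAdd is made up for the example, and the API may still change while it is incubating.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorAdd {
    // Widest vector shape the current CPU supports (e.g. 256 bits with AVX2).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    public static void add(float[] a, float[] b, float[] out) {
        int i = 0;
        // Largest multiple of the vector length that fits in the array.
        int upper = SPECIES.loopBound(out.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(out, i);
        }
        // Scalar tail for the remaining elements.
        for (; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[19];
        float[] b = new float[19];
        float[] out = new float[19];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        add(a, b, out);
        System.out.println(out[18]); // 54.0
    }
}
```

Using SPECIES_PREFERRED keeps the same source portable: the vector width is picked at run time, so the code does not hard-code AVX or any other instruction set.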
To use these operations from Java, you can use a library like JavaCV. It enables vector operations on both Intel processors and GPUs such as NVIDIA's.