If cache coherency is implemented at the hardware level, why do we need volatile? Shouldn't any core/processor get the latest value anyway?
Or is it dealing with a different issue?
Cache coherence may be implemented at the processor level but, unless the processor's memory model guarantees sequential consistency (which is not the case on most modern architectures), you will only get the visibility and ordering guarantees you expect if you ask for them.
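That is what volatile is for: it asks the JVM to emit the relevant machine instruction(s), typically memory barriers, so that a write to the variable becomes visible to other processors and a read observes the latest written value. It also stops the JIT compiler from keeping the value in a register or reordering accesses around it.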
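
As a rough illustration of the point above, here is a minimal sketch of the usual "publish through a volatile flag" pattern. The class and field names (VolatileFlagDemo, payload, ready) are made up for this example and are not from the question. It assumes Java 9 or later because of Thread.onSpinWait().

```java
public class VolatileFlagDemo {
    // Hypothetical fields for illustration only.
    private static int payload = 0;                 // plain, non-volatile field
    private static volatile boolean ready = false;  // volatile flag used to publish payload

    // Starts a reader thread, publishes a value, and returns what the reader saw.
    static int runOnce() throws InterruptedException {
        payload = 0;
        ready = false;
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            // Volatile read: the JVM must re-read the flag on every iteration.
            while (!ready) {
                Thread.onSpinWait();
            }
            // The volatile read of 'ready' happens-after the volatile write,
            // so the earlier plain write to 'payload' is visible here.
            seen[0] = payload;
        });
        reader.start();

        payload = 42;   // plain write, ordered before the volatile write below
        ready = true;   // volatile write: publishes payload to the reader

        reader.join();  // join() makes the reader's write to seen[0] visible
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce()); // prints 42
    }
}
```

If ready were not volatile, the Java memory model would allow the reader to spin forever (the JIT may hoist the read out of the loop) or to see a stale payload, even on hardware with coherent caches.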