I've found one case where using the FENCE instruction is simply necessary.
Example:
Some module in the SoC generates an interrupt by writing a value into CSR 0x783 (MIPI) via the HostIO bus.
Firmware jumps to the interrupt handler.
The handler tries to clear the 'pending' bit in a user-implemented device by writing 1 to its register.
This operation was compiled to a 'store' instruction with an immediate value of 1.
As a result, if I don't execute FENCE at the beginning of the handler, I get some garbage value instead of the instruction's proper immediate argument.
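
A minimal C sketch of the handler described above, with the fence placed first. The names full_fence, irq_handler and IRQ_PENDING_CLEAR are hypothetical, and the device register is passed in as a pointer because its real address is not given here. On non-RISC-V targets the fence falls back to a generic compiler barrier so that the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical value written to clear the 'pending' bit (assumption:
   the device clears it when 1 is written to the register). */
#define IRQ_PENDING_CLEAR 1u

/* Full fence. On RISC-V this emits the FENCE instruction; on other
   targets it uses the GCC/Clang builtin barrier as a stand-in. */
static inline void full_fence(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Interrupt handler sketch: execute FENCE before touching the device,
   then clear the pending bit with a store of the constant 1. */
void irq_handler(volatile uint32_t *pending_reg)
{
    full_fence();
    *pending_reg = IRQ_PENDING_CLEAR;
}
```

In real firmware the argument would be the memory-mapped address of the device's register, which is an assumption about how the device is exposed.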