I have two files which are assembled/compiled/linked into minimalistic kernel.
start.s:
.set CPACR_EL1_FPEN, 0b11 << 20
.set BOOT_STAC
I've succeeded to run the non-optimised code without abnormalities. Thanks to Notlikethat for the idea. My stack was just mapped into readonly memory.
So I've just added the offset statement into my linker script (". = 1024M;") in order to make all the symbols to start from 1GiB (where RAM begins). After this modification the code started to work properly.