Question
Like this, but without the CALL instruction. I suppose that I should use JMP and probably other instructions.
PUSH 5
PUSH 4
CALL Function
Answer 1:
This is fairly easy to do. Push the return address onto the stack and then jump to the subroutine. The final code looks like this:
PUSH 5
PUSH 4
PUSH offset label1
jmp Function
label1: ; returns here
lea esp, [esp+8] ; discard the two pushed arguments
Function:
...
ret
While this works, you really don't want to do this. Most modern processors keep an on-chip cache of return addresses: each CALL pushes its return address onto it, and each RET pops one. Because it lives on the processor, this cache has extremely short update and access times, so RET can use the popped value to predict where the PC should go next instead of waiting for the memory read from the location that ESP points to. If you do the "PUSH offset label1" trick, this cache does not get updated, so the RET is mispredicted and the processor pipeline gets flushed, with a severe negative impact on performance. (I think IBM has a patent on special instructions, essentially "PUSHRETURNADDRESS k" and "POPRETURNADDRESS", that allow this trick to be used on some of their CPUs. Alas, not on the x86.)
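To make the mismatch concrete, here is a toy Python model of such a return-address cache. It is only a sketch under simplifying assumptions: the function name `count_ret_mispredicts` and the event names are invented for illustration, the predictor has unlimited depth, and real hardware differs in many details.

```python
def count_ret_mispredicts(trace):
    """Replay a trace of events and count mispredicted RETs.

    Events (hypothetical names, for illustration only):
      ("call", ret_addr)      a real CALL: updates memory stack and predictor
      ("push_jmp", ret_addr)  PUSH ret_addr + JMP: updates memory stack only
      ("ret",)                a RET: pops both and compares them
    """
    memory_stack = []  # return addresses as stored on the real stack
    predictor = []     # on-chip return-address cache, fed only by CALL/RET
    mispredicts = 0
    for event in trace:
        kind = event[0]
        if kind == "call":
            memory_stack.append(event[1])
            predictor.append(event[1])
        elif kind == "push_jmp":
            memory_stack.append(event[1])  # the predictor never sees this
        elif kind == "ret":
            actual = memory_stack.pop()
            predicted = predictor.pop() if predictor else None
            if predicted != actual:
                mispredicts += 1
        else:
            raise ValueError(f"unknown event: {kind!r}")
    return mispredicts


balanced = [("call", 100), ("call", 200), ("ret",), ("ret",)]
faked = [("call", 100), ("push_jmp", 200), ("ret",), ("ret",)]
print(count_ret_mispredicts(balanced), count_ret_mispredicts(faked))
```

In this model the faked call costs more than one misprediction: the first RET pops a stale prediction, which leaves the outer RET with nothing correct to pop either.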
Answer 2:
It depends on the situation. If the last thing your function does before returning is call another function, you can simply jump to that function. This is called tail call elimination, and is an optimization performed by many compilers. Example:
foo:
call B
call A
ret
Tail call elimination replaces the last two lines with a single jump instruction:
foo:
call B
jmp A
This works because the stack contains the return address of foo's caller. So when function A returns, it returns to the function that called foo.
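The same reasoning can be checked with a tiny stack-machine interpreter in Python. This is a hedged sketch, not a model of any real CPU: the `run` function, the tuple-based program format, and the `log` pseudo-instruction are all assumptions made up for this illustration.

```python
HALT = -1  # fake return address standing in for whoever started the program


def run(program, entry):
    """Execute a list of (label, op, arg) tuples and return the logged strings."""
    labels = {label: i for i, (label, _op, _arg) in enumerate(program)
              if label is not None}
    stack = [HALT]
    trace = []
    pc = labels[entry]
    while pc != HALT:
        _label, op, arg = program[pc]
        if op == "call":
            stack.append(pc + 1)  # return address = next instruction
            pc = labels[arg]
        elif op == "jmp":
            pc = labels[arg]      # no return address is pushed
        elif op == "ret":
            pc = stack.pop()
        elif op == "log":
            trace.append(arg)
            pc += 1
        else:
            raise ValueError(f"unknown op: {op!r}")
    return trace


with_call = [
    ("main", "call", "foo"), (None, "log", "back in main"), (None, "ret", None),
    ("foo", "call", "B"), (None, "call", "A"), (None, "ret", None),
    ("A", "log", "A"), (None, "ret", None),
    ("B", "log", "B"), (None, "ret", None),
]
with_jmp = [
    ("main", "call", "foo"), (None, "log", "back in main"), (None, "ret", None),
    ("foo", "call", "B"), (None, "jmp", "A"),  # tail call: no trailing ret
    ("A", "log", "A"), (None, "ret", None),
    ("B", "log", "B"), (None, "ret", None),
]
print(run(with_call, "main"))
print(run(with_jmp, "main"))
```

Both programs log the same sequence, because in the `jmp` version A's `ret` pops the return address that main pushed when it called foo.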
If you want execution to resume after the jump to A, push that address onto the stack before jumping:
foo:
call B
push offset bar
jmp A
bar:
However, I can think of no reason why anybody would want to do this.
Answer 3:
Before x86-64, call was the only instruction that could read EIP. (I guess int as well, but it doesn't put the result anywhere you can read from user-space.) So it's impossible to simulate call in position-independent code. In fact, 32-bit PIC code uses call to find out its own address.
But in x86-64, we have RIP-relative lea:
; ... put function args in registers
lea rax, [rel ret_addr] ; AT&T lea ret_addr(%rip), %rax
push rax
jmp call_target
ret_addr:
call itself internally decodes as push RIP / jmp target, where RIP during execution of an instruction = address of the end of that instruction = start of the next.
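As a rough illustration of that equivalence, the following Python sketch models only the program counter and the stack. The function names, the addresses, and the instruction lengths (5 bytes for a rel32 call or jmp, 7 for a RIP-relative lea into rax, 1 for push rax) are assumptions picked for the example, not something an assembler is guaranteed to emit.

```python
def do_call(pc, stack, target, length):
    """Model `call target` located at address `pc` and `length` bytes long."""
    return_address = pc + length  # RIP during execution = end of this instruction
    return target, stack + [return_address]


def do_push_jmp(stack, target, ret_addr):
    """Model `lea rax, [rel ret_addr]` / `push rax` / `jmp target`."""
    return target, stack + [ret_addr]


CALL_LEN = 5                           # assumed: call rel32
LEA_LEN, PUSH_LEN, JMP_LEN = 7, 1, 5   # assumed encodings of the emulation
start, target = 0x1000, 0x2000         # made-up addresses

pc1, stack1 = do_call(start, [], target, CALL_LEN)

# The ret_addr label sits right after the jmp that ends the emulated sequence.
ret_addr = start + LEA_LEN + PUSH_LEN + JMP_LEN
pc2, stack2 = do_push_jmp([], target, ret_addr)

print(hex(pc1), [hex(a) for a in stack1])
print(hex(pc2), [hex(a) for a in stack2])
```

In both cases control ends up at the target with a single return address on the stack that points just past the calling sequence. The addresses differ only because the emulated sequence is longer than a single call.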
Of course this is normally terrible for performance, unbalancing the return-address predictor stack: http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/. Use a normal call unless you want a ret to mispredict, e.g. for a retpoline or specpoline.
(A tailcall with just jmp is fine, collapsing a call/ret pair into a jmp, but pushing a new return address manually is always a problem.)
Source: https://stackoverflow.com/questions/21248227/how-can-i-simulate-a-call-instruction-by-using-jmp