I have done some reading about Spectre v2 and obviously you get the non technical explanations. Peter Cordes has a more in-depth explanation but it doesn\'t fully address a
Thanks Brendan and Hadi Brais, after reading your answers and finally reading the spectre paper it is clear now where I was going wrong in my thinking and I confused the two a little.
I was partially describing Spectre v1 which causes a bounds check bypass by mistraining the branch history of a jump i.e. if (x < array1_size)
to a spectre gadget. This is obviously not an indirect branch. The hacker does this by invoking a function containing the spectre gadget with legal parameters to prime the branch predictor (PHT+BHT) and then invoke with illegal parameters to bring array1[x]
into cache. They then reprime the branch history by supplying legal parameters and then flush array1_size
from cache (which I'm not sure how they do because even if the attacker process knows the VA of array1_size
, the line cannot be flushed because the TLB contains a different PCID for the process, so it must be caused to be evicted in some way i.e. filling the set at that virtual address). They then invoke with the same illegal parameters as before and as array1[x]
is in cache but array1_size
is not, array[x]
will resolve quickly and begin the load of array2[array1[x]]
while still waiting on array1_size
, which loads a position in array2
based on the secret at any x that transcends the bounds of array1
. The attacker then recalls the function with a valid value of x and times the function call (I assume the attacker must know the contents of array1
because if array2[array1[8]]
results in a faster access they need to know what is at array1[8]
as that is the secret, but surely that array would have to contain every 2^8 bit combination right).
Spectre v2 on the other hand requires a second attack process that knows the virtual address of an indirect branch in the victim process so that it can poison the target and replace it with another address. If the attack process contains a jump instruction that would reside in the same set, way and tag in the IBTB as the victim indirect branch would then it just trains that branch instruction to predict taken and jump to a virtual address which happens to be that of the gadget in the victim process. When the victim process encounters the indirect branch the wrong target address from the attack program is in the IBTB. It is crucial that it is an indirect branch because falsities as a result of a process switch are usually checked at decode i.e. if the branch target differs from the target in the BTB for that RIP then it flushes the instructions fetched before it. This cannot be done with indirect branches because it does not know the target until the execution stage and hence the idea is that the indirect branch selected depends on a value that needs to be fetched from cache. It then jumps to this target address which is that of the gadget and so on and so forth.
The attacker needs to know the source code of the victim process to identify a gadget and they need to know the VA at which it will reside. I assume this could be done by knowing predictably where code will be loaded. I believe .exes are typically loaded at x00400000 for instance and then there is a BaseOfCode in the PE header.
Edit: I just read Appendix B of the spectre paper and it makes for a nice Windows implementation of Spectre v2.
As a proof-of-concept, we constructed a simple target application which provides the service of computing a SHA1 hash of a key and an input message. This implementation consisted of a program which continuously runs a loop which calls Sleep(0), loads the input from a file, invokes the Windows cryptography functions to compute the hash, and prints the hash whenever the input changes. We found that the
Sleep()
call is done with data from the input file in registers ebx, edi, and an attacker-known value for edx, i.e., the content of two registers is controlled by the attacker. This is the input criteria for the type of Spectre gadget described in the beginning of this section.
It uses ntdll.dll
(.dll full of native API system call stubs) and kernel32.dll
(Windows API) which are always mapped in the user virtual address space on the direction of ASLR (specified in the .dll images), except the physical address is likely to be the same due to copy-on-write view mapping into the page cache. The indirect branch to poison will be in the Windows API Sleep()
function in kernel32.dll
which appears to indirectly call NtDelayExecution()
in ntdll.dll
. The attacker then ascertains the address of the indirect branch instruction and maps a page encompassing the victim address that contains the target address into its own address space and changes the target address stored at that address to that of the gadget that they identified to be residing somewhere in the same or another function in ntdll.dll
(I'm not entirely sure (due to ASLR) how the attacker knows for certain where the victim process maps kernel32.dll
and ntdll.dll
in its address space in order to locate the address of the indirect branch in Sleep()
for the victim. Appendix B claims they used 'Simple pointer operations' to locate the indirect branch and address that contains the target -- how that works I'm not sure). Threads are then launched with the same affinity of the victim (so that the victim and mistraining threads hyperthread on the same physical core) which call Sleep()
themselves to train it indirectly which in the address space context of the hack process will now jump to the address of the gadget. The gadget is temporarily replaced with a ret
so that it returns from Sleep()
smoothly. These threads will also execute a sequence before the indirect jump to mimic what the global branch history of the victim would be before encountering the indirect jump to fully ensure that the branch is taken in an alloyed history. A separate thread is then launched with the complement of the thread affinity of the victim that repeatedly evicts the victim's memory address containing the jump destination to ensure that when the victim does encounter the indirect branch it will take a long RAM access to resolve which allows the gadget to speculate ahead before the branch destination can be checked against the BTB entry and the pipeline flushed. In JavaScript, eviction is done by loading to the same cache set i.e. in multiples of 4096. The mistraining threads, eviction threads and victim threads are all running and looping at this stage. When the victim process loop calls Sleep()
, the indirect branch speculates to the gadget due to the IBTB entry that the hacker poisoned previously. A probing thread is launched with the complement of the victim process thread affinity (so as not to interfere with the mistraining and victim branch history). The probing thread will modify the header of the file that the victim process uses which results in those values residing in ebx
and edi
when Sleep()
is called meaning the probing thread can directly influence the values stored in ebx
and edi
. The spectre gadget branched to in the example adds the value stored at [ebx+edx+13BE13BDh]
to edi
and then loads a value at the address stored in edi
and adds it with a carry to dl
. This allows the probing thread to learn the value stored at [ebx+edx+13BE13BDh]
as if it selects an original edi
of 0 then the value accessed in the second operation will be loaded from the virtual address range 0x0 – 0x255 by which time the indirect branch will resolve but the side effects are already present. The attack process needs to ensure that it has mapped the same physical address into the same location in its virtual address space in order to probe the probing array with a timing attack. Not sure how it does this but in windows it would, AFAIK, need to map a view of a page-file–backed section object that has been opened by the victim at that location. Either that or it would manipulate the victim to call the spectre gadget with a negative TC ebx
value such that ebx+edx+13BE13BDh
= 0
, =1
,..., =255
and somehow time that call. This could potentially also be achieved by using APC injection.