Using the extra 16 bits in 64-bit pointers

随声附和 提交于 2019-11-26 18:00:58

The high order bits are reserved in case the address bus would be increased in the future, so you can't use it simply like that

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (264 bytes). This is compared to just 4 GB (232 bytes) for the x86.

http://en.wikipedia.org/wiki/X86-64#Architectural_features

More importantly, according to the same article [Emphasis mine]:

... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."

As the CPU will check the high bits even if they're unused, they're not really "irrelevant". You need to make sure that the address is canonical before using the pointer. Some other 64-bit architectures like ARM64 have the option to ignore the high bits, therefore you can store data in pointers much more easily.


That said, in x86_64 you're still free to use the high 16 bits if needed, but you have to check and fix the pointer value by sign-extending before dereferencing it.

Note that casting the pointer value to long is not the correct way to do because long is not guaranteed to be wide enough to store pointers. You need to use uintptr_t or intptr_t.

int *p1 = &val; // original pointer
uint8_t data = ...;
const uintptr_t MASK = ~(1ULL << 48);

// store data into the pointer
//     note: to be on the safe side and future-proof (because future implementations could
//     increase the number of significant bits in the pointer), we should store values
//     from the most significant bits down to the lower ones
int *p2 = (int *)(((uintptr_t)p1 & MASK) | (data << 56));

// get the data stored in the pointer
data = (uintptr_t)p2 >> 56;

// deference the pointer
//     technically implementation defined. You may want a more
//     standard-compliant way to sign-extend the value
intptr_t p3 = ((intptr_t)p2 << 16) >> 16; // sign extend the pointer to make it canonical
val = *(int*)p3;

WebKit's JavaScriptCore and Mozilla's SpiderMonkey engine use this in the nan-boxing technique. If the value is NaN, the low 48-bits will store the pointer to the object with the high 16 bits serve as tag bits, otherwise it's a double value.


You can also use the lower bits to store data. It's called a tagged pointer. If int is 4-byte aligned then the 2 low bits are always 0 and you can use them like in 32-bit architectures. For 64-bit values you can use the 3 low bits because they're already 8-byte aligned. Again you also need to clear those bits before dereferencing.

int *p1 = &val; // the pointer we want to store the value into
int tag = 1;
const uintptr_t MASK = ~0x03ULL;

// store the tag
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);

// get the tag
tag = (uintptr_t)p2 & 0x03;

// get the referenced data
intptr_t p3 = (uintptr_t)p2 & MASK; // clear the 2 tag bits before using the pointer
val = *(int*)p3;

One famous user of this is the 32-bit version of V8 with SMI (small integer) optimization (I'm not sure about 64-bit V8 though). The lowest bits will serve as a tag for type: if it's 0, it's a small 31-bit integer, do a signed right shift by 1 to restore the value; if it's 1, the value is a pointer to the real data (objects, floats or bigger integers), just clear the tag and dereference it

Side note: Using linked list for cases with tiny key values compared to the pointers is a huge memory waste, and it's also slower due to bad cache locality. In fact you shouldn't use linked list in most real life problems

Paweł Dziepak

According to the Intel Manuals (volume 1, section 3.3.7.1) linear addresses has to be in the canonical form. This means that indeed only 48 bits are used and the extra 16 bits are sign extended. Moreover, the implementation is required to check whether an address is in that form and if it is not generate an exception. That's why there is no way to use those additional 16 bits.

The reason why it is done in such way is quite simple. Currently 48-bit virtual address space is more than enough (and because of the CPU production cost there is no point in making it larger) but undoubtedly in the future the additional bits will be needed. If applications/kernels were to use them for their own purposes compatibility problems will arise and that's what CPU vendors want to avoid.

Physical memory is 48 bit addressed. That's enough to address a lot of RAM. However between your program running on the CPU core and the RAM is the memory management unit, part of the CPU. Your program is addressing virtual memory, and the MMU is responsible for translating between virtual addresses and physical addresses. The virtual addresses are 64 bit.

The value of a virtual address tells you nothing about the corresponding physical address. Indeed, because of how virtual memory systems work there's no guarantee that the corresponding physical address will be the same moment to moment. And if you get creative with mmap() you can make two or more virtual addresses point at the same physical address (wherever that happens to be). If you then write to any of those virtual addresses you're actually writing to just one physical address (wherever that happens to be). This sort of trick is quite useful in signal processing.

Thus when you tamper with the 48th bit of your pointer (which is pointing at a virtual address) the MMU can't find that new address in the table of memory allocated to your program by the OS (or by yourself using malloc()). It raises an interrupt in protest, the OS catches that and terminates your program with the signal you mention.

If you want to know more I suggest you Google "modern computer architecture" and do some reading about the hardware that underpins your program.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!