I am confused about the difference between a "far" pointer and a "huge" pointer. I searched all over Google for an explanation but could not find one.
As I recall, it's something like this:
If you're a beginner, it's probably best to forget that you heard about Near/Far/Huge. They only have meaning in the old 16-bit segmented memory model commonly seen on early Intel 80x86's. In 32- and 64-bit land (i.e., roughly everything mainstream since the mid-1990s), memory looks like one big flat address space, so a pointer is just a pointer (as far as a single application is concerned).
First thing to understand is how a segmented pointer is converted into a linear address. For the example you have, the conversion is:
linear = segment * 16 + offset;
Because of that, it turns out that the same linear address can be expressed using different segment/offset combinations. For example, the following segment/offset combinations all refer to the same linear address:
0004:0000
0003:0010
0002:0020
0001:0030
0000:0040
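To make the formula concrete, here is a minimal sketch in portable C that models a segment:offset pair with two 16-bit integers (real `far` pointers will not compile on a modern compiler). The names `seg_to_linear`, `same_linear` and `check_aliases` are made up for illustration, and the sketch ignores the 20-bit wraparound of a real 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: shift the 16-bit segment left by 4 bits
   (multiply by 16) and add the 16-bit offset.
   Assumption: no 20-bit wraparound is modelled here. */
uint32_t seg_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Returns 1 if two segment:offset pairs alias the same linear address. */
int same_linear(uint16_t seg_a, uint16_t off_a,
                uint16_t seg_b, uint16_t off_b)
{
    return seg_to_linear(seg_a, off_a) == seg_to_linear(seg_b, off_b);
}

/* Usage example: the five pairs listed above all map to linear 0x00040. */
void check_aliases(void)
{
    assert(seg_to_linear(0x0004, 0x0000) == 0x40);
    assert(seg_to_linear(0x0003, 0x0010) == 0x40);
    assert(seg_to_linear(0x0002, 0x0020) == 0x40);
    assert(seg_to_linear(0x0001, 0x0030) == 0x40);
    assert(seg_to_linear(0x0000, 0x0040) == 0x40);
}
```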
The problem with this is that if you have ptr1 with a segmented address of 0100:0000 and ptr2 with a segmented address of 0010:0F00, a simple comparison will determine that ptr1 != ptr2 even though they actually point to the same address (both translate to linear 0x01000).
Normalization is the process by which you convert an address to a form such that if two non-normalized pointers refer to the same linear address, they will both be converted to the same normalized form.
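As a rough illustration of normalization, the sketch below uses one common convention, assumed here: keep the offset in the range 0..15 and push everything else into the segment. `SegPtr`, `normalize`, `raw_equal` and `check_normalize` are hypothetical names for this example, not something a 16-bit compiler provided.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit segment:offset pointer. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} SegPtr;

/* Assumed convention: the normalized form has an offset of 0..15. */
SegPtr normalize(SegPtr p)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset;
    SegPtr n;
    n.segment = (uint16_t)(linear >> 4); /* truncation mimics the 20-bit wrap */
    n.offset  = (uint16_t)(linear & 0xF);
    return n;
}

/* The "simple comparison": compares segment and offset field by field. */
int raw_equal(SegPtr a, SegPtr b)
{
    return a.segment == b.segment && a.offset == b.offset;
}

/* Usage example: two aliases of linear 0x00040 compare unequal as-is,
   but equal once both have been normalized. */
void check_normalize(void)
{
    SegPtr a = {0x0004, 0x0000};
    SegPtr b = {0x0000, 0x0040};
    assert(!raw_equal(a, b));
    assert(raw_equal(normalize(a), normalize(b)));
}
```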
In the beginning, the 8086 was an extension of the 8-bit 8085 processor. The 8085 could only address 65536 bytes with its 16-bit address bus. When Intel developed the 8086 they wanted the software to be as compatible as possible with the old 8-bit processors, so they introduced the concept of segmented memory addressing. This allowed 8-bit software to live in the bigger address range without noticing. The 8086 had a 20-bit address bus and could thus handle up to 1 MB of memory (2^20). Unfortunately it could not address this memory directly; it had to use the segment registers to do that. The real address was calculated by shifting the 16-bit segment value left by 4 bits and adding the 16-bit offset.
Example:
Segment 0x1234, offset 0x5678 will give the real address
 0x 1234
+0x  5678
---------
=0x 179B8
As you will have noticed, this operation is not bijective, meaning you can generate the real address with other combinations of segment and offset.
 0x 1264       0x 1111
+0x  5378     +0x  68A8
---------     ---------   etc.
=0x 179B8     =0x 179B8
There are in fact up to 4096 different combinations possible, because of the 3 overlapping nibbles (3*4 = 12 bits, 2^12 = 4096).
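The count can be checked by brute force. The following is only a sketch: `count_aliases` and `check_counts` are hypothetical helpers that try every 16-bit segment and count those that can reach a given linear address with some 16-bit offset, again ignoring the 20-bit wraparound. Very low addresses have fewer aliases because the segment cannot go below zero.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the segment:offset pairs that translate to `linear`.
   For each segment there is at most one matching offset, namely
   linear - segment*16, and it must fit in 16 bits. */
unsigned count_aliases(uint32_t linear)
{
    unsigned count = 0;
    for (uint32_t seg = 0; seg <= 0xFFFF; seg++) {
        uint32_t base = seg << 4;
        if (base <= linear && linear - base <= 0xFFFF)
            count++;
    }
    return count;
}

/* Usage example with the address from the text and a very low address. */
void check_counts(void)
{
    assert(count_aliases(0x179B8) == 4096); /* 3 overlapping nibbles */
    assert(count_aliases(0x00040) == 5);    /* only segments 0..4 fit */
}
```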
The normalized combination is the only one of the 4096 possible combinations in which the 3 high nibbles of the offset are zero. In our example it will be:
 0x 179B
+0x  0008
---------
=0x 179B8
The difference between a far and a huge pointer is not in the normalisation; you can have a non-normalised huge pointer, it's absolutely allowed. The difference is in the code generated when performing pointer arithmetic. With far pointers, incrementing or adding values to the pointer changes only the 16-bit offset: there is no overflow handling, so the offset wraps around and you can only address 64K of memory.
char far *p = (char far *)0x1000FFFF;
p++;
printf("p=%p\n", p);
will print 1000:0000
For huge pointers the compiler will generate the code necessary to handle the carry over.
char huge *p = (char huge *)0x1000FFFF;
p++;
printf("p=%p\n", p);
will print 2000:0000
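Since `far` and `huge` do not compile on modern compilers, here is a small simulation of the two behaviours in portable C. It is only a sketch: `SegPtr`, `far_add`, `huge_add` and `check_increment` are made-up names, and `huge_add` renormalizes after every step, which is just one possible way to handle the carry. Real 16-bit compilers differed in the code they emitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit segment:offset pointer. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} SegPtr;

/* far-style arithmetic: only the offset changes and it wraps
   modulo 65536; the segment is never touched. */
SegPtr far_add(SegPtr p, uint16_t n)
{
    p.offset = (uint16_t)(p.offset + n);
    return p;
}

/* huge-style arithmetic (assumed strategy): go through the linear
   address so the carry out of the offset reaches the segment. */
SegPtr huge_add(SegPtr p, uint32_t n)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset + n;
    p.segment = (uint16_t)(linear >> 4);
    p.offset  = (uint16_t)(linear & 0xF);
    return p;
}

/* Usage example: incrementing 1000:FFFF both ways. */
void check_increment(void)
{
    SegPtr p = {0x1000, 0xFFFF};
    SegPtr f = far_add(p, 1);   /* wraps inside the same 64K segment */
    SegPtr h = huge_add(p, 1);  /* moves on to the next byte in memory */
    assert(f.segment == 0x1000 && f.offset == 0x0000);
    assert(h.segment == 0x2000 && h.offset == 0x0000);
}
```

The far version costs a single 16-bit add, while the huge version needs a wider add plus shifts or a carry check on every operation, which is where the difference in arithmetic cost comes from.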
This means you have to be careful when using far or huge pointers as the cost of the arithmetic with them is different.
One should also not forget that most 16-bit compilers had libraries that didn't handle these cases correctly, which sometimes led to buggy software.
Microsoft's real mode compiler didn't handle huge pointers in all its string functions. Borland was even worse, as even the mem functions (memcpy, memset, etc.) didn't handle offset overflows. That was the reason why it was a good idea to use normalised pointers with these library functions: the likelihood of offset overflows was lower with them.