Question
After all this time, I've never thought to ask this question; I understand this came from C++, but what was the reasoning behind it:
- Specify decimal numbers as you normally would
- Specify octal numbers by a leading 0
- Specify hexadecimal numbers by a leading 0x
Why 0? Why 0x? Is there a natural progression for base-32?
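For concreteness, here is a minimal C sketch of the three literal forms the question lists (the values are chosen purely for illustration):

```c
#include <stdio.h>

int main(void) {
    int dec = 255;    /* decimal: written as you normally would */
    int oct = 0377;   /* octal: leading 0                       */
    int hex = 0xFF;   /* hexadecimal: leading 0x                */

    /* All three literals denote the same value. */
    printf("%d %d %d\n", dec, oct, hex);   /* prints: 255 255 255 */
    return 0;
}
```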
Answer 1:
C, the ancestor of C++ and Java, was originally developed by Dennis Ritchie on PDP-8s in the early 70s. Those machines had a 12-bit address space, so pointers (addresses) were 12 bits long and most conveniently represented in code by four 3-bit octal digits (the first addressable word would be 0000 octal, the last addressable word 7777 octal).
Octal does not map well to 8 bit bytes because each octal digit represents three bits, so there will always be excess bits representable in the octal notation. An all-TRUE-bits byte (1111 1111) is 377 in octal, but FF in hex.
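A quick C illustration of that mismatch, using printf's %o and %X conversions to print the same all-ones byte in both bases:

```c
#include <stdio.h>

int main(void) {
    unsigned char byte = 0xFF;       /* all eight bits set: 1111 1111        */
    printf("octal: %o\n", byte);     /* prints 377 -- three octal digits     */
    printf("hex:   %X\n", byte);     /* prints FF  -- exactly two hex digits */
    return 0;
}
```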
Hex is easier for most people to convert to and from binary in their heads, since binary numbers are usually expressed in blocks of eight bits (because that's the size of a byte) and eight bits are exactly two hex digits, but hex notation would have been clunky and misleading in Dennis' time (implying the ability to address 16 bits). Programmers need to think in binary when working with hardware (where each bit typically represents a physical wire) and when working with bit-wise logic (where each bit has a programmer-defined meaning).
I imagine Dennis added the 0 prefix as the simplest possible variation on everyday decimal numbers, and easiest for those early parsers to distinguish.
I believe hex notation 0x__ was added to C slightly later. The parsing logic needed to distinguish 1-9 (first digit of a decimal constant), 0 (first [insignificant] digit of an octal constant), and 0x (indicating that a hex constant follows in the subsequent digits) is considerably more complicated than just using a leading 0 as the indicator to switch from parsing subsequent digits as octal rather than decimal.
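As a rough sketch of my own (not code from any real compiler), the extra case the lexer has to handle once 0x exists looks something like this:

```c
#include <stdio.h>

/* Hypothetical helper showing the three-way dispatch described above. */
enum base { BASE_DEC = 10, BASE_OCT = 8, BASE_HEX = 16 };

static enum base classify_literal(const char *s)
{
    if (s[0] != '0')
        return BASE_DEC;              /* 1-9: decimal constant        */
    if (s[1] == 'x' || s[1] == 'X')
        return BASE_HEX;              /* 0x / 0X: hexadecimal follows */
    return BASE_OCT;                  /* bare leading 0: octal        */
}

int main(void) {
    printf("%d %d %d\n",
           classify_literal("123"),    /* 10 (decimal)     */
           classify_literal("0123"),   /* 8  (octal)       */
           classify_literal("0x123")); /* 16 (hexadecimal) */
    return 0;
}
```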
Why did Dennis design it this way? Contemporary programmers don't appreciate that those early computers were often controlled by toggling instructions into the CPU by physically flipping switches on the CPU's front panel, or by feeding in a punch card or paper tape; all environments where saving a few steps or instructions represented a saving of significant manual labor. Also, memory was limited and expensive, so saving even a few instructions had a high value.
In summary:
- 0 for octal because it was efficient to parse, and octal was user-friendly on PDP-8s (at least for address manipulation).
- 0x for hex probably because it was a natural, backward-compatible extension of the octal prefix standard and still relatively efficient to parse.
Answer 2:
The zero prefix for octal, and 0x for hex, are from the early days of Unix.
The reason for octal's existence dates to when there was hardware with 6-bit bytes, which made octal the natural choice. Each octal digit represents 3 bits, so a 6-bit byte is two octal digits. The same goes for hex, from 8-bit bytes, where a hex digit is 4 bits and thus a byte is two hex digits. Using octal for 8-bit bytes requires 3 octal digits, of which the first can only have the values 0, 1, 2 and 3 (the first digit is really 'tetral', not octal). There is no reason to go to base32 unless somebody develops a system in which bytes are ten bits long, so a ten-bit byte could be represented as two 5-bit "nybbles".
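C has no base-32 literal or printf conversion, so as a purely hypothetical sketch of that ten-bit scenario, here is the value split into two 5-bit groups by hand (the 0-9/A-V digit set is just one arbitrary choice):

```c
#include <stdio.h>

int main(void) {
    const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV"; /* 32 digit symbols      */
    unsigned value = 0x3FF;                    /* all ten bits of a hypothetical byte   */
    unsigned hi = (value >> 5) & 0x1F;         /* upper 5-bit "nybble"                  */
    unsigned lo = value & 0x1F;                /* lower 5-bit "nybble"                  */
    printf("%c%c\n", digits[hi], digits[lo]);  /* prints: VV                            */
    return 0;
}
```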
Answer 3:
“New” numerals had to start with a digit, to work with existing syntax.
Established practice had variable names and other identifiers starting with a letter (or a few other symbols, perhaps underscore or dollar sign). So “a”, “abc”, and “a04” are all names. Numbers started with a digit. So “3” and “3e5” are numbers.
When you add new things to a programming language, you seek to make them fit into the existing syntax, grammar, and semantics, and you try to make existing code continue working. So, you would not want to change the syntax to make “x34” a hexadecimal number or “o34” an octal number.
So, how do you fit octal numerals into this syntax? Somebody realized that, except for “0”, there is no need for numerals beginning with “0”. Nobody needs to write “0123” for 123. So we use a leading zero to denote octal numerals.
What about hexadecimal numerals? You could use a suffix, so that “34x” means 34 in base 16. However, then the parser has to read all the way to the end of the numeral before it knows how to interpret the digits (unless it encounters one of the “a” to “f” digits, which would of course indicate hexadecimal). It is “easier” on the parser to know that the numeral is hexadecimal early. But you still have to start with a digit, and the zero trick has already been used, so we need something else. “x” was picked, and now we have “0x” for hexadecimal.
(The above is based on my understanding of parsing and some general history about language development, not on knowledge of specific decisions made by compiler developers or language committees.)
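A hedged sketch of the "know the base early" point: with a prefix, a parser can accumulate the value in one left-to-right pass, because the base is fixed before the first digit is consumed (a hypothetical helper of my own, not from any real compiler):

```c
#include <stdio.h>

unsigned long parse_prefixed(const char *s)
{
    unsigned base = 10;
    if (s[0] == '0') {                       /* leading 0: octal...         */
        base = 8;
        s++;
        if (*s == 'x' || *s == 'X') {        /* ...unless an x follows: hex */
            base = 16;
            s++;
        }
    }

    unsigned long value = 0;
    for (; *s; s++) {
        unsigned digit;
        if (*s >= '0' && *s <= '9')      digit = *s - '0';
        else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
        else break;
        value = value * base + digit;        /* base is already known here  */
    }
    return value;
}

int main(void) {
    printf("%lu %lu %lu\n",
           parse_prefixed("123"),     /* 123 */
           parse_prefixed("0123"),    /* 83  */
           parse_prefixed("0x123"));  /* 291 */
    return 0;
}
```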
Answer 4:
I dunno ...
0 is for 0ctal
0x is for, well, we've already used 0 to mean octal and there's an x in hexadecimal so bung that in there too
as for natural progression, best look to the latest programming languages which can affix subscripts such as
123_27 (interpret _ to mean subscript)
and so on
?
Mark
Answer 5:
Is there a natural progression for base-32?
This is part of why Ada uses the form 16# to introduce hex constants, 8# for octal, 2# for binary, etc.
I wouldn't concern myself too much over needing space for "future growth" in basing though. This isn't like RAM or addressing space where you need an order of magnitude more every generation.
In fact, studies have shown that octal and hex are pretty much the sweet spot for human-readable representations that are binary-compatible. If you go any lower than octal, it starts to require a ridiculous number of digits to represent larger numbers. If you go any higher than hex, the math tables get ridiculously large. Hex is actually a bit too much already, but octal has the problem that it doesn't evenly fit in a byte.
Answer 6:
There is a standard encoding for Base32. It is very similar to Base64. But it isn't very convenient to read. Hex is used because 2 hex digits can be used to represent 1 8-bit byte. And octal was used primarily for older systems that used 12-bit bytes. It made for a more compact representation of data when compared to displaying raw registers as binary.
It should also be noted that some languages use o### for octal and x## or h## for hex, as well as many other variations.
Answer 7:
I think 0x actually came from the UNIX/Linux world and was picked up by C/C++ and other languages. But I don't know the exact reason or true origin.
Source: https://stackoverflow.com/questions/1835465/where-did-the-octal-hex-notations-come-from