What platforms have something other than 8-bit char?

前端 未结 12 1244
再見小時候
再見小時候 2020-11-22 00:31

Every now and then, someone on SO points out that char (aka \'byte\') isn\'t necessarily 8 bits.

It seems that 8-bit char is almost universal. I would h

相关标签:
12条回答
  • 2020-11-22 01:08

    char is also 16 bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16 and 32 bit char. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.

    Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX, that just so happens to have the functions you use but a different size char, that's their bad luck.

    In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8 bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.

    0 讨论(0)
  • 2020-11-22 01:08

    ints used to be 16 bits (pdp11, etc.). Going to 32 bit architectures was hard. People are getting better: Hardly anyone assumes a pointer will fit in a long any more (you don't right?). Or file offsets, or timestamps, or ...

    8 bit characters are already somewhat of an anachronism. We already need 32 bits to hold all the world's character sets.

    0 讨论(0)
  • 2020-11-22 01:12

    For one, Unicode characters are longer than 8-bit. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture.

    For this reason, I try to stick to data types like uint16_t when I need a data type of a particular bit length.

    Edit: Sorry, I initially misread your question.

    The C spec says that a char object is "large enough to store any member of the execution character set". limits.h lists a minimum size of 8 bits, but the definition leaves the max size of a char open.

    Thus, the a char is at least as long as the largest character from your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.

    Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but the char is kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are conditioned to.

    When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on whatever architecture, whether that 16-bit value corresponds to a char, short, int, or something else. Most of the hard work has already been done by the people who wrote your compiler/standard libraries.

    If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function and most of the time I can use a normal uint16_t.

    0 讨论(0)
  • 2020-11-22 01:13

    A few of which I'm aware:

    • DEC PDP-10: variable, but most often 7-bit chars packed 5 per 36-bit word, or else 9 bit chars, 4 per word
    • Control Data mainframes (CDC-6400, 6500, 6600, 7600, Cyber 170, Cyber 176 etc.) 6-bit chars, packed 10 per 60-bit word.
    • Unisys mainframes: 9 bits/byte
    • Windows CE: simply doesn't support the `char` type at all -- requires 16-bit wchar_t instead
    0 讨论(0)
  • 2020-11-22 01:15

    It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.

    0 讨论(0)
  • 2020-11-22 01:18

    The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits

    Quoted from http://en.wikipedia.org/wiki/Byte#History

    Not sure about other languages though.

    http://en.wikipedia.org/wiki/IBM_7030_Stretch#Data_Formats

    Defines a byte on that machine to be variable length

    0 讨论(0)
提交回复
热议问题