I am currently writing a C program that deals with bytes. When it comes to bytes, I'm really confused about the following questions.
Are characters stored in memory by their ASCII codes? Say 'A' has ASCII code 65; is it stored in memory the same way as the integer 65? If so, how does the machine distinguish a character from an integer? And since an ASCII code is an integer, and an integer should occupy at least 2 bytes, how come a character occupies only 1 byte? Finally, on a 16-bit machine, if 1 is stored as 000...0001, then on a 32-bit machine, is 1 still stored the same way, just with zeros added at the front?
Yes, ASCII characters are stored by their value. But storing 'A' (65 = 0x41) may be different from storing 65 itself, and how it is done depends on your machine architecture. A char can be stored in a single byte, while an int will be at least 2 bytes (more commonly 4 bytes on modern machines), so the two may be stored differently.
It doesn't. A byte in memory could hold 0x41, and the only thing that distinguishes 'A' from 65 is the type you declared to the compiler. In other words, if you declared the variable as an int, it will be treated as an int.
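As a minimal sketch of that point (standard C, nothing beyond stdio assumed):

```c
#include <stdio.h>

int main(void) {
    char c = 'A';  /* one byte holding the bit pattern 0x41 */
    int  i = 65;   /* a wider object holding the same value */

    /* The stored value is identical; only the declared type (and the
       printf conversion we pick) decides how it is presented. */
    printf("c as character: %c, as integer: %d\n", c, c); /* A, 65 */
    printf("i as character: %c, as integer: %d\n", i, i); /* A, 65 */
    return 0;
}
```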
There are so few ASCII values that you can represent all of them in fewer than 8 bits, so using 16 bits for this would waste memory. On today's systems that isn't as big an issue anymore, but on memory-limited systems you might want to use that extra byte for something else instead of wasted space.
More or less, yes. 1 will always be stored as 000...001, with enough leading zeros to fill the space of an int. So on an 8-bit system that is 00000000 00000001 across two words; on a 16-bit system it is 0000000000000001 in one word.
Are characters stored in memory by their ASCII codes? Say 'A' has ASCII code 65. So it's stored in memory the same way as the integer 65?
Yes, but a char in C is a single byte, while the size of an int depends on the machine architecture.
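You can check what your own machine uses; a short sketch (standard C, using CHAR_BIT from limits.h):

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* sizeof(char) is 1 by definition; CHAR_BIT is the number of
       bits per byte (8 on virtually every modern machine). */
    printf("bits in a char: %d\n", CHAR_BIT);

    /* sizeof(int) is implementation-defined: commonly 4 today,
       but 2 on many 16-bit systems. */
    printf("sizeof(int) = %zu bytes (%zu bits)\n",
           sizeof(int), sizeof(int) * (size_t)CHAR_BIT);
    return 0;
}
```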
If so, how does the machine distinguish a character from an integer?
Machine code doesn't care what the bytes in memory represent. It's the compiler's job to translate your code into machine instructions that do what your program describes, based on the types you declared.
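One way to see this is to read the same memory through two different types. Here is a sketch using a union (a form of type punning C permits); the commented character output assumes a little-endian machine:

```c
#include <stdio.h>

union bytes {
    int           i;
    unsigned char c[sizeof(int)];
};

int main(void) {
    union bytes b;
    b.i = 65;  /* the machine just stores a bit pattern */

    /* Reading the same memory through another member is how *we* pick
       an interpretation; the hardware never knew the difference. */
    printf("as int: %d\n", b.i);                 /* 65 */
    printf("first byte as char: %c\n", b.c[0]);  /* 'A' on little-endian */
    return 0;
}
```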
If characters are stored by ASCII codes, an ASCII code is an integer. An integer should occupy at least 2 bytes, so how come a character occupies only 1 byte?
ASCII can fit in a single byte (which is the size of a char). Dealing with non-ASCII text is more complicated in C. There's wchar_t, which is non-portable and many people consider it broken, and C11 introduces char16_t and char32_t, which can be used for UTF-16 and UTF-32 respectively.
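A minimal C11 sketch of those types (assumes a compiler and C library that provide <uchar.h>):

```c
#include <stdio.h>
#include <uchar.h>  /* C11: char16_t, char32_t */

int main(void) {
    char16_t s16[] = u"A";  /* elements are UTF-16 code units */
    char32_t s32[] = U"A";  /* elements are UTF-32 code units */

    /* The standard only guarantees "at least" 16 and 32 bits;
       these are the sizes you will typically see. */
    printf("sizeof(char16_t) = %zu\n", sizeof s16[0]); /* typically 2 */
    printf("sizeof(char32_t) = %zu\n", sizeof s32[0]); /* typically 4 */
    printf("code point of 'A': U+%04X\n", (unsigned)s32[0]); /* U+0041 */
    return 0;
}
```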
The last one is about integers on different architectures. On a 16-bit machine, if 1 is stored as 000...0001, then on a 32-bit machine, is 1 still stored the same way, just with zeros added at the front?
Mostly, yes: the value is the same bit pattern zero-extended to the width of the type. But how those bytes are laid out in memory depends on the endianness of the architecture: a little-endian machine stores the least significant byte first, while a big-endian machine stores the most significant byte first.
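You can see the layout by inspecting the raw bytes of an int; a short sketch (standard C, comments show output for a 4-byte int):

```c
#include <stdio.h>

int main(void) {
    unsigned int one = 1;
    unsigned char *p = (unsigned char *)&one;  /* view the raw bytes */

    /* Little-endian (e.g. x86): 01 00 00 00 -- low byte first.
       Big-endian:               00 00 00 01 -- high byte first. */
    for (size_t j = 0; j < sizeof one; j++)
        printf("%02X ", p[j]);
    printf("\nthis machine is %s-endian\n",
           p[0] == 1 ? "little" : "big");
    return 0;
}
```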