What is the use of unsigned char pointers?
I have seen in many places that a pointer is cast to a pointer to unsigned char. Why do we do so?
In C, unsigned char is the only type guaranteed to have no trap representations, and the only one for which copying is guaranteed to produce an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
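As a minimal sketch of the "raw memory" use, the helpers below copy the byte image of a value into a buffer of unsigned char and back again. The names object_bytes and from_bytes, and the choice of std::vector as the buffer, are assumptions made for this illustration only.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: returns a copy of the bytes that make up `value`.
template <typename T>
std::vector<unsigned char> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte by byte");
    std::vector<unsigned char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));  // exact bitwise image
    return bytes;
}

// Hypothetical helper: rebuilds a value from bytes produced by object_bytes.
template <typename T>
T from_bytes(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte by byte");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

A round trip such as from_bytes<double>(object_bytes(3.25)) should give back the original value, because the bytes are copied unchanged.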
In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.
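Both uses can be sketched briefly. The function names below (checksum, set_bit, test_bit) are hypothetical, and the bit helpers assume 8 bits per byte, which is typical but not guaranteed.

```cpp
#include <cstddef>

// Sum of all bytes, reduced modulo 2^CHAR_BIT (256 on typical platforms).
unsigned char checksum(const unsigned char* data, std::size_t size) {
    unsigned char sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // The operands are promoted to int; converting back wraps around.
        sum = static_cast<unsigned char>(sum + data[i]);
    }
    return sum;
}

// Sets bit `index` in a bit set packed into unsigned char elements
// (assumes 8 bits per element).
void set_bit(unsigned char* bits, std::size_t index) {
    bits[index / 8] |= static_cast<unsigned char>(1u << (index % 8));
}

// Tests bit `index` in the same packed representation.
bool test_bit(const unsigned char* bits, std::size_t index) {
    return ((bits[index / 8] >> (index % 8)) & 1u) != 0;
}
```

Packing flags this way uses one bit per flag instead of one byte or more, which is the memory saving mentioned above.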
The unsigned char type is usually used to represent a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte.
An unsigned char* is then a pointer to the binary data buffer (or to its first element).
I am not 100% sure what the C++ standard says precisely about the size of unsigned char, whether it is fixed at 8 bits or not. Usually it is. I will try to find and post it.
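A quick way to check this on a given platform is sketched below. To my knowledge, sizeof(unsigned char) is 1 by definition and the standard only requires CHAR_BIT, the number of bits in a byte, to be at least 8. The helper name bits_per_byte is hypothetical.

```cpp
#include <climits>

// sizeof(unsigned char) is 1 by definition; its width in bits is CHAR_BIT.
static_assert(sizeof(unsigned char) == 1, "holds on every implementation");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits in one unsigned char on this platform.
constexpr int bits_per_byte() { return CHAR_BIT; }
```

On mainstream desktop and server platforms bits_per_byte() is 8; some DSPs are known to use wider bytes.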
After seeing your code
When you use something like void* input as a parameter of a function, you deliberately strip away information about the input's original type. This is a very strong suggestion that the input will be treated in a very general manner, i.e. as an arbitrary string of bytes. int* input, on the other hand, would suggest that it will be treated as a "string" of signed integers.
void* is mostly used in cases where the input gets encoded, or is treated bitwise or bytewise for whatever reason, since you cannot draw conclusions about its contents.
Then, in your function, you seem to want to treat the input as a string of bytes. But to operate on objects, e.g. to perform operator= (assignment), the compiler needs to know what to do. Since you declare input as void*, an assignment such as *input = something would make no sense, because *input is of type void. To make the compiler treat the elements of input as the "smallest raw memory pieces", you cast it to the appropriate type, which is unsigned char*.
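A minimal sketch of that cast, using a hypothetical function fill_bytes (not taken from the code in the question):

```cpp
#include <cstddef>

// Hypothetical function: sets `size` bytes starting at `input` to `value`.
void fill_bytes(void* input, std::size_t size, unsigned char value) {
    // *input = value;  // would not compile: *input has type void
    unsigned char* bytes = static_cast<unsigned char*>(input);
    for (std::size_t i = 0; i < size; ++i) {
        bytes[i] = value;  // now the compiler knows each element is one byte
    }
}
```

For example, fill_bytes(&x, sizeof x, 0) on an int x leaves x equal to 0.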
The cout probably did not work because of a wrong or unintended type conversion. A char* is treated as a null-terminated string, and it is easy to confuse the signed and unsigned versions in code. If you pass an unsigned char* to ostream::operator<<, it is handled like a char*: the bytes are expected to be ordinary ASCII characters, and a byte equal to 0 means the end of the string, not the integer value 0. When you want to print the contents of memory, it is best to cast pointers explicitly.
Also note that to print the memory contents of a buffer you need to use a loop, since otherwise the printing function would not know when to stop.
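One way such a loop might look is sketched below. hex_dump is a hypothetical helper; it formats into a string so the result can be sent to cout or inspected elsewhere.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `size` bytes as two-digit hex values.
std::string hex_dump(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) out << ' ';
        // Cast to an integer type so the byte is printed as a number,
        // not as a character.
        out << std::setw(2) << static_cast<unsigned int>(bytes[i]);
    }
    return out.str();
}
```

Usage would be along the lines of std::cout << hex_dump(buffer, length) << '\n'; where the caller supplies the length, since a zero byte no longer marks the end.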
You are actually looking for pointer arithmetic:
unsigned char* bytes = (unsigned char*)ptr;
for (int i = 0; i < size; i++) {
    // work with bytes[i]
}
In this example, bytes[i] is equal to *(bytes + i), and it accesses the memory located i * sizeof(*bytes) bytes past the address stored in bytes. In other words: if you have int* intPtr and you access intPtr[1], you are actually accessing the integer stored at bytes 4 to 7 (assuming a 4-byte int):
0 1 2 3
4 5 6 7 <--
The size of the type your pointer points to determines where it points after it is incremented or decremented. So if you want to iterate over your data byte by byte, you need a pointer to a type whose size is 1 byte (that's why unsigned char*).
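The effect of the pointed-to type on the step size can be measured directly. The helper stride_in_bytes below is a hypothetical name introduced for this sketch.

```cpp
#include <cstddef>

// Hypothetical helper: distance in bytes between p and p + 1.
template <typename T>
std::size_t stride_in_bytes(const T* p) {
    const unsigned char* here = reinterpret_cast<const unsigned char*>(p);
    const unsigned char* next = reinterpret_cast<const unsigned char*>(p + 1);
    return static_cast<std::size_t>(next - here);
}
```

For an int* this returns sizeof(int), while for an unsigned char* it returns 1.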
unsigned char is usually used for holding binary data, where 0 is a valid value and still part of your data. When working with a "naked" unsigned char*, you will probably have to keep track of the length of your buffer yourself.
char is usually used for holding the characters of a string, where 0 is equal to '\0' (the terminating character). If your buffer of characters is always terminated with '\0', you don't need to know its length, because the terminating character marks exactly where your data ends.
Note that in both of these cases it's better to use an object that hides the internal representation of your data and takes care of memory management for you (see the RAII idiom). So it's a much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for strings).
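A short sketch of both containers; make_buffer and make_text are hypothetical helper names used only to show the difference in how the end of the data is found.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies a raw buffer into a container that owns the
// memory and remembers its length, so zero bytes are ordinary data.
std::vector<unsigned char> make_buffer(const unsigned char* data,
                                       std::size_t size) {
    return std::vector<unsigned char>(data, data + size);
}

// Hypothetical helper: copies a '\0'-terminated C string; the terminator
// marks the end, so no separate length is needed.
std::string make_text(const char* text) {
    return std::string(text);
}
```

A buffer built from the bytes 1, 0, 2 keeps all three elements, whereas make_text("ab\0cd") stops at the first '\0' and yields a string of length 2.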
Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:
// Named copy_bytes here so that it does not clash with the standard memcpy.
void copy_bytes(unsigned char* dest, const unsigned char* source, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        dest[i] = source[i];
}
It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
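Selecting part of a byte with bit operations might look like the sketch below. extract_bits is a hypothetical helper, and it assumes count is smaller than the number of bits in unsigned.

```cpp
// Hypothetical helper: extracts `count` bits starting at bit `shift`
// (bit 0 is the least significant) from a single byte.
unsigned extract_bits(unsigned char byte, unsigned shift, unsigned count) {
    unsigned mask = (1u << count) - 1u;  // e.g. count == 4 gives 0x0F
    return (byte >> shift) & mask;
}
```

For the byte 0xA7 (binary 1010 0111), extract_bits(0xA7, 4, 4) yields the upper four bits, 0xA, and extract_bits(0xA7, 0, 3) yields the lowest three bits, 7.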
You could very well copy the data in the above function using an int pointer, but that would copy chunks of sizeof(int) bytes (typically 4), which is wrong whenever the byte count is not a multiple of that size or the buffers are not suitably aligned for int.
As for why nothing appears on the screen when you try to use cout, the most likely explanation is that the data starts with a zero byte, which in C++ marks the end of a string of characters.