I am getting into C/C++ and a lot of unfamiliar terms are popping up. One of them is a variable or pointer that is terminated by a zero. What does it mean for a space
Arrays and strings in C are handled through a pointer to a memory location. The pointer tells you where the array starts, but the end of the array is not recorded anywhere. For a character array that holds a string, the end is marked by a zero byte.
So, in memory, the string hello is stored as:
68 65 6c 6c 6f 00 |hello|
There are two common ways to handle arrays that can have varying-length contents (like strings). The first is to keep the length of the data separately from the data stored in the array. Fortran, Ada, and C++'s std::string do this. The disadvantage is that you somehow have to pass that extra information to everything that deals with your array.
The other way is to reserve an extra non-data element at the end of the array to serve as a sentinel. For the sentinel you use a value that should never appear in the actual data. For strings, 0 (or "NUL") is a good choice, as it is unprintable and serves no other purpose in ASCII. So what C (and many languages copied from C) does is assume that all strings end with (or "are terminated by") a 0.
There are several drawbacks to this. For one thing, it is slow. Any time a routine needs to know the length of the string, it is an O(n) operation (searching through the entire string looking for the 0). Another problem is that you may one day want to put a 0 in your string for some reason, so now you need a whole second set of routines that ignore the 0 and take a separate length anyway (e.g. memcpy() and memcmp() instead of strcpy() and strcmp()). The third big problem is that if someone forgets to put that 0 at the end (or it gets wiped out somehow), the next string operation that does a length check will go merrily marching through memory until it either happens to find another 0, crashes, or the user loses patience and kills it. Such bugs can be a serious PITA to track down.
For all these reasons, the C approach is generally viewed with disfavor.
C-style strings are terminated by a NUL character ('\0'). This provides a marker for functions that operate on strings (e.g. strlen, strcpy) to use to identify the end of the string.
Take the string Hi in ASCII. Its simplest representation in memory is two bytes:
0x48
0x69
But where does that piece of memory end? Unless you're also prepared to pass around the number of bytes in the string, you don't know - pieces of memory don't intrinsically have a length.
So C has a convention that strings end with a zero byte, also known as a NUL character:
0x48
0x69
0x00
The string is now unambiguously two characters long, because there are two characters before the NUL.
It's a reserved value to indicate the end of a sequence of (for example) characters in a string.
More correctly known as null (or NUL) terminated. This is because the value used is zero, rather than being the character code for '0'. To clarify the distinction check out a table of the ASCII character set.
This is necessary because languages like C have a char data type, but no string data type. Therefore it is left to the developer to decide how to manage strings in their application. The usual way of doing this is to have an array of chars with a null value used to terminate (i.e. signify the end of) the string.
Note that there is a distinction between the length of the string, and the length of the char array that was originally declared.
char name[50];
This declares an array of 50 characters. However, these values will be uninitialised. So if I want to store the string "Hello" (5 characters long) I really don't want to bother setting the remaining 45 characters to spaces (or some other value). Instead I store a NUL value after the last character in my string.
Other languages such as Pascal, Java and C# have a specific string type defined. These store a header value to indicate the number of characters in the string. This has a couple of benefits: firstly, you don't need to walk to the end of the string to find out its length; secondly, your string can contain null characters.
Wikipedia has further information in the String (computer science) entry.
It refers to how C strings are stored in memory. The NUL character, written as \0 in string literals, is present at the end of a C string in memory. There is no other metadata associated with a C string, such as its length. Note the difference in spelling between the NUL character and the NULL pointer.