I am trying to learn C and am very confused already.
In the OOP languages I have used there exists the ability to perform method overloading, where the same function cou
C supports a type of function signature called "varargs", meaning "variable (number of) arguments". Such a function must have at least one required argument. In the case of printf, the format string is a required argument.
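To make the idea of a required argument followed by variable ones concrete, here is a minimal sketch. The function name sum_ints and the convention that the first argument carries the count are assumptions made up for this illustration; C itself does not record how many arguments were passed.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
   `count` is the required argument; it is the only way the function
   can know how many variable arguments follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);             /* a negative count makes no sense here */

    va_start(args, count);          /* begin after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* the caller must really have passed ints */
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 10, 20, 12) adds the three trailing values. If the count and the actual arguments disagree, nothing in the language catches it.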
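Generally, on a stack-based machine, when you call any C function, the arguments are pushed onto the stack from right to left. In this way, the first argument to the function is the one found on the "top" of the stack, just after the return address. (Many modern calling conventions pass the first few arguments in registers instead, but the macros described next hide that difference.)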
C defines macros in stdarg.h which allow you to retrieve the variable arguments.
The key points are:
- There is no type checking for the variable arguments. In the case of printf(), if the format string is wrong, the code will read invalid results from memory, possibly crashing.
- The variable arguments are accessed through a va_list, which is initialized with va_start, incremented with va_arg, and released with va_end.

I have posted a ton of code you may find interesting on the related question:
Best Way to Store a va_list for Later Use in C/C++
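As a small illustration of the va_start / va_end life cycle, and of what to do when the arguments are needed more than once, here is a sketch that walks the same argument list twice using va_copy. The function name format_alloc is hypothetical; va_copy and vsnprintf are standard since C99.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd string formatted like printf(),
   or NULL on failure. The caller is responsible for free()ing the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list args, copy;
    char *out = NULL;
    int needed;

    assert(fmt != NULL);

    va_start(args, fmt);
    va_copy(copy, args);                     /* second, independent cursor */

    needed = vsnprintf(NULL, 0, fmt, copy);  /* first pass: measure only */
    va_end(copy);                            /* every copy needs its own va_end */

    if (needed >= 0) {
        out = malloc((size_t)needed + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)needed + 1, fmt, args);  /* second pass */
    }

    va_end(args);
    return out;
}
```

The copy is needed because a va_list that has been handed to vsnprintf cannot be assumed to still point at the first argument afterwards.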
Here's a skeleton of a printf() which only formats integers ("%d"):
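#include <stdarg.h>

/* Called skeleton_printf so that it does not clash with the real printf(). */
int skeleton_printf( const char * fmt, ... )
{
    int d;          /* Used to store any int arguments. */
    va_list args;   /* Used as a pointer to the next variable argument. */

    va_start( args, fmt );  /* Initialize the pointer to arguments. */

    while (*fmt)
    {
        if ('%' == *fmt)
        {
            fmt++;
            if ('\0' == *fmt)
                break;      /* Lone '%' at the end of the format string. */
            switch (*fmt)
            {
                case 'd':   /* Format string says 'd'. */
                    /* ASSUME there is an integer at the args pointer. */
                    d = va_arg( args, int );
                    /* Print the integer stored in d... */
                    (void)d;
                    break;
            }
        }
        else
        {
            /* Not a format character, copy it to output. */
        }
        fmt++;  /* Advance past the character just handled. */
    }

    va_end( args );
    return 0;   /* A real printf() returns the number of characters written. */
}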
Internally, printf will (at least usually) use some macros from stdarg.h. The general idea is (a greatly expanded version of) something like this:
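#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* itoa() is not standard C, so this small helper stands in for it.
   Writes value in the given base (2 to 16) into buffer and returns buffer. */
static char *utoa_base(unsigned int value, char *buffer, unsigned int base) {
    char digits[sizeof(unsigned int) * CHAR_BIT];   /* enough even for base 2 */
    size_t i = 0, j = 0;

    do {
        digits[i++] = "0123456789abcdef"[value % base];
        value /= base;
    } while (value != 0);

    while (i > 0)               /* digits were produced in reverse order */
        buffer[j++] = digits[--i];
    buffer[j] = '\0';
    return buffer;
}

int my_vfprintf(FILE *file, char const *fmt, va_list arg) {
    int int_temp;
    char char_temp;
    char *string_temp;
    char ch;
    int length = 0;
    char buffer[512];

    while ((ch = *fmt++) != '\0') {
        if ('%' == ch) {
            switch (ch = *fmt++) {
                /* %% - print out a single % */
                case '%':
                    fputc('%', file);
                    length++;
                    break;

                /* %c: print out a character (promoted to int by the caller) */
                case 'c':
                    char_temp = (char)va_arg(arg, int);
                    fputc(char_temp, file);
                    length++;
                    break;

                /* %s: print out a string */
                case 's':
                    string_temp = va_arg(arg, char *);
                    fputs(string_temp, file);
                    length += (int)strlen(string_temp);
                    break;

                /* %d: print out an int */
                case 'd':
                    int_temp = va_arg(arg, int);
                    if (int_temp < 0) {
                        fputc('-', file);
                        length++;
                        /* negate as unsigned so INT_MIN does not overflow */
                        utoa_base(0u - (unsigned int)int_temp, buffer, 10);
                    } else {
                        utoa_base((unsigned int)int_temp, buffer, 10);
                    }
                    fputs(buffer, file);
                    length += (int)strlen(buffer);
                    break;

                /* %x: print out an int in hex */
                case 'x':
                    int_temp = va_arg(arg, int);
                    utoa_base((unsigned int)int_temp, buffer, 16);
                    fputs(buffer, file);
                    length += (int)strlen(buffer);
                    break;

                /* format string ended right after a '%': stop here */
                case '\0':
                    return length;
            }
        }
        else {
            putc(ch, file);
            length++;
        }
    }
    return length;
}

int my_printf(char const *fmt, ...) {
    va_list arg;
    int length;

    va_start(arg, fmt);
    length = my_vfprintf(stdout, fmt, arg);
    va_end(arg);
    return length;
}

int my_fprintf(FILE *file, char const *fmt, ...) {
    va_list arg;
    int length;

    va_start(arg, fmt);
    length = my_vfprintf(file, fmt, arg);
    va_end(arg);
    return length;
}

#ifdef TEST
int main(void) {
    my_printf("%s", "Some string");
    return 0;
}
#endif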
Fleshing it out does involve quite a bit of work -- dealing with field width, precision, more conversions, etc. This is enough, however, to at least give a flavor of how you retrieve varying arguments of varying types inside your function.
(Don't forget that, if you're using gcc (and g++?), you can pass -Wformat in the compiler options to get the compiler to check that the types of the arguments match the formatting. I hope other compilers have similar options.)
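The same checking can be requested for your own printf-like functions. The sketch below relies on the format attribute, which is a GCC and Clang extension rather than standard C; the macro name PRINTF_LIKE and the function log_message are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang extension: declares that argument `fmt_index` is a
   printf-style format string and that checking starts at `first_arg`. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)   /* no checking on other compilers */
#endif

/* Hypothetical logging wrapper: writes "log: " and the formatted text
   to stderr, returning what vfprintf() reports for the formatted part. */
int log_message(const char *fmt, ...) PRINTF_LIKE(1, 2);

int log_message(const char *fmt, ...)
{
    va_list args;
    int written;

    assert(fmt != NULL);

    fputs("log: ", stderr);
    va_start(args, fmt);
    written = vfprintf(stderr, fmt, args);
    va_end(args);
    return written;
}
```

With the attribute in place and -Wformat enabled, a call like log_message("%d items\n", "three") should draw a warning at compile time, just as it would for printf itself.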
Could anyone here explain how C performs the above task?
Blind faith. It assumes that you have ensured that the types of the arguments match perfectly with the corresponding letters in your format string. When printf is called, all the arguments are represented in binary, unceremoniously concatenated together, and passed effectively as a single big argument to printf, with no type information attached. If they don't match the format string, you'll have problems. As printf iterates through the format string, every time it sees %d it will take 4 bytes from the arguments (assuming a 32-bit int; it would be 8 bytes for a 64-bit int, of course) and interpret them as an integer.
Now maybe you actually passed a double (typically taking up twice as much memory as an int), in which case printf will just take 32 of those bits and represent them as an integer. Then the next format field (maybe a %d) will take the rest of the double.
So basically, if the types don't match perfectly you'll get badly garbled data. Strictly speaking, any such mismatch is undefined behaviour, so a crash is possible too.
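The safe pattern is to make every argument exactly the type its conversion expects, converting explicitly where needed. Here is a minimal sketch; the function name describe and its parameters are invented for the illustration, and it writes into a caller-supplied buffer with snprintf so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: formats one int and one double into `out`.
   Returns what snprintf() returns (the length of the full text). */
int describe(char *out, size_t size, int count, double ratio)
{
    assert(out != NULL);

    /* %d is given an int and %.2f is given a double. To show the double
       as a whole number, convert it to int explicitly rather than handing
       a double to %d. */
    return snprintf(out, size, "%d %.2f %d", count, ratio, (int)ratio);
}
```

Calling describe(buf, sizeof buf, 3, 2.5) fills buf with "3 2.50 2". Dropping the (int) cast would pass a double where %d expects an int, which is exactly the mismatch described above.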