How a standard library like libc.a (static library) which is included using #include
in our main.c differ from user defined header file (cube.h) inc
A programming language is not the same as its implementation.
A programming language is a specification (written on paper; you should read n1570, which in practice is the C11 standard); it is not software. The C standard specifies a C standard library and defines the headers to be #include-d.
(you could run your C program with a bunch of human slaves and without any computers; that would be very unethical; you could also use some interpreter like Ch and avoid any compiler or object or executable files)
How a standard library like
libc.a
(static library) which is included using #include <stdio.h>
... differs from a user file cube.c
The above sentence is utterly wrong (and makes no sense). libc.a
does not #include
-or is not included by- the <stdio.h>
header (i.e. file /usr/include/stdio.h
and other internal headers e.g. /usr/include/bits/stdio2.h
). That inclusion happens when you compile your main.c
or cube.c
.
In principle, <stdio.h>
might not be any file on your computer (e.g. #include <stdio.h>
could trigger some magic in your compiler). In practice, the compiler is parsing /usr/include/stdio.h
(and other included files) when you #include <stdio.h>
.
Some standard headers (notably <setjmp.h>
, <stdnoreturn.h>
, <stdarg.h>
, ...) are specified by the standard but are implemented with the help of special builtins or attributes (that is, "magic" things) of the GCC compiler.
The C standard knows about translation units.
Your GCC compiler processes source files (roughly speaking, implementing translation units) and starts with a preprocessing phase (processing #include and other directives and expanding macros). And gcc
runs not only the compiler proper (some cc1
) but also the assembler as
and the linker ld
(read Levine's Linkers and Loaders book for more).
For good reasons, your header file cube.h
should practically start with include guards. In your simplistic example they are probably useless (but you should get into that habit).
You practically should almost always use gcc -Wall -Wextra -g
(to get all warnings and debug info). Read the chapter about Invoking GCC.
You may also pass -v
to gcc
to understand what programs (e.g. cc1
, ld
, as
) are actually run.
You may pass -H
to gcc
to understand what source files are included during the preprocessing phase. You can also get the preprocessed form of cube.c
as the cube.i
file obtained with gcc -C -E cube.c > cube.i
and later look into that cube.i
file with some editor or pager.
You -or gcc
- would need (in your example) to compile cube.c
(the translation unit given by that file and every header file it is #include
-ing) into the cube.o
object file (assuming a Linux system). You would also compile main.c
into main.o
. Finally, gcc
would link cube.o
, main.o
, some startup files (read about crt0) and the libc.so
shared library (implementing the POSIX C standard library specification and a bit more) to produce an executable. Relocatable object files, shared libraries (and static libraries, if you use some) and executables use the ELF file format on Linux.
If you code a C program with several source files (and translation units) you practically should use a build automation tool like GNU make.
If I included include guards in cube.h what would happen when I include cube.h in both main.c and cube.c ?
These should be two different translation units. And you would compile them in several steps. First you compile main.c
into main.o
using
gcc -Wall -Wextra -g -c main.c
and the above command is producing a main.o
object file (with the help of cc1
and as
)
Then you compile (another translation unit) cube.c
using
gcc -Wall -Wextra -g -c cube.c
hence obtaining cube.o
(notice that adding include guards in your cube.h
doesn't change the fact that it would be read twice, once when compiling cube.c
and the other time when compiling main.c
)
Finally, you link both object files into the yourprog
executable using
gcc -Wall -Wextra -g cube.o main.o -o yourprog
(I invite you to try all these commands, and also to try them with gcc -v
instead of gcc
above).
Notice that gcc -Wall -Wextra -g cube.c main.c -o yourprog
is running all the steps above (check with gcc -v
). You really should write a Makefile to avoid typing all these commands (and just compile using make
, or even better make -j
to run compilation in parallel).
Finally you can run your executable using ./yourprog
(but read about PATH), but you should learn how to use gdb
and try gdb ./yourprog
.
Where it
cube.h
will get included?
It will get included in both translation units; once when running gcc -Wall -Wextra -g -c main.c
and another time when running gcc -Wall -Wextra -g -c cube.c
. Notice that object files (cube.o
and main.o
) don't contain included headers. Their debug information (in DWARF format) retains that inclusion (e.g. the included path, not the content of the header file).
BTW, look into existing free software projects (and study some of their source code, at least for inspiration). You might look into GNU glibc or musl-libc to understand what a C standard library really contains on Linux (it is built above system calls, listed in syscalls(2), provided and implemented by the Linux kernel). For example printf
would ultimately sometimes use write(2) but it is buffering (see fflush(3)).
PS. Perhaps you dream of programming languages (like OCaml, Go, ...) knowing about modules. C is not one.
TL;DR: the most crucial difference between the C standard library and your library function is that the compiler might intimately know what the standard library functions do without seeing their definition.
First of all, there are 2 kinds of libraries:
1. The C standard library (and possibly other libraries that are part of the C implementation, like libgcc)
2. Any other libraries, which includes all those other libraries in /usr/lib, /lib, etc., or those in your project.
The most crucial difference between a library in category 1 and a library in category 2 is that the compiler is allowed to assume that every identifier you use from a category 1 library behaves exactly as the standard specifies, and it can use this fact to optimize as it sees fit, even without linking against the relevant routine from the standard library or executing it at runtime. Look at this example:
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
printf("%f\n", sqrt(4.0));
}
We compile it, and run:
% gcc foo.c -Wall -Werror
% ./a.out
2.000000
%
and the correct result is printed out.
So what happens when we ask the user for the number:
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
double n;
scanf("%lf", &n);
printf("%f\n", sqrt(n));
}
then we compile the program:
% gcc foo.c -Wall -Werror
/tmp/ccTipZ5Q.o: In function `main':
foo.c:(.text+0x3d): undefined reference to `sqrt'
collect2: error: ld returned 1 exit status
Surprise, it doesn't link. That is because sqrt
is in the math library -lm
and you need to link against it to get the definition. But how did it work in the first place? Because the C compiler is free to assume that any function from the standard library behaves as specified in the standard, so it can evaluate a call such as sqrt(4.0) at compile time and remove the call entirely; it did this even though we weren't using any -O
switches.
Notice that it isn't even necessary to include the header. C11 7.1.4p2 allows this:
Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header.
Therefore in the following program, the compiler can still assume that the sqrt
is the one from the standard library, and the behaviour here is still conforming:
% cat foo.c
int printf(const char * restrict format, ...);
double sqrt(double x);
int main(void) {
printf("%f\n", sqrt(4.0));
}
% gcc foo.c -std=c11 -pedantic -Wall -Werror
% ./a.out
2.000000
If you drop the prototype for sqrt
, and compile the program,
int printf(const char * restrict format, ...);
int main(void) {
printf("%f\n", sqrt(4));
}
A conforming C99 or C11 compiler must diagnose a constraint violation for the implicit function declaration. The program is now invalid, but it still compiles (the C standard allows that too). GCC still calculates sqrt(4) at compilation time. Notice that we pass an int here instead of a double. For an ordinary function this would not even work at runtime: without a prototype, the compiler doesn't know that the int argument must be converted to a double. But it still works.
% gcc foo.c -std=c11 -pedantic
foo.c: In function ‘main’:
foo.c:4:20: warning: implicit declaration of function ‘sqrt’
[-Wimplicit-function-declaration]
printf("%f\n", sqrt(4));
^~~~
foo.c:4:20: warning: incompatible implicit declaration of built-in function ‘sqrt’
foo.c:4:20: note: include ‘<math.h>’ or provide a declaration of ‘sqrt’
% ./a.out
2.000000
This is because an implicit function declaration is one with external linkage, and C standard says this (C11 7.1.3):
[...] All identifiers with external linkage in any of the following subclauses (including the future library directions) and
errno
are always reserved for use as identifiers with external linkage. [...]
and Annex J.2 explicitly lists as undefined behaviour:
[...] The program declares or defines a reserved identifier, other than as allowed by 7.1.4 (7.1.3).
I.e. if the program did actually have its own sqrt
then the behaviour is simply undefined, because the compiler can assume that the sqrt
is the standard-conforming one.