I want to understand exactly which part of a program compiler looks at and which the linker looks at. So I wrote the following code:
#include
I believe this is your question:
Where I get confused is when compiler complained about DefinedIncorrectFunction. It didn't look for implementation of NonDefinedFunction but it went through the DefinedIncorrectFunction.
The compiler tried to parse DefinedIncorrectFunction
(because you provided a definition in this source file) and there was a syntax error (missing semicolon). On the other hand, the compiler never saw a definition for NonDefinedFunction
because there simply was no code in this module. You might have provided a definition of NonDefinedFunction
in another source file, but the compiler doesn't know that. The compiler only looks at one source file (and its included header files) at a time.
The linker is there to link in code defined (possibly) in external modules - libraries or object files you will use together with this particular source file to generate the complete executable. So, if you have a declaration but no definition, your code will compile because the compiler knows the linker might find the missing code somewhere else and make it work. Therefore, in this case you will get an error from the linker, not the compiler.
If, on the other hand, there's a syntax error in your code, the compiler can't even compile and you will get an error at this stage. Macros and templates may behave a bit differently yet, not causing errors if they are not used (templates are about as much as macros with a somewhat nicer interface), but it also depends on the error's gravity. If you mess up so much that the compiler can't figure it out where the templated/macro code ends and regular code starts, it won't be able to compile.
With regular code, the compiler must compile even dead code (code not referenced in your source file) because someone might want to use that code from another source file, by linking your .o file to his code. Therefore non-templated/macro code must be syntactically correct even if it is not directly used in the same source file.
Compiler checks that the source code is language conformant and adheres to the semantics of the language. The output from compiler is object code.
Linker links the different object modules together to form a exe. The definitions of functions are located in this phase and the appropriate code to call them is added in this phase.
The compiler compiles code in the form of translation units. It will compile all the code that is included in a source .cpp
file,
DefinedIncorrectFunction()
is defined in your source file, So compiler checks it for language validity.
NonDefinedFunction()
does have any definition in the source file so the compiler does not need to compile it, if the definition is present in some other source file, the function will be compiled as a part of that translation unit and further the linker will link to it, if at linking stage the definition is not found by the linker then it will raise a linking error.
Ah, but you could have NonDefinedFunction(int) in another compilation unit.
The compiler produces some output for the linker that basically says the following (among other things):
What the compiler does, and what the linker does, depends on the implementation: a legal implementation could just store the tokenized source in the “compiler”, and do everything in the linker. Modern implementations do put off more and more to the linker, for better optimization. And many early implementations of templates didn't even look the template code until link time, other than matching braces enough to know where the template ended. From a user point of view, you're more interested in whether the error “requires a diagnostic” (which can be emitted by the compiler or the linker) or is undefined behavior.
In the case of DefinedIncorrectFunction
, you have provides source text
which the implementation is required to parse. That text contains a
error for which a diagnostic is required. In the case of
NonDefinedFunction
: if the function is used, failure to provide a
definition (or providing more than one definition) in the complete
program is a violation of the one definition rule, which is undefined
behavior. No diagnostic is required (but I can't imagine an
implementation that didn't provide one for a missing definition of a
function that was used).
In practice, errors which can be easily detected simply by examining the text input of a single translation unit are defined by the standard to “require a diagnostic”, and will be detected by the compiler. Errors which cannot be detected by the examination of a single translation unit (e.g. a missing definition, which might be present in a different translation unit) are formally undefined behavior—in many cases, the errors can be detected by the linker, and in such cases, implementations will in fact emit an error.
This is somewhat modified in cases like inline functions, where you're
allowed to repeat the definition in each translation unit, and extremely
modified by templates, since many errors cannot be detected until
instantiation. In the case of templates, the standard leaves
implementations a great deal of freedom: at the least, the compiler must
parse the template enough to determine where the template ends. The
standard added things like typename
, however, to allow much more
parsing before instantiation. In dependent contexts, however, some
errors cannot possibly be detected before instantiation, which may take
place at compilation time or at link time—early implementations
favored link time instantiation; compile time instantiation dominates
today, and is used by VC++ and g++.
The function of the compiler is to compile the code that you have written and convert it into object files. So if you have missed a ;
or used an undefined variable, the compiler will complain because these are syntax errors.
If the compilation proceeds without any hitch, the object files are produced. The object files have a complex structure but basically contain five things
The compiler compiles the code and fills the symbol table with every symbol it encounters. Symbols refers to both variables and functions. The answer to This question explains the symbol table.
This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand.
The point to note
If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.
The generated object files are processed by the linker that will fill out the blanks in symbol tables, link one module to the other and finally give the executable code which can be loaded by the loader.
So in your specific case -
undefined reference to NonDefinedFunction
error because it can't find the reference to the concerned symbol table entry.To understand it further lets say your code is structured as following
File- try.h
#include<string>
#include<iostream>
class Test {
private:
int i;
public:
Test(int val) {i=val ;}
void DefinedCorrectFunction(int val);
void DefinedIncorrectFunction(int val);
void NonDefinedFunction(int val);
template <class paramType>
void FunctionTemplate (paramType val) { i = val; }
};
File try.cpp
#include "try.h"
void Test::DefinedCorrectFunction(int val)
{
i = val;
}
void Test::DefinedIncorrectFunction(int val)
{
i = val;
}
int main()
{
Test testObject(1);
testObject.NonDefinedFunction(2);
//testObject.FunctionTemplate<int>(2);
return 0;
}
Let us first only copile and assemble the code but not link it
$g++ -c try.cpp -o try.o
$
This step proceeds without any problem. So you have the object code in try.o. Let's try and link it up
$g++ try.o
try.o: In function `main':
try.cpp:(.text+0x52): undefined reference to `Test::NonDefinedFunction(int)'
collect2: ld returned 1 exit status
You forgot to define Test::NonDefinedFunction. Let's define it in a separate file.
File- try1.cpp
#include "try.h"
void Test::NonDefinedFunction(int val)
{
i = val;
}
Let us compile it into object code
$ g++ -c try1.cpp -o try1.o
$
Again it is successful. Let us try to link only this file
$ g++ try1.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
No main so won';t link!!
Now you have two separate object codes that have all the components you need. Just pass BOTH of them to linker and let it do the rest
$ g++ try.o try1.o
$
No error!! This is because the linker finds definitions of all the functions (even though it is scattered in different object files) and fills the blanks in object codes with appropriate values