Why Compile to an Object File First?

In the last year I've started programming in Fortran while working at a research university. Most of my prior experience is in web languages like PHP or old ASP, so I'm new to compile statements.

I have two different codes that I'm modifying.

One has an explicit statement creating .o files from modules (e.g. gfortran -c filea.f90) before creating the executable.

The other creates the executable file directly (sometimes creating .mod files, but no .o files, e.g. gfortran -o executable filea.f90 fileb.f90 mainfile.f90).

Is there a reason (other than, maybe, Makefiles) that one method is preferred over the other?

kriss

Compiling to object files first is called separate compilation. There are many advantages and a few drawbacks.

Advantages:

easy to transform object files (.o) to libraries and link to them later
many people can work on different source files at the same time
faster compiling (you don't compile the same files again and again when the source hasn't changed)
object files can be made from different language sources and linked together at some later time. To do that, the object files just have to use the same format and compatible calling conventions.
separate compilation enables distribution of system wide libraries (either OS libraries, language standard libraries or third party libraries) either static or shared.

Drawbacks:

some optimizations (like optimizing functions away) cannot be performed by the compiler, which sees only one source file at a time, and the linker does not care about them. Many compilers now include an option to perform "link time optimization", which largely negates this drawback. It is still an issue for system libraries and third party libraries, especially shared libraries: it is impossible to optimize away parts of a component that may change at each run, although other techniques like JIT compilation may mitigate this.
in some languages, the programmer has to provide some kind of header for others who will link with the object. For example, in C you have to provide .h files to go with your object files. But that is good practice anyway.
in languages with text-based includes like C or C++, if you change a function prototype, you have to change it in two places: once in the header file and once in the implementation file.

When you have a project with a few hundred source files, you don't want to recompile all of them every time one changes. By compiling each source file into a separate object file and recompiling only those source files that are affected by a change, you spend the minimum amount of time getting from a source code change to a new executable.

make is the common tool used to track such dependencies and recreate your binary when something changes. Typically you set up what each source file depends on (these dependencies can typically be generated by your compiler - in a format suitable for make), and let make handle the details of creating an up to date binary.
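As a rough illustration of the decision make takes for each target, here is a minimal Python sketch. It is a toy model, not how make is implemented: the function names, the rules table and the timestamps are all made up for the example, and the file names are borrowed from the question.

```python
# Toy model of make's rule: rebuild a target if it is missing or older
# than any of its dependencies. Timestamps are plain numbers here.

def needs_rebuild(target, deps, mtimes):
    """Return True if `target` is missing or older than any dependency."""
    if target not in mtimes:
        return True
    return any(mtimes[d] > mtimes[target] for d in deps)


def stale_objects(rules, mtimes):
    """List the object files that must be recompiled, in sorted order."""
    return sorted(t for t, deps in rules.items() if needs_rebuild(t, deps, mtimes))


# Hypothetical project: three sources, only fileb.f90 was edited recently.
rules = {
    "filea.o": ["filea.f90"],
    "fileb.o": ["fileb.f90"],
    "mainfile.o": ["mainfile.f90"],
}
mtimes = {
    "filea.f90": 100, "filea.o": 110,
    "fileb.f90": 200, "fileb.o": 120,   # source is newer than its object
    "mainfile.f90": 100, "mainfile.o": 130,
}

if __name__ == "__main__":
    print(stale_objects(rules, mtimes))  # only fileb.o needs recompiling
```

With these made-up timestamps only fileb.o is out of date, so only fileb.f90 would be recompiled before relinking. A real Fortran project would also have to list the .mod files a source uses among its dependencies.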

The .o file is the Object File. It's an intermediate representation of the final program.

Typically, the .o file contains compiled code, but it does not yet have final addresses for all of the different routines and data.

One of the things that a program needs before it can be run is something similar to a memory image.

For example, suppose your main program calls a routine SQUARE. (This is faux Fortran; I haven't touched it in decades, so work with me here.)

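PROGRAM MAIN
  IMPLICIT NONE
  INTEGER :: X, Y
  INTEGER, EXTERNAL :: SQUARE   ! defined in a separate compilation unit
  X = 10
  Y = SQUARE(X)
  WRITE(*,*) Y
END PROGRAM MAIN

Then you have the SQUARE function.

INTEGER FUNCTION SQUARE(N)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: N      ! value to be squared
  SQUARE = N * N
END FUNCTION SQUARE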

These are individually compiled units. You can see that when MAIN is compiled it does not KNOW where "SQUARE" is, that is, what address it is at. It needs to know that so that when it issues the microprocessor's JUMP SUBROUTINE (JSR) instruction, the instruction has someplace to go.

The .o file has the JSR instruction already, but it doesn't have the actual value. That comes later in the linking or loading phase (depending on your application).

So, MAIN's .o file has all of the code for MAIN, and a list of references that it wants resolved (notably SQUARE). SQUARE is basically standalone: it doesn't have any references, but at the same time, it has no address for where it exists in memory yet.

The linker will take all of the .o files and combine them into a single executable. In the old days, compiled code would literally be a memory image. The program would start at some address, be loaded into RAM wholesale, and then be executed. So, in this scenario, you can see the linker taking the two .o files and concatenating them together (to get SQUARE's actual address); then it would go back, find the SQUARE reference in MAIN, and fill in the address.

Modern linkers don't go quite that far, and defer much of that final processing to when the program is actually loaded. But the concept is similar.
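The old-style linking step described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the dictionary layout of an "object file", the instruction names other than JSR, and the link function itself. Real object formats and linkers are far more involved.

```python
# Toy linker: each "object file" holds its instructions and the symbols
# it defines (as offsets into its own code). A call to an external
# routine is stored as ("JSR", name) until link time.

def link(objects):
    """Concatenate object code and replace symbolic call targets with addresses."""
    image = []          # the final "memory image"
    addresses = {}      # symbol name -> absolute address in the image

    # Pass 1: lay the objects out one after another and record where
    # each defined symbol ends up.
    for obj in objects:
        base = len(image)
        for name, offset in obj["defines"].items():
            addresses[name] = base + offset
        image.extend(obj["code"])

    # Pass 2: go back and patch every symbolic JSR with a real address.
    linked = []
    for instr in image:
        if instr[0] == "JSR" and isinstance(instr[1], str):
            if instr[1] not in addresses:
                raise KeyError(f"undefined reference to {instr[1]}")
            instr = ("JSR", addresses[instr[1]])
        linked.append(instr)
    return linked, addresses


# Hypothetical contents of the two object files from the example.
main_o = {
    "defines": {"MAIN": 0},
    "code": [("LOAD", 10), ("JSR", "SQUARE"), ("PRINT",), ("HALT",)],
}
square_o = {
    "defines": {"SQUARE": 0},
    "code": [("MUL",), ("RET",)],
}

if __name__ == "__main__":
    image, symbols = link([main_o, square_o])
    print(symbols)    # SQUARE lands right after MAIN's four instructions
    print(image[1])   # the JSR now carries a numeric address
```

Linking main_o alone raises an "undefined reference" error in this model, which mirrors the kind of message a real linker gives when an object file that defines a needed routine is left off the command line.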

By compiling to .o files, you end up with reusable units of logic that are then combined later by the linking and loading processes before execution.

The other nice aspect is that the .o files can come from different languages. As long as the calling mechanisms are compatible (i.e. how arguments are passed to and from functions and procedures), then once the code is compiled into a .o, the source language becomes less relevant. You can link C code with Fortran code, say.

In PHP et al., the process is different because all of the code is loaded into a single image at runtime. You can consider Fortran's .o files similar to how you would use PHP's include mechanism to combine files into a large, cohesive whole.

Another reason, apart from compile time, is that the compilation process is a multi-step process.

The object files are just one intermediate output from that process. They will eventually be used by the linker to produce the executable file.

We compile to object files to be able to link them together to form larger executables. That is not the only way to do it.

There are also compilers that don't do it that way, but instead compile to memory and execute the result immediately. Earlier, when students had to use mainframes, this was standard. Turbo Pascal also did it this way.
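As a loose analogy for compiling to memory and running immediately, Python's built-in compile and exec do something similar in spirit: the source is turned into a code object held in memory and executed, with no object file or link step in between. Note the assumption in the comparison: Python produces bytecode for its own virtual machine rather than native machine code, so this only mirrors the workflow, not the mechanism of those compilers. The square function and the "<in-memory>" label are made up for the example.

```python
# Compile source text to an in-memory code object and run it at once;
# no .o file is written and no linker is involved.
source = "def square(n):\n    return n * n\n"

code = compile(source, "<in-memory>", "exec")  # source -> code object
namespace = {}
exec(code, namespace)                          # defines square in namespace

result = namespace["square"](10)

if __name__ == "__main__":
    print(result)
```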

Source: https://stackoverflow.com/questions/5283841/why-compile-to-an-object-file-first

Tags: c++, compilation, fortran, object-files