In all the code I see online, programs are always broken up into many smaller files. For all of my projects for school, though, I've gotten by with just one gigantic C source file. Is it just for ease of reading?
Well, that's exactly what you want: split your code into several libraries!
Let's take an example. In one file you have:
#include <stdio.h>

int something() {
    return 42;
}

int bar() {
    return something();
}

void foo(int i) {
    printf("do something with %d\n", i);
}

int main() {
    foo(bar());
    return 0;
}
You can split this up into:
mylib.h:
#ifndef __MYLIB_H__
#define __MYLIB_H__
#include <stdio.h>
int bar();
void foo(int i);
#endif
N.B.: the preprocessor code above is called an "include guard"; it prevents the header's contents from being processed twice, so you can include the same header in several places without compilation errors.
mylib.c:
#include <mylib.h>
int something() {
    return 42;
}

int bar() {
    return something();
}

void foo(int i) {
    printf("do something with %d\n", i);
}
myprog.c:
#include <mylib.h>
int main() {
    foo(bar());
    return 0;
}
To compile it, you do:
gcc -c mylib.c -I./
gcc -o myprog myprog.c -I./ mylib.o
Now, what are the advantages?
Ease of reading is one reason for breaking up files, but another is that when you build a project containing multiple files (header and source files), a good build system will only rebuild the files that have been modified, thereby shortening build times.
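For instance, a minimal Makefile (just a sketch, reusing the mylib.c / myprog.c / mylib.h names from the answer above) lets make rebuild only the objects whose sources changed; note that the indented command lines must start with a tab:

Makefile:

CC = gcc
CFLAGS = -Wall -I.

# Relink the program whenever either object file changes.
myprog: myprog.o mylib.o
	$(CC) -o myprog myprog.o mylib.o

# Each object is rebuilt only when its own source or the header changes.
myprog.o: myprog.c mylib.h
	$(CC) $(CFLAGS) -c myprog.c

mylib.o: mylib.c mylib.h
	$(CC) $(CFLAGS) -c mylib.c

clean:
	rm -f myprog *.o

Touch only mylib.c and `make` recompiles mylib.o and relinks myprog, while myprog.o is left alone.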
As for how to break up a monolithic file into multiple files, there are many ways to go. Speaking for myself, I would try to group functionality: for example, all input handling goes in one source file, output in another, and functions used by many different functions in a third. I would do the same with structures, constants, and macros, grouping related ones into separate header files. I would also mark functions used only in a single source file as static, so they can't be called from other source files by mistake.
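For example (a small sketch; the file and function names here are made up purely for illustration):

input.h:

int count_blank_lines(const char *const *lines, int n);

input.c:

#include <ctype.h>
#include "input.h"

/* Internal helper: static, so it is invisible to other source files. */
static int is_blank_line(const char *line) {
    while (*line) {
        if (!isspace((unsigned char)*line)) return 0;
        line++;
    }
    return 1;
}

/* Public function, declared in input.h, callable from anywhere. */
int count_blank_lines(const char *const *lines, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (is_blank_line(lines[i])) count++;
    return count;
}

Other source files can call count_blank_lines through input.h, but is_blank_line can't be referenced from outside input.c.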
is it just for ease of reading?
The main reasons are:
Maintainability: In large, monolithic programs like what you describe, there's a risk that changing code in one part of the file can have unintended effects somewhere else. Back at my first job, we were tasked with speeding up code that drove a 3D graphical display. It was a single, monolithic, 5000+-line main function (not that big in the grand scheme of things, but big enough to be a headache), and every change we made broke an execution path somewhere else. This was badly written code all the way around (gotos galore, literally hundreds of separate variables with incredibly informative names like nv001x, program structure that read like old-school BASIC, micro-optimizations that didn't do anything but make the code that much harder to read, brittle as hell), but keeping it all in one file made the bad situation worse. We eventually gave up and told the customer we'd either have to rewrite the whole thing from scratch, or they'd have to buy faster hardware. They wound up buying faster hardware.
Reusability: There's no point in writing the same code over and over again. If you come up with a generally useful bit of code (like, say, an XML parsing library, or a generic container), keep it in its own separately compiled source files, and simply link it in when necessary.
Testability: Breaking functions out into their own separate modules allows you to test those functions in isolation from the rest of the code; you can verify each individual function more easily (see the small sketch after this list).
Buildability: Okay, so "buildability" isn't a real word, but rebuilding an entire system from scratch every time you change one or two lines can be time-consuming. I've worked on very large systems where complete builds could take upwards of several hours. By breaking up your code, you limit the amount of code that has to be rebuilt. Not to mention that any compiler is going to have some limits on the size of the file it can handle. That graphical driver I mentioned above? The first thing we tried to do to speed it up was to compile it with optimizations turned on (starting with -O1). The compiler ate up all available memory, then it ate all the available swap until the kernel panicked and brought down the entire system. We literally could not build that code with any optimization turned on (this was back in the days when 128 MB was a lot of very expensive memory). Had that code been broken up into multiple files (hell, just multiple functions within the same file), we wouldn't have had that problem.
Parallel Development: There isn't an "ability" word for this, but by breaking source up into multiple files and modules, you can parallelize development. I work on one file, you work on another, someone else works on a third, etc. We don't risk stepping on each other's code that way.
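To make the testability point concrete, here is a minimal sketch of a test driver, assuming the mylib.h / mylib.o files from the first answer above (the file name test_mylib.c is just an invented example):

test_mylib.c:

#include <assert.h>
#include "mylib.h"   /* bar() and foo() from the earlier answer */

int main() {
    /* bar() can be exercised on its own, without running the real program's main(). */
    assert(bar() == 42);
    return 0;
}

Compile and run it against the already-built object file:

gcc -o test_mylib test_mylib.c mylib.o -I./
./test_mylib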
is it just for ease of reading?
No, it can also save you a lot of time compiling; when you change one source file, you only recompile that file, then relink, instead of recompiling everything. But the main point is dividing a program into a set of well-separated modules that are easier to understand and maintain than a single monolithic "blob".
For starters, try to adhere to Rob Pike's rule that "data dominates": design your program around a bunch of data structures (structs, usually) with operations on them. Put all the operations that belong to a single data structure into a separate module. Make all functions static that need not be called by functions outside the module.
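As a rough sketch of what that can look like (the stack example and all the names in it are invented for illustration, not taken from the answer), put the data structure and its public operations in a header and keep the helpers static in the .c file:

stack.h:

#ifndef STACK_H
#define STACK_H

typedef struct {
    int items[64];
    int top;
} Stack;

void stack_init(Stack *s);
int stack_push(Stack *s, int value);   /* 0 on success, -1 if full */
int stack_pop(Stack *s, int *value);   /* 0 on success, -1 if empty */

#endif

stack.c:

#include "stack.h"

/* Helpers marked static: callable only from within stack.c. */
static int stack_is_full(const Stack *s)  { return s->top == 64; }
static int stack_is_empty(const Stack *s) { return s->top == 0; }

void stack_init(Stack *s) { s->top = 0; }

int stack_push(Stack *s, int value) {
    if (stack_is_full(s)) return -1;
    s->items[s->top++] = value;
    return 0;
}

int stack_pop(Stack *s, int *value) {
    if (stack_is_empty(s)) return -1;
    *value = s->items[--s->top];
    return 0;
}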
Just to give you an idea.
Create a file called print.c and put this inside:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void print_on_stdout(const char *msg) {
    if (msg) fprintf(stdout, "%s\n", msg);
}

void print_on_stderr(const char *msg) {
    if (msg) fprintf(stderr, "%s\n", msg);
}
Create a file called print.h and put this inside:
void print_on_stdout(const char *msg);
void print_on_stderr(const char *msg);
Create a file called main.c and put this inside:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "print.h"
int main() {
    print_on_stdout("test on stdout");
    print_on_stderr("test on stderr");
    return 0;
}
Now, for each C file, compile with:
gcc -Wall -O2 -o print.o -c print.c
gcc -Wall -O2 -o main.o -c main.c
Then link compiled files to generate an executable:
gcc -Wall -O2 -o test print.o main.o
Run ./test and enjoy.
Well, I am not an expert, but I always try to think in entities larger than a function. If I have a group of functions that logically belong together, I put them into a separate file. Usually, if the functionality is similar and someone wants one of these functions, they will probably need some of the other functions from the group as well.
The need to split up a single file comes from the same reason you use different folders for your files: people want some logical organization of the numerous functions, so that they don't have to grep one huge source file to find the one they need. That way you can forget about the irrelevant parts of the program while you are thinking about or developing some particular part of it.
One more reason for splitting could be that you can hide some internal functions from the rest of the code by not mentioning them in the header. This way you explicitly separate the inner functions (which are needed only inside the .c file) from the functions interesting to the outer "universe" of your program.
Some higher-level languages have even extended the notion of "functions belonging together" into "functions working on the same thing, presented as one entity" -- and called that a class.
Another historical reason for splitting is the separate compilation feature. If your compiler is slow (this is often the case with C++, for example), splitting the code into several files means that if you modify only one location, the chances are high that only one file needs to be recompiled to pick up the changes. As modern C compilers are not that slow relative to typical processor speeds, this may not be an issue for you.