An answer (see below) to one of the questions right here on Stack Overflow gave me an idea for a great little piece of software that could be in
What can be super beneficial on even a single-core machine is parallel make. Disk I/O is a pretty large factor in the build process. Spawning two compiler instances per CPU core can actually increase performance. As one compiler instance blocks on I/O the other one can usually jump into the CPU intensive part of compiling.
You need to make sure you've got the RAM to support this (shouldn't be a problem on a modern workstation), otherwise you'll end up swapping and that defeats the purpose.
On GNU make you can just use -j[n], where [n] is the number of simultaneous processes to spawn. Make sure you have your dependency tree right before trying it though, or the results can be unpredictable.
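As a rough illustration of the "two compiler instances per core" rule of thumb, here is a minimal Python sketch that derives a -j value from the core count and builds the make command line. The helper names (parallel_jobs, make_command) are made up for this example, and the factor of two is the heuristic from this answer, not a tuned value.

```python
import os


def parallel_jobs(cores=None, per_core=2):
    """Return a job count for make's -j flag: per_core jobs for each CPU core."""
    if cores is None:
        # os.cpu_count() can return None when the count is unknown.
        cores = os.cpu_count() or 1
    return max(1, cores * per_core)


def make_command(jobs, target=None):
    """Build the argument list for a parallel GNU make invocation."""
    cmd = ["make", f"-j{jobs}"]
    if target:
        cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(make_command(parallel_jobs())))
```

If memory is tight, lowering per_core to 1 is probably the first thing to try, since swapping would cancel out the gain.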
Another tool that's really useful (in the parallel make fashion) is distcc. It works a treat with GCC (if you can use GCC or something with a similar command line interface). distcc actually breaks up the compile task by pretending to be the compiler and spawning tasks on remote servers. You call it in the same way as you'd call GCC, and you take advantage of make's -j[n] option to call many distcc processes.
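A typical way to wire this up is to list the helper machines in the DISTCC_HOSTS environment variable and override the compiler on the make command line. The sketch below only builds the environment and the argument list. It assumes the Makefile honors the CC variable, and the host names in the usage comment are placeholders rather than anything from the setup described here.

```python
import os


def distcc_env(hosts, base_env=None):
    """Return a copy of the environment with DISTCC_HOSTS set to the given hosts."""
    env = dict(os.environ if base_env is None else base_env)
    # distcc reads a space-separated host list from DISTCC_HOSTS.
    env["DISTCC_HOSTS"] = " ".join(hosts)
    return env


def distcc_make_command(jobs, compiler="gcc"):
    """Build a make invocation that routes compiles through distcc.

    Assumes the Makefile uses $(CC) for compilation.
    """
    return ["make", f"-j{jobs}", f"CC=distcc {compiler}"]


# Hypothetical usage (host names are placeholders):
#   import subprocess
#   subprocess.run(distcc_make_command(8),
#                  env=distcc_env(["localhost", "buildbox1", "buildbox2"]),
#                  check=True)
```

With remote machines in play, the -j value is usually sized to the total number of cores across all hosts rather than just the local ones.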
At one of my previous jobs we had a fairly intensive Linux operating system build that was performed almost daily for a while. Adding a couple of dedicated build machines and putting distcc on a few workstations to accept compile jobs allowed us to bring build times down from half a day to under 60 minutes for a complete OS + userspace build.
There are a lot of other tools for speeding up compiles. You might want to investigate more than creating RAM disks, which look like they will give very little gain since the OS is already doing disk caching with RAM. OS designers spend a lot of time getting caching right for most workloads; they are (collectively) smarter than you, so I wouldn't like to try and do better than them.
If you chew up RAM for a RAM disk, the OS has less working RAM to cache data and to run your code, so you'll end up with more swapping and worse disk performance than otherwise (note: you should profile this option before completely discarding it).
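For the profiling step, something as simple as timing the same build command in two checkouts (one on the hard drive, one on the RAM disk) is enough to see whether the RAM disk pays off. This is a minimal sketch: the function names are invented for the example, and the directories and build command in the comments are placeholders.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd in cwd and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def compare_builds(cmd, directories, runs=3):
    """Return {directory: best time over `runs` runs} for the same command."""
    return {
        d: min(time_command(cmd, cwd=d) for _ in range(runs))
        for d in directories
    }


if __name__ == "__main__":
    # Stand-in command so the script runs anywhere; for a real comparison use
    # your build command, e.g. ["make", "-j8"], and your two checkout paths.
    print(compare_builds([sys.executable, "-c", "pass"], ["."], runs=1))
```

For the numbers to mean anything, each run should start from a clean tree, and the time spent copying sources to and from the RAM disk should be counted as well.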
The disk slowdown you incur comes mainly from writes, and possibly also from virus scanners. It can vary greatly between OSes too.
With the idea that writes are slowest, I would be tempted to set up a build where intermediate files (for example, .o files) and binaries get output to a different location such as a RAM drive.
You could then link this bin/intermediate folder to faster media (using a symbolic link or NTFS junction point).
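The linking step could look roughly like the sketch below, which creates the output folder on the faster media and points a symbolic link in the project at it. The function name and the default folder name "build" are assumptions for the example. It covers symbolic links only; NTFS junction points are not handled here.

```python
import os
import tempfile


def link_build_dir(project_dir, fast_root, name="build"):
    """Make project_dir/name a symlink to a fresh directory under fast_root."""
    project = os.path.basename(os.path.abspath(project_dir))
    target = os.path.join(fast_root, f"{project}-{name}")
    os.makedirs(target, exist_ok=True)

    link = os.path.join(project_dir, name)
    if os.path.islink(link):
        os.unlink(link)  # replace a stale link from an earlier run
    elif os.path.exists(link):
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(target, link, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demo with temporary directories standing in for the project and RAM drive.
    with tempfile.TemporaryDirectory() as proj, tempfile.TemporaryDirectory() as fast:
        print(os.path.realpath(link_build_dir(proj, fast)))
```

The build then writes to the usual relative path while the data actually lands on the RAM drive. Anything there is lost on reboot, so only intermediates and reproducible binaries belong in it.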
See Speeding up emerge with tmpfs (Gentoo Linux wiki).
Speeding up compiles using RAM drives under Gentoo was the subject of a how-to written many eons ago. It provides a concrete example of what has been done. The gist is that all source and build intermediate files are redirected to a RAM disk for the compile, while final binaries are directed to the hard drive for install.
Also, I recommend exploring keeping your source on the hard drive, but using git push to send your latest source changes to a clone repository that resides on the RAM disk. Compile the clone. Use your favorite script to copy the binaries created.
I hope that helps.
I'm surprised at how many people suggest that the OS can do a better job at figuring out your caching needs than you can in this specialized case. While I didn't do this for compiling, I did do it for similar processes and I ended up using a RAM disk with scripts that automated the synchronization.
In this case, I think I'd go with a modern source control system. At every compile it would check in the source code (along an experimental branch if needed) automatically so that every compile would result in the data being saved off.
To start development, start the RAM disk and pull the current base line. Do the editing, compile, edit, compile, etc. - all the while the edits are being saved for you.
Do the final check in when happy, and you don't even have to involve your regular hard disk drive.
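The "check in at every compile" idea can be sketched as a small build wrapper. This version assumes git as the source control system and assumes you are already on the experimental branch; the function names, the commit message, and the optional remote are illustrative choices, not part of the workflow described above.

```python
import subprocess


def snapshot_commands(message="auto-snapshot before build", remote=None):
    """Return the git commands that record the working tree before a compile."""
    cmds = [
        ["git", "add", "-A"],
        # --allow-empty keeps the wrapper from failing when nothing changed.
        ["git", "commit", "--allow-empty", "-m", message],
    ]
    if remote:
        # Push the current branch so the snapshot also exists off the RAM disk.
        cmds.append(["git", "push", remote, "HEAD"])
    return cmds


def snapshot_then_build(build_cmd, repo_dir=".", **kwargs):
    """Commit the current state, then run the build command in repo_dir."""
    for cmd in snapshot_commands(**kwargs) + [list(build_cmd)]:
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Hypothetical usage (remote name is a placeholder):
#   snapshot_then_build(["make", "-j8"], repo_dir=".", remote="origin")
```

Pushing to a remote on every compile adds some latency, but without it the snapshots live only on the RAM disk and would not survive a power loss.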
But there are background synchronizers that will automate things - the issue is that they won't be optimized for programming either and may need to do full directory and file scans occasionally to catch changes. A source code control system is designed for exactly this purpose, so it would likely be lower overhead even though it exists in your build setup.
Keep in mind that the state a background sync task leaves behind after a power outage is undefined. You would end up having to figure out what was saved and what wasn't if things went wrong. With a defined save point (at each compile, or forced by hand) you'd have a pretty good idea that it was at least in a state where you thought you could compile it. Use a VCS and you can easily compare it to the previous code and see what changes you've applied already.
We used to do this years ago for a 4GL macro-compiler; if you put the macro library and support libraries and your code on a RAM disk, compiling an application (on an 80286) would go from 20 minutes to 30 seconds.
Your OS will cache things in memory as it works. A RAM disk might seem faster, but that's because you aren't factoring in the "copy to RAMDisk" and "copy from RAMDisk" times. Dedicating RAM to a fixed size ramdisk just reduces the memory available for caching. The OS knows better what needs to be in RAM.