I'm running the following kind of pipeline:
digestA: hugefileB hugefileC
	cat $^ > $@
	rm $^
hugefileB:
	touch $@
hugefileC:
	touch $@
Use the "intermediate files" feature of GNU Make. From the manual:
Intermediate files are remade using their rules just like all other files. But intermediate files are treated differently in two ways.
The first difference is what happens if the intermediate file does not exist. If an ordinary file b does not exist, and make considers a target that depends on b, it invariably creates b and then updates the target from b. But if b is an intermediate file, then make can leave well enough alone. It won't bother updating b, or the ultimate target, unless some prerequisite of b is newer than that target or there is some other reason to update that target.
The second difference is that if make does create b in order to update something else, it deletes b later on after it is no longer needed. Therefore, an intermediate file which did not exist before make also does not exist after make. make reports the deletion to you by printing a rm -f command showing which file it is deleting.
Ordinarily, a file cannot be intermediate if it is mentioned in the makefile as a target or prerequisite. However, you can explicitly mark a file as intermediate by listing it as a prerequisite of the special target .INTERMEDIATE. This takes effect even if the file is mentioned explicitly in some other way.
You can prevent automatic deletion of an intermediate file by marking it as a secondary file. To do this, list it as a prerequisite of the special target .SECONDARY. When a file is secondary, make will not create the file merely because it does not already exist, but make does not automatically delete the file. Marking a file as secondary also marks it as intermediate.
So, adding the following line to the Makefile should be enough:
.INTERMEDIATE : hugefileB hugefileC
Invoking make for the first time:
$ make
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
And the next time:
$ make
make: `digestA' is up to date.
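If you would rather delete the huge files yourself, or keep them around for inspection, the .SECONDARY target described in the quoted passage can be used instead. The following is only a sketch that reuses the placeholder rules from the question; it is not a drop-in replacement for a real pipeline:

```make
digestA: hugefileB hugefileC
	cat $^ > $@

hugefileB hugefileC:
	touch $@

# Secondary files are not deleted automatically, and they are not
# recreated merely because they are missing.
.SECONDARY: hugefileB hugefileC
```

With this variant make leaves hugefileB and hugefileC in place after the build. If you later remove them by hand, digestA should still be considered up to date.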
The correct way is to not delete the files, as that removes the information that make uses to determine whether to rebuild the files. Recreating them as empty does not help because make will then assume that the empty files are fully built.
If there is a way to merge digests, then you could create one digest from each of the huge files. Each per-file digest is kept, and the huge file it came from is removed automatically because it is an intermediate file.
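A minimal sketch of that idea follows. The .digest suffix, the use of wc -l as a stand-in for the real digest step, and merging by plain concatenation are all assumptions made for illustration:

```make
# Final digest is merged from the small per-file digests (assumed mergeable with cat).
digestA: hugefileB.digest hugefileC.digest
	cat $^ > $@

# One small digest per huge file; wc -l is only a placeholder for the real work.
%.digest: %
	wc -l < $< > $@

# Placeholder for the expensive generation step.
hugefileB hugefileC:
	touch $@

# Explicitly named files are not intermediate unless marked as such.
.INTERMEDIATE: hugefileB hugefileC
```

Because the .digest files are named explicitly as prerequisites, make keeps them, so deleting digestA only reruns the cheap merge step and does not regenerate the huge files.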
I would recommend creating pseudo-cache files that are produced by the hugefileB and hugefileC targets. Then have digestA depend on those cache files, because you know they will not change again until you manually invoke the expensive targets.
If you mark hugefileB and hugefileC as intermediate files, you will get the behavior you want:
digestA: hugefileB hugefileC
	cat $^ > $@
hugefileB:
	touch $@
hugefileC:
	touch $@
.INTERMEDIATE: hugefileB hugefileC
For example:
$ gmake
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
$ gmake
gmake: `digestA' is up to date.
$ rm -f digestA
$ gmake
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
Note that you do not need the explicit rm $^ command anymore: gmake automatically deletes intermediate files at the end of the build.
See also .PRECIOUS:
.PRECIOUS : hugefileA hugefileB
.PRECIOUS
The targets which .PRECIOUS depends on are given the following special treatment: if make is killed or interrupted during the execution of their recipes, the target is not deleted. See Interrupting or Killing make. Also, if the target is an intermediate file, it will not be deleted after it is no longer needed, as is normally done. See Chains of Implicit Rules. In this latter respect it overlaps with the .SECONDARY special target.
You can also list the target pattern of an implicit rule (such as ‘%.o’) as a prerequisite file of the special target .PRECIOUS to preserve intermediate files created by rules whose target patterns match that file’s name.
Edit: On re-reading the question, I see that you don't want to keep the hugefiles; maybe do this:
digestA : hugefileA hugefileB
	grep '^Subject:' $^ > $@
	for n in $^; do echo > $$n; done
	sleep 1; touch $@
It truncates the hugefiles after using them, then touches the output file a second later, just to ensure that the output is newer than the input and this rule won't run again until the empty hugefiles are removed.
Unfortunately, if only the digest is removed, then running this rule will create an empty digest. You'd probably want to add code to block that.
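One possible guard, offered as a sketch rather than a tested solution: refuse to run the recipe when any input is zero-length. The guard line and the placeholder generator rule are assumptions added for illustration. Note that this version truncates with : > instead of echo >, because echo leaves a one-byte file (a single newline) that test -s would treat as non-empty.

```make
digestA : hugefileA hugefileB
	@for n in $^; do test -s $$n || { echo "$$n is empty; delete it so it is regenerated" >&2; exit 1; }; done
	grep '^Subject:' $^ > $@
	for n in $^; do : > $$n; done
	sleep 1; touch $@

# Placeholder for the expensive step that really produces the huge files.
hugefileA hugefileB:
	printf 'Subject: demo\n' > $@
```

With this guard, removing only digestA and rerunning make should stop with an error instead of writing an empty digest. Removing the truncated hugefiles as well lets the whole chain rebuild.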