Question
I came across a question (here on SO) where the OP needs to edit files and save the changes back into the Input_file(s) themselves.
I know that for a single Input_file we could do the following:
awk '{print "test here..new line for saving.."}' Input_file > temp && mv temp Input_file
Now let's say we need to make the same kind of change to a whole set of files (assume .txt here).
What I have tried/thought of for this problem: looping over the .txt files with a for loop and calling awk once per file is a painful and NOT recommended process, since it wastes CPU cycles and gets slower as the number of files grows.
So what could be done to perform an inplace edit of multiple files with a NON GNU awk, which does not support the inplace option? I have also gone through the thread "Save modifications in place with awk", but it has little for a NON GNU awk and nothing on changing multiple files inplace within awk itself.
NOTE: I am adding the bash tag because my answer uses bash commands to rename the temporary files to their actual Input_file names.
EDIT: As per Ed sir's comment, adding sample files here, though the code in this thread could also be used for generic inplace editing.
Sample Input_file(s):
cat test1.txt
onetwo three
tets testtest
cat test2.txt
onetwo three
tets testtest
cat test3.txt
onetwo three
tets testtest
Sample of expected output:
cat test1.txt
1
2
cat test2.txt
1
2
cat test3.txt
1
2
Answer 1:
Since the main aim of this thread is how to do an inplace SAVE in a NON GNU awk, I am first posting a template that should help with any kind of requirement: add the FNR==1 block and the END block to your code, keep your main BLOCK as per your requirement, and it should then do the inplace edit.
NOTE: The following writes all of its output to the temporary output file, so if you want to print anything to standard output, add a plain print ... statement without > (out).
Generic Template:
awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
.....your main block code.....
}
END{
if(rename){
system(rename)
}
}
' *.txt
Solution for the provided samples:
I came up with the following approach within awk itself; for the samples above it saves the output back into the Input_file(s):
awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
print FNR > (out)
}
END{
if(rename){
system(rename)
}
}
' *.txt
NOTE: this is only a test of saving edited output into the Input_file(s) themselves; reuse its FNR==1 block and END block in your own program, and write the main block as per the requirement of your specific problem.
Fair warning: this approach creates temporary out files in the current path, so make sure there is enough space on the system; only the main Input_file(s) remain at the end, but extra space is needed in the directory during the operation.
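One way to act on that warning, as a sketch only: keep the temporary files in a private directory so they cannot clash with existing names. This assumes the mktemp utility is installed; the directory name inplace.XXXXXX and the extra close(out) in END are my additions, not part of the template above.

```shell
# Work in a scratch directory so the demo cannot touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

# Assumption: a private directory for the temporary files, so names such as
# out0 or out1 can never clash with files that already exist.
tmpdir=$(mktemp -d ./inplace.XXXXXX) || exit 1

awk -v out_file="$tmpdir/out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  print FNR > (out)
}
END{
  close(out)   # added here: flush the last temporary file before renaming
  if(rename){
    system(rename)
  }
}
' *.txt

# rmdir succeeds only if every temporary file was renamed away.
rmdir "$tmpdir"
```

Because the temporary directory sits next to the Input_file(s), each mv stays on the same filesystem and is a plain rename.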
Following is a test of the above code.
Execution of the program with an example: let's assume the following are the .txt Input_file(s):
cat << EOF > test1.txt
onetwo three
tets testtest
EOF
cat << EOF > test2.txt
onetwo three
tets testtest
EOF
cat << EOF > test3.txt
onetwo three
tets testtest
EOF
Now when we run the following code:
awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
print "new_lines_here...." > (out)
}
END{
if(rename){
system("ls -lhtr;" rename)
}
}
' *.txt
NOTE: I have placed ls -lhtr in the system call intentionally, to show which (temporary) output files it creates before it renames them to their actual names.
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out2
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out1
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out0
When we run ls -lhtr after the awk script has finished, we see only the .txt files there.
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt
Explanation: a detailed explanation of the above command:
awk -v out_file="out" ' ##Starting the awk program here and creating a variable named out_file; its value MUST be a prefix that no file in the current directory is named after, since the temporary files (out0, out1, ...) are created from it and later renamed to the actual files.
FNR==1{ ##Checking whether this is the very first line of the current Input_file; if so, do the following.
close(out) ##Closing the previous temp file with the close function, so that we do not get a too-many-open-files error while writing one temp file per Input_file.
out=out_file count++ ##Creating the out variable, whose value is out_file (defined in the awk -v section) followed by count, which is incremented by 1 every time control reaches here.
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047" ##Creating a variable named rename, which collects the mv commands to run once all the Input_file(s) are processed; it is executed in the END section.
} ##Closing BLOCK for FNR==1 condition here.
{ ##Starting main BLOCK from here.
print "new_lines_here...." > (out) ##Doing printing in this example to out file.
} ##Closing main BLOCK here.
END{ ##Starting END block for this specific program here.
if(rename){ ##Checking that the rename variable is NOT NULL before doing the following.
system(rename) ##Using the system function with the rename variable, which actually executes the mv commands that rename the files from out0, out1 etc. to the Input_file names.
}
} ##Closing END block of this program here.
' *.txt ##Mentioning Input_file(s) with their extensions here.
Answer 2:
I'd probably go with something like this if I were to try to do this:
$ cat ../tst.awk
FNR==1 { saveChanges() }
{ print FNR > new }
END { saveChanges() }
function saveChanges( bak, result, mkBackup, overwriteOrig, rmBackup) {
if ( new != "" ) {
bak = old ".bak"
mkBackup = "cp \047" old "\047 \047" bak "\047; echo \"$?\""
if ( (mkBackup | getline result) > 0 ) {
if (result == 0) {
overwriteOrig = "mv \047" new "\047 \047" old "\047; echo \"$?\""
if ( (overwriteOrig | getline result) > 0 ) {
if (result == 0) {
rmBackup = "rm -f \047" bak "\047"
system(rmBackup)
}
}
}
}
close(rmBackup)
close(overwriteOrig)
close(mkBackup)
}
old = FILENAME
new = FILENAME ".new"
}
$ awk -f ../tst.awk test1.txt test2.txt test3.txt
I'd have preferred to copy the original file to the backup first and then operate on that, saving changes to the original, but doing so would change the value of the FILENAME variable for every input file, which is undesirable.
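To illustrate the FILENAME issue, here is a minimal sketch of that copy-first ordering, done from the shell rather than inside awk. The scratch directory and the program that prints FILENAME are only for demonstration.

```shell
# Work in a scratch directory so the demo cannot touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt

# Copy first, then read the backup and write straight to the original.
cp test1.txt test1.txt.bak &&
awk '{ print FILENAME ": " FNR }' test1.txt.bak > test1.txt

cat test1.txt
```

Every line written to test1.txt starts with test1.txt.bak, because that is the file awk was actually reading.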
Note that if you had original files named whatever.bak or whatever.new in your directory then you'd overwrite them with temp files, so you'd need to add a test for that too. A call to mktemp to get the temp file names would be more robust.
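A rough sketch of the mktemp idea, assuming mktemp is installed. The helper name newTemp and the awk-inplace.XXXXXX template are hypothetical, and the .bak backup step from the script above is left out to keep this short.

```shell
# Work in a scratch directory so the demo cannot touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

awk '
# Hypothetical helper: ask mktemp for a fresh file in the current directory.
function newTemp(   cmd, name) {
    cmd = "mktemp ./awk-inplace.XXXXXX"
    if ( (cmd | getline name) <= 0 ) {
        name = ""
    }
    close(cmd)
    return name
}
# Replace the previous input file with its finished temporary file.
function saveChanges() {
    if ( new != "" ) {
        close(new)
        system("mv \047" new "\047 \047" old "\047")
    }
}
FNR==1 {
    saveChanges()
    old = FILENAME
    new = newTemp()
    if ( new == "" ) {
        exit 1    # mktemp failed, leave the remaining files untouched
    }
}
{ print FNR > new }
END { saveChanges() }
' test1.txt test2.txt
```

One caveat: mktemp creates its files with mode 0600, so the edited files end up with those permissions unless you restore them afterwards.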
The FAR more useful thing to have in this situation would be a tool that executes any other command and does the "inplace" editing part, since that could be used to provide "inplace" editing for POSIX sed, awk, grep, tr, whatever, and wouldn't require you to change the syntax of your script to print > out etc. every time you want to print a value. A simple, fragile example:
$ cat inedit
#!/usr/bin/env bash
for (( pos=$#; pos>1; pos-- )); do
if [[ -f "${!pos}" ]]; then
filesStartPos="$pos"
else
break
fi
done
files=()
cmd=()
for (( pos=1; pos<=$#; pos++)); do
arg="${!pos}"
if (( pos < filesStartPos )); then
cmd+=( "$arg" )
else
files+=( "$arg" )
fi
done
tmp=$(mktemp)
trap 'rm -f "$tmp"; exit' 0
for file in "${files[@]}"; do
"${cmd[@]}" "$file" > "$tmp" && mv -- "$tmp" "$file"
done
which you'd use as follows:
$ awk '{print FNR}' test1.txt test2.txt test3.txt
1
2
1
2
1
2
$ ./inedit awk '{print FNR}' test1.txt test2.txt test3.txt
$ tail test1.txt test2.txt test3.txt
==> test1.txt <==
1
2
==> test2.txt <==
1
2
==> test3.txt <==
1
2
One obvious problem with that inedit script is the difficulty of identifying the input/output files separately from the command when you have multiple input files. The script above assumes all of the input files appear as a list at the end of the command and that the command is run against them one at a time, but of course that means you can't use it for scripts that require 2 or more files at a time, e.g.:
awk 'NR==FNR{a[$1];next} $1 in a' file1 file2
or scripts that set variables between files in the arg list, e.g.:
awk '{print $7}' FS=',' file1 FS=':' file2
Making it more robust is left as an exercise for the reader, but look to the xargs synopsis as a starting point for how a robust inedit would need to work :-).
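As one possible first step, here is a sketch that removes the guesswork about where the file list starts by using an explicit separator. The function name inedit_sep and the "files, then --, then command" convention are hypothetical choices of mine, not taken from xargs. It still runs the command once per file, so it does not help with multi-file scripts or per-file variable assignments.

```shell
# Hypothetical variant of inedit: files first, then "--", then the command.
# Plain sh has no local variables, so the names below are global.
inedit_sep() {
    # Count the file arguments that precede the first "--".
    nfiles=0
    for arg in "$@"; do
        [ "$arg" = "--" ] && break
        nfiles=$((nfiles + 1))
    done
    # Require at least one file, the separator, and a command after it.
    if [ "$nfiles" -eq 0 ] || [ "$#" -lt $((nfiles + 2)) ]; then
        echo 'usage: inedit_sep file... -- command [args...]' >&2
        return 2
    fi
    seen=0
    for file in "$@"; do
        seen=$((seen + 1))
        [ "$seen" -gt "$nfiles" ] && break
        tmp=$(mktemp) || return 1
        # In a subshell, drop the files and "--" so "$@" is just the command.
        if ( shift "$((nfiles + 1))"; "$@" "$file" ) > "$tmp"; then
            mv -- "$tmp" "$file"
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}

# Demo in a scratch directory so real files are not touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

inedit_sep test1.txt test2.txt -- awk '{ print FNR }'
```

With the separator in place, an argument that merely happens to name an existing file can no longer be mistaken for an input file.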
Answer 3:
The shell solution is simple and likely quick enough:
for f in *.txt
do awk '...' "$f" > "$f.tmp" &&
mv "$f.tmp" "$f"
done
Only search for a different solution if you have conclusively demonstrated that this is too slow. Remember: premature optimization is the root of all evil.
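For reference, a concrete and slightly more defensive version of that loop, sketched against the sample files from the question. The '{ print FNR }' program and the scratch directory are only there to make the example runnable.

```shell
# Work in a scratch directory so the demo cannot touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

for f in *.txt
do
    [ -f "$f" ] || continue          # skip the literal pattern if nothing matched
    if awk '{ print FNR }' "$f" > "$f.tmp"
    then
        mv -- "$f.tmp" "$f"          # replace the original only on success
    else
        rm -f -- "$f.tmp"            # keep the original if awk failed
    fi
done
```

The quoting keeps file names with spaces intact, and a failed awk run leaves the original file as it was.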
Source: https://stackoverflow.com/questions/59243104/save-modifications-in-place-with-non-gnu-awk