问题
I am using in my shell script TR command in awk to mask the data. Below example file affects only first line of the my file when i used tr command in awk. when i use the same in while loop and called the awk command inside of it then its working fine but it taking very long time to get completed. Now my requirement i want to mask many columns[example :$1, $5, $9] in the same file(file.txt) and this should affect the whole file not first line and i want to achieve this as much as faster to mask the data. Please advise
cat file.txt
========
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
OUTPUT
awk -F"," -v OFS="," '{ "echo \""$1"\" | tr \"a-c\" \"e-f\" | tr \"0-5\" \"6-9\"" | getline $1 }7' file.txt
effffhs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
abcbchs,degehek
abcbchs,degehek,lskjsjshsh
Expected output
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek,lskjsjshsh
effffhs,degehek
effffhs,degehek,lskjsjshsh
回答1:
You forgot to close()
the command after every invocation. Here's the correct way to write it:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
cmd="echo '" $1 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
$1 = ( (cmd | getline line) > 0 ? line : $1 )
close(cmd)
print
}
$ awk -f tst.awk file
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek
effffhs,degehek,lskjsjshsh
effffhs,degehek
effffhs,degehek,lskjsjshsh
You also didn't protect yourself from getline failures, hence the extra complexity around the getline call, see http://awk.info/?tip/getline.
Given your comments, this shows how to modify multiple fields (1, 3, and 5 in this case) simultaneously:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
cmd = "echo '" $0 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
new = ( (cmd | getline line) > 0 ? line : $1 )
close(cmd)
split(new,tmp)
for (i in tmp) {
if (i ~ /^(1|3|5)$/) {
$i = tmp[i]
}
}
print
}
$ cat file
abc,abc,abc,abc,abc
abc,abc,abc,abc,abc,abc,abc
abc,abc,abc,abc,abc,abc
abc,abc,abc,abc
$ awk -f tst.awk file
eff,abc,eff,abc,eff
eff,abc,eff,abc,eff,abc,abc
eff,abc,eff,abc,eff,abc
eff,abc,eff,abc
To handle quotes in the input data:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
gsub(/'/,SUBSEP)
cmd = "echo '" $0 "' | tr 'a-c' 'e-f' | tr '0-5' '6-9'"
new = ( (cmd | getline line) > 0 ? line : $1 )
close(cmd)
split(new,tmp)
for (i in tmp) {
if (i ~ /^(1|3|5)$/) {
$i = tmp[i]
}
}
gsub(SUBSEP,"'")
print
}
$ cat file
a'c,abc,a"c,abc,abc
abc,a'c,abc,a"c,abc,abc,abc
abc,abc,abc,abc,abc,abc
abc,abc,abc,abc
$ awk -f tst.awk file
e'f,abc,e"f,abc,eff
eff,a'c,eff,a"c,eff,abc,abc
eff,abc,eff,abc,eff,abc
eff,abc,eff,abc
If you don't have any particular control char that's guaranteed not to appear in your input, you can create a non-existent string to use instead of SUBSEP above by using the technique described at the end of https://stackoverflow.com/a/29237745/1745001
回答2:
The code you found runs an external shell command pipeline on each input line. Like you discovered, that's an awfully inefficient way to do what you are asking. Awk isn't really an ideal choice for this task at all. Maybe try Perl.
perl -F, -lane '$F[$_] =~ tr/a-c/e-f/ =~ tr/0-5/6-9/ for (0, 4, 8); print join(",", @F)' file
The -F,
option is like with Awk, but Perl doesn't automatically split the input line. With -a
it does, splitting into an array named @F
, and with -n
it loops over all input lines. The -l
is a convenience to remove newlines from each input line and adding one back when you print.
Notice how the columns are numbered from zero, not one, like in Awk; so the indices in the for
loop access the first, fifth, and ninth elements of @F
.
来源:https://stackoverflow.com/questions/29240694/tr-command-in-awk-to-change-the-column-values