Question
I have text files like
1.txt
AA;00000;
BB;11111;
GG;22222;
2.txt
KK;WW;55555;11111;
KK;FF;ZZ;11111;
KK;RR;YY;11111;
I generate this 3.txt output
AA;00000;
BB;11111;KK;WW;55555;FF;ZZ;RR;YY
GG;22222;
with this .awk script (I use it in Windows with cmd)
#!/usr/bin/awk -f
NR != FNR {
    exit
}
{
    printf "%s", $0
}
/^BB/ {
    o = ""
    while (getline tmp < ARGV[2]) {
        n = split(tmp, arr, ";")
        for (i = 1; i <= n; i++)
            if (!match($0, arr[i]) && !match(o, arr[i]))
                o = o arr[i] ";"
    }
    printf "%s", o
}
{
    print ""
}
Usage: awk -f script.awk 1.txt 2.txt
This seems to work, but consider this situation:
1.txt
AA;BB;
2.txt
CC;DD;BB;AA;
Now replace the field values in this way:
AA is replaced with d(2)
BB is replaced with http://a.o/f/i.p?t=1
CC is replaced with Link
DD is replaced with A_x-y.7z
With the original values, the script should generate this 3.txt:
AA;BB;CC;DD;
but with the replaced text it can't generate the corresponding 3.txt output:
d(2);http://a.o/f/i.p?t=1;Link;A_x-y.7z;
You can see that duplicate fields like AA and BB are removed from the 3.txt output, because that is how the script works.
I suspect it has to do with the (...) being taken as a regex grouping in match(): its second argument is a regex, so each arr[i] passed to it is treated as a "dynamic regular expression" in awk speak.
Answer 1:
$ cat tst.awk
BEGIN { FS=OFS=";" }
{ key = $(NF-1) }
NR == FNR {
    for (i=1; i<(NF-1); i++) {
        if ( !seen[key,$i]++ ) {
            map[key] = (key in map ? map[key] OFS : "") $i
        }
    }
    next
}
{ print $0 map[key] }

$ awk -f tst.awk 2.txt 1.txt
AA;00000;
BB;11111;KK;WW;55555;FF;ZZ;RR;YY
GG;22222;
The above just uses literal strings in a hash lookup of array indices, so it doesn't care what characters you have in your input. If you want your input to be treated as literal strings, then don't use regexp functions or operators (e.g. match(), ~, sub()) on it; just use string functions/operators (e.g. index(), ==, substr(), in).
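To make the difference concrete, here is a minimal sketch that runs the four replacement values from the question through both match() and index(), searching each value for itself. It assumes a POSIX shell for the quoting (under Windows cmd the awk body would go in a file and be run with awk -f), and the variable names result, re and lit are only illustrative.

```shell
# Feed each replacement value from the question to awk and search the
# value for itself, once as a regexp and once as a literal string.
result=$(printf '%s\n' 'd(2)' 'http://a.o/f/i.p?t=1' 'Link' 'A_x-y.7z' |
awk '{
    # match() treats its second argument as a dynamic regexp, so
    # characters such as ( ) ? . keep their regexp meaning
    re  = (match($0, $0) ? "found" : "missing")
    # index() does a plain substring search with no special characters
    lit = (index($0, $0) ? "found" : "missing")
    print $0 ": match=" re " index=" lit
}')
printf '%s\n' "$result"
```

This should report match=missing for d(2) and for http://a.o/f/i.p?t=1, because as regexps they describe the strings d2 and (roughly) http://a.o/f/i.pt=1 rather than themselves, while index() finds all four values. That is why the original script stops recognising those fields as duplicates.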
Source: https://stackoverflow.com/questions/64952864/dynamic-regular-expressions-in-awk