Question
I have two text files:
g1.txt
alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;
g2.txt
Jack to ride.zip;http://alfa.org;
JKr.rui.rar;http://gamma.org;
Nofj ogk.png;http://gamma.org;
I run my awk script with this command:
awk -f ./join2.sh g1.txt g2.txt > "g3.txt"
and get this output in g3.txt:
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;;Jack to ride.zip;http://alfa.org;JKr.rui.rar;http://gamma.org;Nofj ogk.png;http://gamma.org;
alfa beta;www.google.com;
The problems are:
1. Row order is not preserved. In the output file g3.txt, the line alfa beta;www.google.com; comes after the Light... line, although it should come first, as it does in g1.txt.
2. The Light... line in g3.txt contains duplicate (mirror) strings: http://alfa.org appears twice and http://gamma.org three times in the same row.
This is the output I want instead:
alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;Jack to ride.zip;JKr.rui.rar;Nofj ogk.png;
First, I tried to implement a function that checks whether a row contains identical strings. In my output row Light Dweller - CR, Technical Metal..., for example, http://alfa.org and http://gamma.org occur more than once. I don't want that: each ;-delimited string must appear exactly once per row.
This rule applies only to the output file, g3.txt.
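One way to enforce the once-per-row rule is to rebuild each output row field by field, skipping any field already seen in that row. This is only a sketch, assuming POSIX awk called from a POSIX shell; the function name `dedup_fields` is hypothetical. It keeps the first occurrence of each string and also drops empty fields, such as the one created by a trailing `;`.

```shell
# Sketch (hypothetical helper): print each ';'-separated field at most once
# per row, keeping the first occurrence and skipping empty fields (such as
# the one a trailing ';' creates). Reads stdin, writes the rows to stdout.
dedup_fields() {
    awk 'BEGIN { FS = OFS = ";" }
    {
        out = ""
        split("", seen)                  # portable way to empty the array per row
        for (i = 1; i <= NF; i++) {
            if ($i == "" || ($i in seen)) continue
            seen[$i] = 1
            out = (out == "" ? $i : out OFS $i)
        }
        print out
    }'
}

printf '%s\n' 'x;http://alfa.org;http://gamma.org;y;http://gamma.org;' | dedup_fields
# x;http://alfa.org;http://gamma.org;y
```

Because empty fields are dropped, a trailing `;` on the input row is not reproduced; append `OFS` to the printed line if you need it.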
Second, the original row order of g1.txt must be kept in the output file g3.txt. For example, g1.txt has
alfa beta ...
Light Dweller ...
but my script returns the rows in a different order:
Light Dweller ...
alfa beta ...
I want to prevent this reordering.
This is my join2.sh script (despite the .sh extension, it is an awk program run with awk -f):
#! /usr/bin/awk -f
BEGIN {
    OFS=FS=";"
    C=0;
}
{
    if (ARGIND == 1) {
        X = $NF
        T0[$NF] = C++
        $NF = ""
        if (T1[X]) {
            T1[X] = T1[X] $0
        } else {
            T1[X] = $0
        }
    } else {
        X = $NF
        T0[$NF] = C++
        $NF = ""
        if (T2[X]) {
            T2[X] = T2[X] $0
        } else {
            T2[X] = $0
        }
    }
}
END {
    for (X in T0) {
        # concatenate T1[X] and X, since T1[X] ends with ";"
        print T1[X] X, T2[X]
    }
}
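Two details of this script explain its output. First, `for (X in T0)` visits keys in no guaranteed order, and the counter values stored in `T0` are never used, so rows come out in whatever order the awk implementation chooses. Second, the join key is `$NF`, and on a line that ends in `;` the last field is empty. This small check (the function name `show_last_field` is just for illustration) prints `NF` and `$NF` for lines taken from g1.txt and g2.txt:

```shell
# Sketch (hypothetical helper): print the field count and the last field of
# each ';'-separated input line.
show_last_field() {
    awk -F';' '{ printf "NF=%d last=[%s]\n", NF, $NF }'
}

printf '%s\n' 'alfa beta;www.google.com' \
    'Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;' \
    'Jack to ride.zip;http://alfa.org;' | show_last_field
# NF=2 last=[www.google.com]
# NF=5 last=[]
# NF=3 last=[]
```

Since the Light Dweller line and all three g2.txt lines end in `;`, they are all stored under the empty key `""`, which is why every g2.txt name and URL ends up glued onto that single row.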
SOLUTION:
Answer 1:
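You should process g2.txt first, so that every URL is already mapped to its file names when g1.txt is then read line by line in its original order, like this: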
cat join2.awk
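BEGIN {
    OFS=FS=";"
}
ARGIND == 1 {    # first file (g2.txt); ARGIND is a gawk extension
    # map each URL ($2) to the ;-joined list of names ($1) that point to it
    map[$2] = ($2 in map ? map[$2] OFS : "") $1
    next
}
{
    # second file (g1.txt): append the names mapped to any URL in this row
    r = $0;
    for (i=1; i<=NF; ++i)
        if ($i in map)
            r = r OFS map[$i]
    $0 = r
}
1    # print every (possibly extended) line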
Then use it as:
awk -f join2.awk g2.txt g1.txt
alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;;Jack to ride.zip;JKr.rui.rar;Nofj ogk.png
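This output still differs from the desired one in two small ways: `;;` appears where the trailing `;` of the g1.txt row meets the appended names, and a name would be repeated if the same URL occurred twice in one row. Also, `ARGIND` exists only in GNU awk. The following is a possible variant, not the answerer's code: it uses the portable `FNR == NR` test to detect the first file, skips empty and already-seen fields while building each row, and restores a trailing `;` when the g1.txt row had one. The function name `join_dedup` is hypothetical.

```shell
# Sketch (hypothetical helper): join g2.txt (name;url;) onto g1.txt rows,
# keeping g1.txt order and emitting each ';'-separated string at most once
# per output row. Arguments: $1 = g2.txt, $2 = g1.txt.
join_dedup() {
    awk '
    function add(s) {                 # append s unless empty or already in this row
        if (s == "" || (s in seen)) return
        seen[s] = 1
        out = (out == "" ? s : out OFS s)
    }
    BEGIN { FS = OFS = ";" }
    FNR == NR {                       # first file only (portable ARGIND == 1)
        if ($2 != "") map[$2] = (($2 in map) ? map[$2] OFS : "") $1
        next
    }
    {
        trail = ($0 ~ /;$/)           # did the input row end with ";"?
        out = ""
        split("", seen)               # portable way to empty an array
        for (i = 1; i <= NF; i++) add($i)          # original fields first
        for (i = 1; i <= NF; i++)
            if ($i in map) {                       # then names linked to each URL
                k = split(map[$i], names, ";")
                for (j = 1; j <= k; j++) add(names[j])
            }
        print out (trail ? OFS : "")
    }' "$1" "$2"
}
```

Run it as `join_dedup g2.txt g1.txt > g3.txt`; with the question's files it should print exactly the two desired lines. One caveat of `FNR == NR`: if g2.txt is empty, the g1.txt lines also satisfy the test and are swallowed into the map, a case `ARGIND == 1` in gawk handles correctly.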
Source: https://stackoverflow.com/questions/64733653/awk-preserve-row-order-and-remove-duplicate-strings-mirrors-when-generating-d