awk find missing number in sequence from file1 and append to column in file2

偶尔善良 submitted on 2021-02-11 13:45:21

Question


Hi, as suggested in my previous question, I will try to clarify what I want to achieve. In file1, column $4 holds numbers that do not form a continuous sequence like 1,2,3,4,5..., so I need to print the missing ones, e.g. after number 3 I should get number 4, and so on.

cat file1

A R5 A48 1
B R5 A48 2
C R4 A48 3
D R8 A48 15
E R9 A48 22
F R20 B55 21
G R55 B22 19
R B1 I77 14
AA B8 PP 18
BX A255 PA 7
CA A77 PB 10
WW W7 PX 11

I found a partial solution in this awk one-liner, which returns:

# take column 4 of file1, then print every number in first..last that never appeared
awk '{ print $4 }' file1 | \
awk -v first=1 -v last=23 '{ seen[$1] = 1 }
END { for (i = first; i <= last; i++) if (!(i in seen)) print i }'
4
5
6
8
9
12
13
16
17
20
23

This is what I want, BUT I am still missing the remaining numbers after 23, up to number 31, and I need these numbers pasted as column $3 of file2, one per row/line of file2.

cat file2

md5sum 25d422cc23b44c3bbd7a66c76d52af46 
md5sum 25d422cc23b44c3bbd7a66c76d52af47 
md5sum 25d422cc23b44c3bbd7a66c76d52af48 
md5sum 25d422cc23b44c3bbd7a66c76d52af41 
md5sum 25d422cc23b44c3bbd7a66c76d52af22 
md5sum 25d422cc23b44c3bbd7a66c76d52af33 
md5sum 25d422cc23b44c3bbd7a66c76d52af12 
md5sum 25d422cc23b44c3bbd7a66c76d52af01 
md5sum 25d422cc23b44c3bbd7a66c76d52af55 
md5sum 25d422cc23b44c3bbd7a66c76d52af14 
md5sum 25d422cc23b44c3bbd7a66c76d52af18 
md5sum 25d422cc23b44c3bbd7a66c76d52af17 
md5sum 25d422cc23b44c3bbd7a66c76d52af77 
md5sum 25d422cc23b44c3bbd7a66c76d52af06 
md5sum 25d422cc23b44c3bbd7a66c76d52af05 
md5sum 25d422cc23b44c3bbd7a66c76d52af72 
md5sum 25d422cc23b44c3bbd7a66c76d52af73 
md5sum 25d422cc23b44c3bbd7a66c76d52af74 
md5sum 25d422cc23b44c3bbd7a66c76d52af75 
md5sum 25d422cc23b44c3bbd7a66c76d52af76 

resulting in

md5sum 25d422cc23b44c3bbd7a66c76d52af46 4
md5sum 25d422cc23b44c3bbd7a66c76d52af47 5
md5sum 25d422cc23b44c3bbd7a66c76d52af48 6
md5sum 25d422cc23b44c3bbd7a66c76d52af41 8
md5sum 25d422cc23b44c3bbd7a66c76d52af22 9
md5sum 25d422cc23b44c3bbd7a66c76d52af33 12
md5sum 25d422cc23b44c3bbd7a66c76d52af12 13
md5sum 25d422cc23b44c3bbd7a66c76d52af01 16
md5sum 25d422cc23b44c3bbd7a66c76d52af55 17
md5sum 25d422cc23b44c3bbd7a66c76d52af14 19
md5sum 25d422cc23b44c3bbd7a66c76d52af18 20
md5sum 25d422cc23b44c3bbd7a66c76d52af17 23
md5sum 25d422cc23b44c3bbd7a66c76d52af77 24
md5sum 25d422cc23b44c3bbd7a66c76d52af06 25
md5sum 25d422cc23b44c3bbd7a66c76d52af05 26
md5sum 25d422cc23b44c3bbd7a66c76d52af72 27
md5sum 25d422cc23b44c3bbd7a66c76d52af73 28
md5sum 25d422cc23b44c3bbd7a66c76d52af74 29
md5sum 25d422cc23b44c3bbd7a66c76d52af75 30
md5sum 25d422cc23b44c3bbd7a66c76d52af76 31

E.g. if the next file2 has 22 rows/lines, the numbers will simply continue further, up to 32 for example.

I believe this can be done in a better way, by also putting the numbers from file1 column $4 into an array, along with the remaining logic.


Answer 1:


awk to the rescue! No need to insert bash into the script. awk is a full-fledged programming language, especially suited to text processing.

$ awk 'NR==FNR{a[$NF]; next} {while(++c in a); print $0, c}' file1 file2

md5sum 25d422cc23b44c3bbd7a66c76d52af46  4
md5sum 25d422cc23b44c3bbd7a66c76d52af47  5
md5sum 25d422cc23b44c3bbd7a66c76d52af48  6
md5sum 25d422cc23b44c3bbd7a66c76d52af41  8
md5sum 25d422cc23b44c3bbd7a66c76d52af22  9
md5sum 25d422cc23b44c3bbd7a66c76d52af33  12
md5sum 25d422cc23b44c3bbd7a66c76d52af12  13
md5sum 25d422cc23b44c3bbd7a66c76d52af01  16
md5sum 25d422cc23b44c3bbd7a66c76d52af55  17
md5sum 25d422cc23b44c3bbd7a66c76d52af14  20
md5sum 25d422cc23b44c3bbd7a66c76d52af18  23
md5sum 25d422cc23b44c3bbd7a66c76d52af17  24
md5sum 25d422cc23b44c3bbd7a66c76d52af77  25
md5sum 25d422cc23b44c3bbd7a66c76d52af06  26
md5sum 25d422cc23b44c3bbd7a66c76d52af05  27
md5sum 25d422cc23b44c3bbd7a66c76d52af72  28
md5sum 25d422cc23b44c3bbd7a66c76d52af73  29
md5sum 25d422cc23b44c3bbd7a66c76d52af74  30
md5sum 25d422cc23b44c3bbd7a66c76d52af75  31
md5sum 25d422cc23b44c3bbd7a66c76d52af76  32

Note that 19 is in your first file so it's skipped in the output. Your output is not consistent with your spec for the given input.
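
In case the one-liner is hard to read, here is a spread-out sketch of what I understand the same logic to be, wrapped in a shell function. The function name append_missing and the variable names used and n are made up for illustration, and it assumes the numbers sit in the last column of the first file, as in the sample file1.

```shell
# Hypothetical helper (name is an assumption, not from the thread):
# append the next unused positive integer to every line of the second file.
append_missing() {
    awk '
        # First file: remember each number found in its last column.
        NR == FNR { used[$NF]; next }

        # Second file: move the counter to the next number that was
        # not seen in the first file, then print the line with it appended.
        {
            n++
            while (n in used)
                n++
            print $0, n
        }
    ' "$1" "$2"
}
```

Calling it as append_missing file1 file2 should give the same output as the one-liner above. Because the counter simply keeps counting past the largest number in file1, no upper limit such as last=23 is needed; the numbers run for as many lines as file2 has.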



Source: https://stackoverflow.com/questions/61719638/awk-find-missing-number-in-sequence-from-file1-and-append-to-column-in-file2
