Match closest value from two different files and print specific columns

轮回少年 2021-01-28 03:31

Hi guys, I have two files, each with N columns and M rows.

File1

1 2 4 6 8
20 4 8 10 12
15 5 7 9 11

File2

1 a1 b1

1 Answer
  •  故里飘歌
    2021-01-28 04:04

    Here is what I ended up with while trying to answer this:

    function closest(b, i,    x, tmp, distance, found) { # x, tmp, distance and found are locals (awk declares locals as extra parameters)
      distance = -1 # -1 means "no candidate yet", so no arbitrary upper bound is needed
      for (x in b) { # loop over the array to get its keys
        tmp = (x+0 > i+0) ? x - i : i - x # +0 forces a numeric comparison; tmp is the absolute difference between the key and the target
        if (distance < 0 || tmp < distance) { # first key seen, or closer than the best so far: update
          distance = tmp
          found = x # and save the key found closest so far
        }
      }
      return found  # return the closest key
    }
    
    { # parse the files for each line (no condition)
       if (NR>FNR) { # NR counts records across all files but FNR restarts at 1 for each file, so NR > FNR means we are reading the second file
         b[$1]=$0 # make an array with $1 as key
       } else {
         akeys[max++] = $1 # store the keys in input order, as for (x in array) does not guarantee any order
         a[$1]=$0 # make an array with $1 as key
       }
    }
    
    END { # both files are parsed, now print the result
      for (i = 0; i < max; i++) { # walk the first file's keys in input order (for (i in akeys) would not guarantee it)
        print a[akeys[i]] # print the line from the first file
        if (akeys[i] in b) { # if the same key exists in the second file
          print b[akeys[i]] # then print it
        } else {
          bindex = closest(b, akeys[i]) # find the closest key in the second file
          print b[bindex] # print that line
        }
      }
    }
    

    I hope the comments make it clear enough; feel free to ask if anything is unclear.
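    The NR > FNR test in the main block is the usual awk idiom for telling two input files apart. A minimal sketch (POSIX sh plus any awk; the helper nr_fnr_demo and the files file1 and file2 are hypothetical, made up only for this demo) prints NR and FNR for every record so the switch between files is visible:

```shell
#!/bin/sh
# Sketch: print NR and FNR for every record of two small, made-up files.
# NR counts records across all input files; FNR restarts at 1 for each file,
# so, as long as file1 is not empty, NR > FNR holds exactly while awk reads file2.
# nr_fnr_demo, file1 and file2 are hypothetical names used only for this demo.
nr_fnr_demo() {
  dir=$(mktemp -d) || return 1
  printf '1 2 4 6 8\n20 4 8 10 12\n' > "$dir/file1"   # two records
  printf '1 a1 b1\n' > "$dir/file2"                   # one record
  # relative names so FILENAME prints as file1/file2; parentheses make > a comparison, not a redirection
  (cd "$dir" && awk '{ print FILENAME, NR, FNR, (NR > FNR ? "second" : "first") }' file1 file2)
  rm -r "$dir"
}

nr_fnr_demo
```

    It prints "file1 1 1 first", "file1 2 2 first" and "file2 3 1 second". If the first file were empty, NR would equal FNR for every line of the second file, so those lines would fall into the first-file branch.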

    Warning: this may become really slow if the second file has a large number of lines, because the whole second array is scanned once for every key of the first file that is not present in the second file.
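    If that becomes a problem, one option (a different technique from the script above, not the author's method) is to sort the second file's keys once and binary-search them for each key of the first file: O(M log M) for the sort plus O(log M) per lookup, instead of a full scan per missing key. A minimal sketch, assuming numeric, unique keys in column 1 and a non-empty second file; closest_sorted is a hypothetical helper name, and the demo's File2 holds only the lines visible in the sample run further down, since the question's File2 is truncated:

```shell
#!/bin/sh
# Sketch only: binary search over File2 keys sorted once with sort -n.
# Assumes numeric, unique keys in column 1 and a non-empty File2
# (with an empty File2, NR == FNR would also hold while reading File1).
# closest_sorted is a hypothetical helper name: closest_sorted File1 File2
closest_sorted() {
  sort -n -k1,1 "$2" | awk '
    NR == FNR { key[++m] = $1 + 0; line[m] = $0; next }  # sorted File2 arrives on stdin
    {
      k = $1 + 0
      lo = 1; hi = m
      while (lo < hi) {                 # lower bound: first index with key >= k
        mid = int((lo + hi) / 2)
        if (key[mid] < k) lo = mid + 1; else hi = mid
      }
      best = lo                         # key[lo] >= k, or the largest key if all are < k
      if (lo > 1) {                     # key[lo - 1] < k is the only other candidate
        dl = k - key[lo - 1]
        dr = key[lo] - k; if (dr < 0) dr = -dr
        if (dl <= dr) best = lo - 1     # ties go to the smaller key
      }
      print                             # the File1 line
      print line[best]                  # its closest File2 line
    }' - "$1"
}

# Demo on the sample data (File2 holds only the lines visible in the answer output).
dir=$(mktemp -d)
printf '1 2 4 6 8\n20 4 8 10 12\n15 5 7 9 11\n' > "$dir/a1"
printf '1 a1 b1 c5 d1\n19 a3 b4 c2 d4\n14 a4 b5 c1 d5\n' > "$dir/a2"
closest_sorted "$dir/a1" "$dir/a2"
rm -r "$dir"
```

    On this data it prints the same six lines as the mawk run. Unlike the linear scan, ties here deterministically go to the smaller key; in the script above, which of two equally close keys wins depends on the unspecified for (x in b) order.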

    Given your sample inputs saved as a1 (File1) and a2 (File2):

    $ mawk -f closest.awk a1 a2
    1 2 4 6 8
    1 a1 b1 c5 d1
    20 4 8 10 12
    19 a3 b4 c2 d4
    15 5 7 9 11
    14 a4 b5 c1 d5
    
