Match closest value from two different files and print specific columns


轮回少年 2021-01-28 03:31

Hi guys, I have two files, each with N columns and M rows.

File1

1 2 4 6 8
20 4 8 10 12
15 5 7 9 11

File2

1 a1 b1


      
      
        
1 answer

故里飘歌 (original poster) 2021-01-28 04:04

Here is what I ended up with while trying to work out an answer:

function closest(b, i,    x, tmp, distance, found) { # define a function; the extra parameters act as local variables
  distance = 999999 # must be larger than the biggest possible difference between two keys, otherwise nothing is returned
  for (x in b) { # loop over the array to get its keys
    # +0 forces a numeric comparison; tmp is the absolute difference between the key and the target
    tmp = (x+0 > i+0) ? x - i : i - x
    if (tmp < distance) { # if this key is closer than the best one so far, update
      distance = tmp
      found = x # and save the key currently found closest
    }
  }
  return found  # return the closest key
}

{ # parse the files for each line (no condition)
   if (NR>FNR) { # NR > FNR means we are past the first file (FNR restarts at 1 for each file, NR does not), so fill the second array
     b[$1]=$0 # make an array with $1 as key
   } else {
     akeys[max++] = $1 # store the array keys to ensure order at end as for (x in array) does not guarantee the order
     a[$1]=$0 # make an array with $1 as key
   }
}

END { # Now we ended parsing the two files, print the result
  for (i = 0; i < max; i++) { # loop over the first file keys in input order (for (i in akeys) would not guarantee that order)
    print a[akeys[i]] # print the value for this file
    if (akeys[i] in b) { # if the same key exist in second file
      print b[akeys[i]] # then print it
    } else {
      bindex = closest(b,akeys[i]) # call the function to find the closest key from second file
      print b[bindex] # print what we found
    }
  }
}


I hope the comments make this clear enough; feel free to comment if anything needs explaining.

Warning: this may become really slow if the second file has a large number of lines, because the whole second array is scanned once for every key of the first file that is not present in the second file.
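If the linear scan does become a bottleneck, one possible alternative is a binary search over the second file's keys. The sketch below is not the method used above: it assumes the second file is non-empty and already sorted numerically on its first column (for example with sort -n), and it passes that file first on the command line. The function name closest_sorted, the temporary file names and the sample data are all hypothetical and only serve the illustration.

```shell
# Hypothetical sample data (assumption: not the asker's real files).
tmpdir=$(mktemp -d)
printf '%s\n' '1 2 4 6 8' '20 4 8 10 12' '15 5 7 9 11' > "$tmpdir/f1"
# The lookup file must already be sorted numerically on column 1.
printf '%s\n' '1 a1' '14 a4' '19 a3' '30 a9' > "$tmpdir/f2"

result=$(awk '
# Return the index in keys[1..n] (sorted ascending) whose value is closest
# to target. The extra parameters lo, hi, mid act as local variables.
function closest_sorted(keys, n, target,    lo, hi, mid) {
  lo = 1; hi = n
  while (hi - lo > 1) {                    # narrow down to two adjacent keys
    mid = int((lo + hi) / 2)
    if (keys[mid] + 0 <= target + 0) lo = mid
    else hi = mid
  }
  # pick the nearer of the two candidates; a tie goes to the lower key
  return (target - keys[lo] <= keys[hi] - target) ? lo : hi
}
# First file on the command line: the sorted lookup file.
NR == FNR { keys[++n] = $1; line[n] = $0; next }
# Second file: print each line followed by its closest match.
{ print; print line[closest_sorted(keys, n, $1)] }
' "$tmpdir/f2" "$tmpdir/f1")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Each lookup then needs roughly log2(n) comparisons instead of n, and the first file is streamed rather than stored, so its line order is preserved without a separate key list. An exact match needs no special case, since its distance is 0.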

Given your sample inputs a1 and a2:

$ mawk -f closest.awk a1 a2
1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 11
14 a4 b5 c1 d5

    
             
                                                        
            
            
              
                