Finding common values across multiple single-column files

温柔的废话 2020-12-11 22:13

I have 100 text files, each containing a single column of values. The files look like this:

file1.txt
10032
19873
18326

file2.txt
10032
19873
11254

file3.txt
15478
10032
112

4 Answers
  • 2020-12-11 22:23

    awk to the rescue!

    To find the elements common to all files (assuming values are unique within each file):

    awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file*.txt
    

    This counts all occurrences and prints the values whose count equals the number of files.
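
    For example, with the three sample files from the question, only 10032 appears in all three, so:

    awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' file1.txt file2.txt file3.txt
    10032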

  • 2020-12-11 22:25

    This will work whether or not the same number can appear multiple times in one file:

    $ awk '{a[$0][ARGIND]} END{for (i in a) if (length(a[i])==ARGIND) print i}' file[123]
    10032
    

    The above uses GNU awk for true multi-dimensional arrays and ARGIND. There are easy tweaks for other awks if necessary, e.g. the following, which uses !seen[$0,FILENAME]++ to count each value at most once per file:

    $ awk '!seen[$0,FILENAME]++{a[$0]++} END{for (i in a) if (a[i]==ARGC-1) print i}' file[123]
    10032
    

    If the numbers are unique within each file, then all you need is:

    $ awk '(++c[$0])==(ARGC-1)' file*
    10032
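
    For readability, the first (GNU awk) command can be written out with comments; same logic, just expanded:

    awk '
        { a[$0][ARGIND] }                    # remember which files each value appeared in
        END {
            for (i in a)                     # in END, ARGIND is the number of files,
                if (length(a[i]) == ARGIND)  # so this means "seen in every file"
                    print i
        }
    ' file[123]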
    
  • 2020-12-11 22:28

    Here is one using Bash and comm, because I needed to know whether it would work. My test files were named 1, 2 and 3, hence the for f in ?:

    f=$(shuf -n1 -e ?)                     # pick one file randomly for initial comms file
    sort "$f" > comms

    for f in ?                             # this time for all files
    do
      comm -1 -2 <(sort "$f") comms > tmp  # comms should always be in sorted order
      # grep -Fxf "$f" comms > tmp         # another solution, thanks @Sundeep
      mv tmp comms
    done
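
    The same approach adapted to the question's file*.txt naming, as a sketch:

    sort file1.txt > comms                   # seed with any one file
    for f in file*.txt
    do
      comm -12 <(sort "$f") comms > tmp      # keep only values present in both
      mv tmp comms
    done
    cat comms                                # values common to all files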
    
  • 2020-12-11 22:38

    Files with a single column?

    You can sort and compare these files using the shell:

    for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c -d
    

    The last -c is not necessary; it is needed only if you want to count the number of occurrences. Note that -d prints values that occur in at least two files, not necessarily in all of them.
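
    To require a value to appear in every file, keep the count and filter on it; a minimal sketch along the same lines:

    n=$(ls file*.txt | wc -l)                            # number of files
    for f in file*.txt; do sort -u "$f"; done | sort | uniq -c |
        awk -v n="$n" '$1 == n {print $2}'               # keep values seen in all n files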
