Total number of lines in a directory

Backend · unresolved · 7 replies · 2019
Asked by 礼貌的吻别 on 2021-01-13 08:52

I have a directory with thousands of files (100K for now). When I use wc -l ./*, I'll get:

 c1            ./test1.txt
 c2            ./tes


        
7 Answers
  • 2021-01-13 09:15

    If what you want is the total number of lines and nothing else, then I would suggest the following command:

    cat * | wc -l
    

    This concatenates the contents of all of the files in the current working directory and pipes the resulting blob of text through wc -l.

    I find this to be quite elegant. Note that the command produces no extraneous output.

    UPDATE:

    I didn't realize your directory contained so many files. In light of this information, you should try this command:

    for file in *; do cat "$file"; done | wc -l
    

    Most people don't know that you can pipe the output of a for loop directly into another command.
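
    As a tiny illustration (the words here are made up), the loop's aggregated stdout feeds a single pipe, so wc sees one continuous stream:

    ```shell
    # the whole loop's output goes through one pipe into wc
    for word in alpha beta gamma; do
      printf '%s\n' "$word"
    done | wc -l        # prints 3
    ```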

    Beware that this could be very slow. If you have 100,000 or so files, my guess would be around 10 minutes. This is a wild guess because it depends on several parameters that I'm not able to check.

    If you need something faster, you should write your own utility in C. You could make it surprisingly fast if you use pthreads.

    Hope that helps.

    LAST NOTE:

    If you're interested in building a custom utility, I could help you code one up. It would be a good exercise, and others might find it useful.

  • 2021-01-13 09:22

    This will give you the total line count for all the files (including hidden files) in your current directory:

    $ find . -maxdepth 1 -type f  | xargs wc -l  | grep total
     1052 total
    

    To count lines while excluding hidden files, use:

    find . -maxdepth 1 -type f  -not -path "*/\.*"  | xargs wc -l  | grep total
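
    One caveat with the xargs approach: with enough files, xargs splits the list into several wc invocations, each of which prints its own "total" line, and file names containing spaces break whitespace-delimited xargs. A sketch of a safer variant (assuming GNU-style -print0/-0 support):

    ```shell
    # NUL-delimited names survive spaces; awk sums the "total" line from
    # each wc batch. Caveat: a batch holding a single file prints no
    # "total" line, and a file name ending in the word "total" would
    # confuse the $NF test.
    find . -maxdepth 1 -type f -print0 \
      | xargs -0 wc -l \
      | awk '$NF == "total" {sum += $1} END {print sum}'
    ```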
    
  • 2021-01-13 09:29
    awk 'END {print NR" total"}' ./*
    

    Would be an interesting comparison to find out how many lines don’t end with a new line.
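
    A quick way to see that difference (scratch file name is made up): wc -l counts newline characters, while awk's NR counts records, so a file missing its final newline comes out one short under wc -l.

    ```shell
    # three records, but only two newline characters
    printf 'one\ntwo\nthree' > nofinal.txt
    wc -l < nofinal.txt                # prints 2
    awk 'END {print NR}' nofinal.txt   # prints 3
    ```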

    Combining the awk and Gordon’s find solutions and avoiding the “.” files.

    find ./* -maxdepth 0 -type f -exec awk 'END {print NR}' {} +
    

    No idea if this is better or worse but it does give a more accurate count (for me) and does not count lines in “.” files. Using ./* is just a guess that appears to work.

    A depth option is still needed, and with ./* the required depth is 0.

    I got the same result with the “cat” and “awk” solutions (using the same find), since “cat *” takes care of the newline issue. I don’t have a directory with enough files to measure time. Interesting, I’m liking the “cat” solution.

  • 2021-01-13 09:29

    Credit: this builds on @lifecrisis's answer, and extends it to handle large numbers of files:

    find . -maxdepth 1 -type f -exec cat {} + | wc -l
    

    find will find all of the files in the current directory, break them into groups as large as can be passed as arguments, and run cat on the groups.
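
    A quick sanity check in a scratch directory (file names are made up) shows the batched pipeline producing the expected combined count:

    ```shell
    # two small files with 2 + 1 lines; the batched cat feeds one wc
    tmp=$(mktemp -d) && cd "$tmp"
    printf 'a\nb\n' > f1
    printf 'c\n'    > f2
    find . -maxdepth 1 -type f -exec cat {} + | wc -l   # prints 3
    ```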

  • 2021-01-13 09:29

    (Apologies for adding this as an answer—but I do not have enough reputation for commenting.)

    A comment on @lifecrisis's answer. Perhaps cat is slowing things down a bit. We could replace cat with wc -l and then use awk to add the numbers. (This could be faster, since much less data needs to go through the pipe.)

    That is

    for file in *; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'
    

    instead of

    for file in *; do cat "$file"; done | wc -l
    

    (Disclaimer: I am not incorporating many of the improvements in other answers, but I thought the point was valid enough to write down.)
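
    A tiny check of the awk summation on sample files (the names and contents are made up):

    ```shell
    # two files with 2 + 3 lines; awk sums the first column of wc output
    tmp=$(mktemp -d) && cd "$tmp"
    printf '1\n2\n'     > a.txt
    printf '3\n4\n5\n'  > b.txt
    for file in *; do wc -l "$file"; done \
      | awk '{sum += $1} END {print sum}'   # prints 5
    ```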

    Here are my results for comparison (I ran the newer version first so that any cache effects would go against the newer candidate).

    $ time for f in `seq 1 1500`; do head -c 5M </dev/urandom >myfile-$f |sed -e 's/\(................\)/\1\n/g'; done
    
    real    0m50.360s
    user    0m4.040s
    sys 0m49.489s
    
    $ time for file in myfile-*; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'
    30714902
    
    real    0m3.455s
    user    0m2.093s
    sys 0m1.515s
    
    $ time for file in myfile-*; do cat "$file"; done | wc -l
    30714902
    
    real    0m4.481s
    user    0m2.544s
    sys 0m4.312s
    
  • 2021-01-13 09:29

    The command below will print the total line count across all files in the path (note that it breaks on file names containing spaces):

    for i in `ls -ltr | awk '$1~"^-rw"{print $9}'`; do wc -l "$i" | awk '{print $1}'; done >> /var/tmp/filelinescount.txt
    cat /var/tmp/filelinescount.txt | sed -r "s/\s+//g" | tr "\n" "+" | sed "s:+$::g" | awk '{print "echo " $0 " | bc"}' | sh
    