I'm processing huge data files (millions of lines each).
Before I start processing I'd like to get a count of the number of lines in the file, so I can then indicate how far along the processing is.
With UNIX-style text files, it's very simple:
f = File.new("/path/to/whatever")
num_newlines = 0
# Read one character at a time so the whole file is never held in memory
while (c = f.getc) != nil
  num_newlines += 1 if c == "\n"
end
f.close
That's it. For MS Windows text files, you'll have to check for a sequence of "\r\n" instead of just "\n", but that's not much more difficult. For Mac OS Classic text files (as opposed to Mac OS X), you would check for "\r" instead of "\n".
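For example, here is a rough sketch (not from the original answer) of the same character-at-a-time loop adapted to count Windows-style "\r\n" endings by remembering the previous character; the path is just a placeholder:
f = File.new("/path/to/whatever")
num_newlines = 0
prev = nil
while (c = f.getc) != nil
  # A line ends where a "\r" is immediately followed by a "\n"
  num_newlines += 1 if prev == "\r" && c == "\n"
  prev = c
end
f.close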
So, yeah, this looks like C. So what? C's awesome and Ruby is awesome because when a C answer is easiest that's what you can expect your Ruby code to look like. Hopefully your dain hasn't already been bramaged by Java.
By the way, please don't even consider any of the answers above that use IO#read or IO#readlines and then call a String method on what's been read. You said you didn't want to read the whole file into memory, and that's exactly what these do. This is why Donald Knuth recommends that people understand how to program closer to the hardware: if they don't, they'll end up writing "weird code". Obviously you don't want to code close to the hardware whenever you don't have to, but that should be common sense. However, you should learn to recognize the instances where you do have to get closer to the nuts and bolts, such as this one.
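If you do want to stay a little closer to the nuts and bolts without reading the whole file into memory, one common pattern (a sketch, not part of the answer above) is to read fixed-size chunks and count the newlines in each:
num_newlines = 0
File.open("/path/to/whatever") do |f|   # placeholder path
  # Read up to 64 KB at a time; read(length) returns nil at end of file
  while (chunk = f.read(64 * 1024))
    num_newlines += chunk.count("\n")
  end
end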
And don't try to get more "object oriented" than the situation calls for. That's an embarrassing trap for newbies who want to look more sophisticated than they really are. You should always be glad when the answer really is simple, and not be disappointed when there's no complexity to give you the opportunity to write "impressive" code. However, if you want to look somewhat "object oriented" and don't mind reading an entire line into memory at a time (i.e., you know the lines are short enough), you can do this:
f = File.new("/path/to/whatever")
num_newlines = 0
# each_line reads one line at a time, so only the current line is in memory
f.each_line do
  num_newlines += 1
end
This is a good compromise, but only if the lines aren't too long; in that case it may even run more quickly than my first solution.
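If you want to see which of the two wins on your own data, a quick comparison with the standard Benchmark module might look like this (huge_data_file.csv is just a stand-in for your file):
require 'benchmark'

file = "huge_data_file.csv"   # stand-in for your actual file

Benchmark.bm(12) do |x|
  x.report("getc loop") do
    n = 0
    File.open(file) do |f|
      while (c = f.getc)
        n += 1 if c == "\n"
      end
    end
  end

  x.report("each_line") do
    n = 0
    File.open(file) { |f| f.each_line { n += 1 } }
  end
end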
wc -l in Ruby with less memory, the lazy way:
# Build lazy [file_name, IO] pairs (STDIN if no arguments were given),
# then count each file's lines lazily and print wc-style output.
(ARGV.length == 0 ?
    [["", STDIN]] :
    ARGV.lazy.map { |file_name|
      [file_name, File.open(file_name)]
    })
  .map { |file_name, file|
    "%8d %s\n" % [*file
      .each_line
      .lazy
      .map { |line| 1 }
      .reduce(:+), file_name]
  }
  .each(&:display)
as originally shown by Shugo Maeda.
Example:
$ curl -s -o wc.rb -L https://git.io/vVrQi
$ chmod u+x wc.rb
$ ./wc.rb huge_data_file.csv
43217291 huge_data_file.csv
Reading the file a line at a time:
count = File.foreach(filename).inject(0) {|c, line| c+1}
or the Perl-ish (relying on the special variable $., which holds the line number of the last line read):
File.foreach(filename) {}
count = $.
or (though this one reads the whole file into memory):
count = 0
File.open(filename) {|f| count = f.read.count("\n")}
All of these will be slower than shelling out to wc:
count = %x{wc -l #{filename}}.split.first.to_i
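One caveat (not from the original answer): interpolating the filename straight into the shell command breaks on names containing spaces or shell metacharacters, so you may want to escape it first with Shellwords from the standard library:
require 'shellwords'

# Escape the filename so spaces and shell metacharacters don't break the command
count = %x{wc -l #{Shellwords.escape(filename)}}.split.first.to_i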