How to read a large file - line by line?

前端未结

关注

 11  889

一整个雨季 2020-11-21 11:44

I want to iterate over each line of an entire file. One way to do this is by reading the entire file, saving it to a list, then going over the line of interest. This method

11条回答

暗喜 (楼主)

2020-11-21 12:03
Katrielalex provided the way to open & read one file.

However the way your algorithm goes it reads the whole file for each line of the file. That means the overall amount of reading a file - and computing the Levenshtein distance - will be done N*N if N is the amount of lines in the file. Since you're concerned about file size and don't want to keep it in memory, I am concerned about the resulting quadratic runtime. Your algorithm is in the O(n^2) class of algorithms which often can be improved with specialization.

I suspect that you already know the tradeoff of memory versus runtime here, but maybe you would want to investigate if there's an efficient way to compute multiple Levenshtein distances in parallel. If so it would be interesting to share your solution here.

How many lines do your files have, and on what kind of machine (mem & cpu power) does your algorithm have to run, and what's the tolerated runtime?

Code would look like:
```
with f_outer as open(input_file, 'r'):
    for line_outer in f_outer:
        with f_inner as open(input_file, 'r'):
            for line_inner in f_inner:
                compute_distance(line_outer, line_inner)
```
But the questions are how do you store the distances (matrix?) and can you gain an advantage of preparing e.g. the outer_line for processing, or caching some intermediate results for reuse.
0 讨论(0)

查看其它11个回答
发布评论:

提交评论
- 加载中...