How to read a large file - line by line?

前端未结

关注

 11  875

一整个雨季 2020-11-21 11:44

I want to iterate over each line of an entire file. One way to do this is by reading the entire file, saving it to a list, then going over the line of interest. This method

11条回答

别跟我提以往 (楼主)

2020-11-21 12:11
I would strongly recommend not using the default file loading as it is horrendously slow. You should look into the numpy functions and the IOpro functions (e.g. numpy.loadtxt()).

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

https://store.continuum.io/cshop/iopro/

Then you can break your pairwise operation into chunks:
```
import numpy as np
import math

lines_total = n    
similarity = np.zeros(n,n)
lines_per_chunk = m
n_chunks = math.ceil(float(n)/m)
for i in xrange(n_chunks):
    for j in xrange(n_chunks):
        chunk_i = (function of your choice to read lines i*lines_per_chunk to (i+1)*lines_per_chunk)
        chunk_j = (function of your choice to read lines j*lines_per_chunk to (j+1)*lines_per_chunk)
        similarity[i*lines_per_chunk:(i+1)*lines_per_chunk,
                   j*lines_per_chunk:(j+1)*lines_per_chunk] = fast_operation(chunk_i, chunk_j) 
```
It's almost always much faster to load data in chunks and then do matrix operations on it than to do it element by element!!
0 讨论(0)

查看其它11个回答
发布评论:

提交评论
- 加载中...