Reading CSV in Julia is slow compared to Python

北荒 2020-12-16 11:10

Reading large text/CSV files in Julia takes a long time compared to Python. Here are the times to read a file that is 486.6 MB in size, with 153895 rows and 644 columns.

7 Answers
  •  时光说笑
    2020-12-16 11:30

    In my experience, the best way to deal with larger text files is not to load them into Julia all at once, but rather to stream them. This method has some additional fixed costs, but generally runs very quickly. Some pseudocode looks like this:

    function streamdat(path)
        mycsv = open(path, "r")       # <-- open your text file
    
        total = 0.0                   # <-- accumulate the sum here
        while !eof(mycsv)             # <-- loop through each line of the file
            row = readline(mycsv)
            fields = split(row, "|")  # <-- split each line on |
            total += parse(Float64, fields[1])  # <-- sum the first column
        end
        close(mycsv)
        return total
    end
    
    streamdat("/path/to/text.csv")
    
    

    The code above computes just a simple sum, but the same logic extends to more complex problems.
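
    Since the question benchmarks against Python, a rough Python equivalent of the same streaming idea is sketched below, using only the standard-library `csv` module. The function name, the delimiter, and the column index are illustrative assumptions, not part of the original answer:

    ```python
    import csv
    import io

    def stream_sum(fileobj, column=0, delimiter="|"):
        """Stream a delimited file line by line and sum one column,
        without loading the whole file into memory."""
        total = 0.0
        for row in csv.reader(fileobj, delimiter=delimiter):
            total += float(row[column])
        return total

    # Example with an in-memory file standing in for a large CSV on disk:
    data = io.StringIO("1.5|a\n2.5|b\n")
    print(stream_sum(data))  # 4.0
    ```

    As in the Julia version, the work per line is just a split and a parse, so memory use stays constant regardless of file size.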
