Reading CSV in Julia is slow compared to Python

北荒 2020-12-16 11:10

Reading large text/CSV files in Julia takes a long time compared to Python. Here are the times to read a file that is 486.6 MB in size, with 153895 rows and 644 columns.

7 Answers
  •  时光说笑
    2020-12-16 11:30

    In my experience, the best way to deal with larger text files is not to load them into Julia all at once, but rather to stream them. This method has some additional fixed costs, but generally runs very quickly. Some pseudocode looks like this:

    function streamdat(path)
        mycsv = open(path, "r")       # <-- open your text file
    
        total = 0.0                   # <-- accumulate the sum here
        while !eof(mycsv)             # <-- loop through each line of the file
            row = readline(mycsv)
            fields = split(row, "|")  # <-- split each line on |
            total += parse(Float64, fields[1])  # <-- sum the first column
        end
        close(mycsv)
        return total
    end
    
    streamdat("/path/to/text.csv")
    
    

    The code above computes just a simple sum, but the same logic extends to more complex problems.
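
    Since the question benchmarks against Python, a rough Python equivalent of the same streaming idea is sketched below, using only the standard-library `csv` module. The function name, the delimiter, and the column index are illustrative assumptions, not part of the original answer:

    ```python
    import csv
    import io

    def stream_sum(fileobj, column=0, delimiter="|"):
        """Stream a delimited file line by line and sum one column,
        without loading the whole file into memory."""
        total = 0.0
        for row in csv.reader(fileobj, delimiter=delimiter):
            total += float(row[column])
        return total

    # Example with an in-memory file standing in for a large CSV on disk:
    data = io.StringIO("1.5|a\n2.5|b\n")
    print(stream_sum(data))  # 4.0
    ```

    As in the Julia version, the work per line is just a split and a parse, so memory use stays constant regardless of file size.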
