Reading large text / CSV files in Julia takes a long time compared to Python. Here are the times to read a file whose size is 486.6 MB and has 153895 rows and 644 columns.
In my experience, the best way to deal with larger text files is not to load them into Julia all at once, but rather to stream them. This method has some additional fixed costs, but generally runs extremely quickly. Here is a sketch:
function streamdat(path)
    s = 0.0                                 # running sum
    open(path, "r") do io                   # open the file; closed automatically when the block exits
        while !eof(io)                      # loop through each line of the file
            row = readline(io)
            fields = split(row, "|")        # split each line by |
            s += parse(Float64, fields[1])  # accumulate the first column
        end
    end
    return s
end

streamdat("/path/to/text.csv")
The code above only computes a running sum over one column, but the same streaming pattern extends to more complex row-by-row processing.
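For instance, here is a minimal sketch of the same pattern computing the mean of every column in a single pass. The function name columnmeans, the file path, and the assumption of a header-free, all-numeric, "|"-delimited file are illustrative, not from the original post:

# Sketch: per-column means via the same streaming pattern.
# Assumes every row has ncols numeric fields separated by "|".
function columnmeans(path, ncols)
    sums = zeros(ncols)   # one accumulator per column
    n = 0                 # row counter
    open(path, "r") do io
        while !eof(io)
            fields = split(readline(io), "|")
            for j in 1:ncols
                sums[j] += parse(Float64, fields[j])
            end
            n += 1
        end
    end
    return sums ./ n      # elementwise division by the row count
end

columnmeans("/path/to/text.csv", 644)

Because only one line is held in memory at a time, memory use stays constant no matter how large the file grows.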