Question
For my project I have to parse two big JSON files, one 19.7 GB and the other 66.3 GB. The structure of the JSON data is quite complex: the first level is a dictionary, and at the second level there may be lists or dictionaries. These are all network log files; I have to parse them and do analysis. Is converting such a big JSON file to CSV advisable?
When I try to convert the smaller 19.7 GB JSON file to CSV, the result has around 2,000 columns and 0.5 million rows. I am using Pandas to parse the data. I have not touched the bigger 66.3 GB file yet. Am I going in the right direction? I have no idea how many columns and rows the bigger file will produce when I convert it.
Please suggest any other good options if they exist. Or is it advisable to read directly from the JSON file and apply OOP concepts to it?
I have already read these articles: article 1 from Stack Overflow and article 2 from Quora
Answer 1:
You might want to use Dask. It has a syntax similar to Pandas, only it is parallel (essentially, it is lots of parallel Pandas DataFrames) and lazy (which helps with avoiding RAM limitations).
You could use the read_json method and then do your calculations on the resulting DataFrame.
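A minimal sketch of that approach, assuming the log file is (or can be converted to) newline-delimited JSON with one record per line; the file path and the "source_ip" column are placeholders, not taken from the question:

```python
import dask.dataframe as dd

# blocksize splits the file into ~256 MB partitions, so the whole
# file never has to fit in RAM at once (requires lines=True)
df = dd.read_json("network_logs.json", lines=True, blocksize=2**28)

# Operations build a lazy task graph; nothing is actually read or
# computed until .compute() is called
summary = df.groupby("source_ip").size()
print(summary.compute())
```

If the records are deeply nested, you may still need to flatten them per partition before aggregating, but the lazy, chunked reading is what keeps memory usage bounded.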
Source: https://stackoverflow.com/questions/51278619/what-are-the-efficient-ways-to-parse-process-huge-json-files-in-python