Pandas vs JSON library to read a JSON file in Python

孤城傲影 2020-12-19 16:24

It seems that I can use either pandas or the json module to read a JSON file, for example:

import pandas as pd
pd_example = pd.read_json('some_json_file.json')
1 answer
  • 2020-12-19 16:58

    When you have a single JSON structure inside a JSON file, use read_json, because it loads the JSON directly into a DataFrame. With json.load or json.loads, you have to load it into a Python dictionary or list first and then build a DataFrame from that, which is an unnecessary two-step process.
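To make the two routes concrete, here is a minimal sketch. The sample records and variable names are made up for illustration, and an in-memory StringIO buffer stands in for a real file path.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical sample data: a single JSON array of flat records.
raw = '[{"name": "a", "score": 1}, {"name": "b", "score": 2}]'

# Route 1: pandas parses the JSON and builds the DataFrame in one call.
df_direct = pd.read_json(StringIO(raw))

# Route 2: parse into plain Python objects first, then build the DataFrame.
records = json.loads(raw)
df_two_step = pd.DataFrame(records)
```

For flat data like this, both routes should give a DataFrame with the same columns and rows; the second simply leaves the intermediate list of dicts in your hands.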

    Of course, this assumes that the structure can be parsed directly into a DataFrame. For non-trivial structures (usually complex, nested lists of dicts), you may want to use pd.json_normalize instead.
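As a rough illustration of the nested case, the sketch below flattens a small list of dicts with pd.json_normalize. The records and their keys ("id", "user", "name", "city") are hypothetical.

```python
import pandas as pd

# Hypothetical nested records: each one carries a nested "user" dict.
records = [
    {"id": 1, "user": {"name": "a", "city": "X"}},
    {"id": 2, "user": {"name": "b", "city": "Y"}},
]

# Nested keys are flattened into dotted column names such as "user.name".
df = pd.json_normalize(records)
```

More deeply nested input (lists inside the dicts, for instance) may need the record_path and meta arguments, which this sketch does not cover.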

    With a JSON lines file, on the other hand, the story is different. In my experience, loading a JSON lines file with pd.read_json(..., lines=True) is actually slightly slower on large data (tested once on ~50k+ records). To make matters worse, it cannot handle rows with errors: the entire read operation fails. In contrast, you can call json.loads on each line of the file inside a try-except block, which gives you more robust code that actually ends up being slightly faster. Go figure.
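The line-by-line approach could look roughly like this. The helper name read_json_lines and the sample data are assumptions for illustration, and it only uses the standard library.

```python
import json
from io import StringIO


def read_json_lines(handle):
    """Parse a JSON lines stream, skipping lines that fail to parse."""
    rows, bad_lines = [], []
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(line_number)  # remember where parsing failed
    return rows, bad_lines


# Hypothetical sample: the second line is malformed on purpose.
sample = StringIO('{"a": 1}\n{"a": 2\n{"a": 3}\n')
rows, bad_lines = read_json_lines(sample)
```

With a real file, the same function can be given the handle from open(path). The resulting list of dicts can then be passed to pd.DataFrame(rows) if a DataFrame is needed.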

    Use whatever fits the situation.
