You can use either pandas or the built-in json module to read a JSON file, e.g.

import pandas as pd
pd_example = pd.read_json('some_json_file.json')
When you have a single JSON structure inside a JSON file, use read_json, because it loads the JSON directly into a DataFrame. With json.loads, you have to read the file, parse it into a Python dictionary/list, and then convert that into a DataFrame - an unnecessary two-step process.
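For illustration, here's a minimal sketch of both approaches, assuming some_json_file.json contains a JSON array of records:

import json
import pandas as pd

# One step: pandas parses the file straight into a DataFrame.
df_direct = pd.read_json('some_json_file.json')

# Two steps: parse with the json module first, then build the DataFrame.
with open('some_json_file.json') as f:
    records = json.load(f)  # a list of dicts (or a dict)
df_manual = pd.DataFrame(records)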
Of course, this is under the assumption that the structure is directly parsable into a DataFrame. For non-trivial structures (usually complex nested lists of dicts), you may want to use pd.json_normalize instead.
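For example, a sketch with made-up nested records (the field names are purely illustrative):

import pandas as pd

data = [
    {'id': 1, 'user': {'name': 'a', 'age': 30}},
    {'id': 2, 'user': {'name': 'b', 'age': 25}},
]

# Flattens nested dicts into dotted column names: id, user.name, user.age
df = pd.json_normalize(data)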
On the other hand, with a JSON Lines file the story is different. In my experience, loading a JSON Lines file with pd.read_json(..., lines=True) is actually slightly slower on large data (tested once on ~50k+ records), and to make matters worse, it cannot tolerate malformed rows - a single bad line makes the entire read fail. In contrast, you can call json.loads on each line of the file inside a try/except block, which gives you robust code that also ends up being slightly faster. Go figure.
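A sketch of that line-by-line pattern, assuming a hypothetical file data.jsonl with one JSON object per line:

import json
import pandas as pd

records = []
with open('data.jsonl') as f:
    for line in f:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            pass  # skip malformed lines instead of failing the whole read

df = pd.DataFrame(records)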
Use whatever fits the situation.