You can use either pandas or the built-in json module to read a JSON file, e.g.

import pandas as pd
pd_example = pd.read_json('some_json_file.json')
When you have a single JSON structure inside a JSON file, use read_json, because it loads the JSON directly into a DataFrame. With json.loads, you have to read the file, parse it into a Python dictionary/list, and then convert that into a DataFrame - an unnecessary two-step process.
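For illustration, here's a minimal sketch of both approaches, assuming some_json_file.json contains a JSON array of records:

import json
import pandas as pd

# One step: pandas parses the file straight into a DataFrame.
df_direct = pd.read_json('some_json_file.json')

# Two steps: parse with the json module first, then build the DataFrame.
with open('some_json_file.json') as f:
    records = json.load(f)  # a list of dicts (or a dict)
df_manual = pd.DataFrame(records)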
Of course, this is under the assumption that the structure is directly parsable into a DataFrame. For non-trivial structures (usually complex nested lists of dicts), you may want to use pd.json_normalize instead.
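For example, a sketch with made-up nested records (the field names are purely illustrative):

import pandas as pd

data = [
    {'id': 1, 'user': {'name': 'a', 'age': 30}},
    {'id': 2, 'user': {'name': 'b', 'age': 25}},
]

# Flattens nested dicts into dotted column names: id, user.name, user.age
df = pd.json_normalize(data)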
On the other hand, with a JSON Lines file the story is different. In my experience, loading a JSON Lines file with pd.read_json(..., lines=True) is actually slightly slower on large data (tested once on ~50k+ records), and to make matters worse, it cannot tolerate malformed rows - a single bad line makes the entire read fail. In contrast, you can call json.loads on each line of the file inside a try/except block, which gives you robust code that also ends up being slightly faster. Go figure.
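A sketch of that line-by-line pattern, assuming a hypothetical file data.jsonl with one JSON object per line:

import json
import pandas as pd

records = []
with open('data.jsonl') as f:
    for line in f:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            pass  # skip malformed lines instead of failing the whole read

df = pd.DataFrame(records)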
Use whatever fits the situation.