How to denormalize YAML for Pandas DataFrame?

和自甴很熟 submitted on 2021-01-04 06:50:11

Question


I am trying to get data from a YAML file into a Pandas DataFrame. Take the following example data.yml:

---
 - doc: "Book1"
   reviews:
     - reviewer: "Paul"
       stars: "5"
     - reviewer: "Sam"
       stars: "2"
 - doc: "Book2"
   reviews:
     - reviewer: "John"
       stars: "4"
     - reviewer: "Sam"
       stars: "3"
     - reviewer: "Pete"
       stars: "2"
...

The desired DataFrame would look like this:

     doc reviews.reviewer reviews.stars
0  Book1             Paul             5
1  Book1              Sam             2
2  Book2             John             4
3  Book2              Sam             3
4  Book2             Pete             2

I've tried feeding the YAML data to Pandas in different ways (like with open('data.yml') as f: data = pd.DataFrame(yaml.load(f))), but the cells always contain the nested lists of dicts. A solution for general JSON data works, but it takes quite a bit of code, and it seems like a simpler solution for YAML might exist.
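A minimal sketch reproducing the problem, assuming PyYAML and pandas are installed and using an inline string (the hypothetical name DATA_YML) in place of data.yml. pd.DataFrame produces one row per book and keeps each reviews list intact inside a single cell:

```python
import pandas as pd
import yaml

# Inline copy of data.yml (assumption: same content as the file above)
DATA_YML = """
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
"""

data = yaml.safe_load(DATA_YML)  # a list of two dicts, one per book

# One row per book; the "reviews" column holds the raw list of review dicts
nested = pd.DataFrame(data)
print(nested.shape)                    # (2, 2)
print(type(nested.loc[0, "reviews"]))  # <class 'list'>
```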

Is there a built-in or Pythonic way to denormalize YAML for conversion to a Pandas DataFrame in this way?


Answer 1:


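You should use json_normalize to flatten the nested records after loading the YAML. In pandas 1.0 and later it is available as pd.json_normalize (the pd.io.json.json_normalize path is deprecated), and recent PyYAML versions expect yaml.safe_load (or an explicit Loader) instead of a bare yaml.load:

with open('data.yml') as f:
    df = pd.json_normalize(yaml.safe_load(f), 'reviews', 'doc')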

  reviewer stars    doc
0     Paul     5  Book1
1      Sam     2  Book1
2     John     4  Book2
3      Sam     3  Book2
4     Pete     2  Book2
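A self-contained sketch of the same approach, assuming pandas 1.0+ and PyYAML, with an inline string standing in for data.yml. The record_prefix argument and the column reordering are additions not in the answer above; they reproduce the exact column names and order asked for in the question. For comparison, the same flat rows are also built with a plain list comprehension (the names DATA_YML and manual are illustrative only):

```python
import pandas as pd
import yaml

# Inline copy of data.yml (assumption: same content as the file above)
DATA_YML = """
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
"""

data = yaml.safe_load(DATA_YML)

# One row per review; "doc" is copied down from the parent book record.
# record_prefix renames the review columns to reviews.reviewer / reviews.stars.
df = pd.json_normalize(data, record_path="reviews", meta="doc",
                       record_prefix="reviews.")

# json_normalize appends meta columns last; move "doc" to the front
df = df[["doc", "reviews.reviewer", "reviews.stars"]]
print(df)

# Plain-Python alternative: flatten by hand, then build the DataFrame
rows = [
    {"doc": book["doc"],
     "reviews.reviewer": review["reviewer"],
     "reviews.stars": review["stars"]}
    for book in data
    for review in book["reviews"]
]
manual = pd.DataFrame(rows)
```

Because the star ratings are quoted in the YAML, they stay strings in both frames; something like df["reviews.stars"].astype(int) would convert them if numeric values are needed.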


Source: https://stackoverflow.com/questions/54259207/how-to-denormalize-yaml-for-pandas-dataframe
