pandas | Read json file with list/array-like fields to Boolean columns

给你一囗甜甜゛ 提交于 2021-01-27 06:40:41

问题


Here is a JSON string that contains a list of objects with each having another list embedded.

[
  {
    "name": "Alice",
    "hobbies": [
      "volleyball",
      "shopping",
      "movies"
    ]
  },
  {
    "name": "Bob",
    "hobbies": [
      "fishing",
      "movies"
    ]
  }
]

Using pandas.read_json() this turns into a DataFrame like this:

  name      hobbies
  --------------------------------------
1 Alice     [volleyball, shopping, movies]
2 Bob       [fishing, movies]

However, I would like to flatten the lists into Boolean columns like this:

  name      volleyball  shopping    movies  fishing 
  ----------------------------------------------------
1 Alice     True        True        True    False
2 Bob       False       False       True    True

I.e. when the list contains a value, the field in the corresponding column is filled with a Boolean True, otherwise with False.

I have also looked into pandas.io.json.json_normalize(), but that does not seem support this idea either. Is there any built-in way (either Python3, or pandas) to do this?

(PS. I realize that you can cook up your own code to 'normalize' the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel with that and probably in a very inefficient way).


回答1:


You can use crosstab with cast to bool by astype:

df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
print df
        hobby   name
0  volleyball  Alice
1    shopping  Alice
2      movies  Alice
3     fishing    Bob
4      movies    Bob

print pd.crosstab(df.name, df.hobby).astype(bool)

hobby fishing movies shopping volleyball
name                                    
Alice   False   True     True       True
Bob      True   True    False      False



回答2:


you can do the following:

In [56]: data = [
   ....:   {
   ....:     "name": "Alice",
   ....:     "hobbies": [
   ....:       "volleyball",
   ....:       "shopping",
   ....:       "movies"
   ....:     ]
   ....:   },
   ....:   {
   ....:     "name": "Bob",
   ....:     "hobbies": [
   ....:       "fishing",
   ....:       "movies"
   ....:     ]
   ....:   }
   ....: ]

 In [57]: df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})

In [59]: df['count'] = 1

In [60]: df
Out[60]:
        hobby   name  count
0  volleyball  Alice      1
1    shopping  Alice      1
2      movies  Alice      1
3     fishing    Bob      1
4      movies    Bob      1

In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

Or even better:

In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)

In [89]: r
Out[89]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

let's generate list of 'boolean' columns dynamically

In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']

In [91]: r = r[cols_boolean].astype(bool)

In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice   False   True     True       True
Bob      True   True    False      False


来源:https://stackoverflow.com/questions/36031835/pandas-read-json-file-with-list-array-like-fields-to-boolean-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!