Pandas: flatten a list of lists within a column?

Submitted by 痞子三分冷 on 2021-02-19 03:46:49

Question


I am trying to flatten a column which is a list of lists:

    var         var2
0   9122532.0   [[458182615.0], [79834910.0]]
1   79834910.0  [[458182615.0], [9122532.0]]
2   458182615.0 [[79834910.0], [9122532.0]]

I want:

    var         var2
0   9122532.0   [458182615.0, 79834910.0]
1   79834910.0  [458182615.0, 9122532.0]
2   458182615.0 [79834910.0, 9122532.0]

Applying

sample8['var2'] = sample8['var2'].apply(chain.from_iterable).apply(list)

Gives me:

    var1        var2
0   9122532.0   [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 7, ...
1   79834910.0  [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 9, ...
2   458182615.0 [[, 7, 9, 8, 3, 4, 9, 1, 0, ., 0, ], [, 9, 1, ...
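The character-by-character output above is what chain.from_iterable produces when each cell is a list of strings such as '[458182615.0]' rather than a list of lists (pandas prints both the same way). The source does not show the real dtype, so the frame below is a hypothetical reproduction, and flatten_strings is an illustrative helper name. As a minimal sketch under that assumption, parsing each string with ast.literal_eval before chaining gives the wanted result:

```python
import ast
from itertools import chain

import pandas as pd

# Hypothetical reproduction: each cell is a list of *strings*, not a list of lists.
sample8 = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        ["[458182615.0]", "[79834910.0]"],
        ["[458182615.0]", "[9122532.0]"],
        ["[79834910.0]", "[9122532.0]"],
    ],
})

# Chaining over strings iterates their characters, which explains the odd output.
chars = list(chain.from_iterable(sample8["var2"].iloc[0]))

def flatten_strings(cell):
    """Parse each string into a real list, then chain the parsed lists together."""
    return list(chain.from_iterable(ast.literal_eval(item) for item in cell))

sample8["var2"] = sample8["var2"].apply(flatten_strings)
print(sample8)
```

If the cells are already real lists, this parsing step is unnecessary and the answers below apply directly.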

Answer 1:


Data:

In [162]: df
Out[162]:
           var                           var2
0    9122532.0  [[458182615.0], [79834910.0]]
1   79834910.0   [[458182615.0], [9122532.0]]
2  458182615.0    [[79834910.0], [9122532.0]]

Solution: use np.ravel():

In [163]: df['var2'] = df['var2'].apply(np.ravel)

In [164]: df
Out[164]:
           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
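One detail worth knowing: np.ravel returns a NumPy array, so each cell ends up holding an ndarray that merely prints like a list. If plain Python lists are needed (as in the question's desired output), a small sketch is to call tolist() on the result. The frame name df_demo is only for illustration:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})

# np.ravel flattens the nested list into a 1-D array; tolist() converts it back to a list.
df_demo["var2"] = df_demo["var2"].apply(lambda cell: np.ravel(cell).tolist())
print(df_demo)
```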



Answer 2:


Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    var=[9122532.0, 79834910.0, 458182615.0],
    var2=[[[458182615.0], [79834910.0]],
          [[458182615.0], [9122532.0]],
          [[79834910.0], [9122532.0]]]
))

print(df)

           var                           var2
0    9122532.0  [[458182615.0], [79834910.0]]
1   79834910.0   [[458182615.0], [9122532.0]]
2  458182615.0    [[79834910.0], [9122532.0]]

np.concatenate
You can apply np.concatenate to each cell, which joins its inner lists into one flat array:

df.assign(var2=df.var2.apply(np.concatenate))

           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
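np.concatenate also handles inner lists of different lengths, because it joins them one after another rather than first building a rectangular array. The ragged frame below is made-up data to illustrate that case; it is not from the question:

```python
import numpy as np
import pandas as pd

# Made-up example with inner lists of unequal length.
ragged = pd.DataFrame({
    "var2": [
        [[1.0, 2.0], [3.0]],
        [[4.0], [5.0, 6.0]],
    ],
})

# Join the inner lists of each cell, then convert the array back to a plain list.
ragged["var2"] = ragged["var2"].apply(lambda cell: np.concatenate(cell).tolist())
print(ragged)
```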

without apply
This requires every cell in var2 to have the same 2 x 1 shape. The reshape can be adapted to another shape, but all cells must still share that one shape.

df.assign(var2=np.array(df.var2.tolist()).reshape(-1, 2).tolist())

           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
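To avoid hard-coding the 2, one possible variation is to fix the number of rows and let NumPy infer the flattened width with -1. This is a sketch under the same assumption that every cell has an identical shape. The names df_same and stacked are illustrative:

```python
import numpy as np
import pandas as pd

df_same = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})

# Stack all cells into one array; here the shape is (3, 2, 1).
stacked = np.array(df_same["var2"].tolist())

# Keep one row per DataFrame row and let -1 infer the flattened width.
flat = stacked.reshape(len(df_same), -1)

result = df_same.assign(var2=flat.tolist())
print(result)
```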

timing
(The timing comparisons for small data and large data were shown as images and are not reproduced here.)



Source: https://stackoverflow.com/questions/43216411/pandas-flatten-a-list-of-list-within-a-column
