Pandas: flatten a list of lists within a column?

Question

I am trying to flatten a column which is a list of lists:

    var         var2
0   9122532.0   [[458182615.0], [79834910.0]]
1   79834910.0  [[458182615.0], [9122532.0]]
2   458182615.0 [[79834910.0], [9122532.0]]

I want:

    var         var2
0   9122532.0   [458182615.0, 79834910.0]
1   79834910.0  [458182615.0, 9122532.0]
2   458182615.0 [79834910.0, 9122532.0]

Applying

sample8['var2'] = sample8['var2'].apply(chain.from_iterable).apply(list)

Gives me:

    var1        var2
0   9122532.0   [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 7, ...
1   79834910.0  [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 9, ...
2   458182615.0 [[, 7, 9, 8, 3, 4, 9, 1, 0, ., 0, ], [, 9, 1, ...
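One possible reading of the output above: the elements come out as single characters ('[', '4', '5', ...), which is what chain.from_iterable produces when the inner items are strings rather than lists. The sketch below assumes each cell is a list of strings such as '[458182615.0]'. That is a guess from the printed output, not something the question states. Under that assumption, parsing each string with ast.literal_eval before flattening gives the desired result. The helper name parse_and_flatten is hypothetical.

```python
import ast
from itertools import chain

import pandas as pd

# Assumption: each cell is a list of *strings* like '[458182615.0]',
# not a list of lists of floats.
sample8 = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        ["[458182615.0]", "[79834910.0]"],
        ["[458182615.0]", "[9122532.0]"],
        ["[79834910.0]", "[9122532.0]"],
    ],
})

# Iterating over a string yields its characters, which reproduces the symptom.
broken = list(chain.from_iterable(sample8["var2"].iloc[0]))


def parse_and_flatten(cell):
    """Parse each string into a real list, then flatten one level."""
    return list(chain.from_iterable(ast.literal_eval(item) for item in cell))


fixed = sample8["var2"].apply(parse_and_flatten)
print(fixed)
```

If the cells already hold real nested lists, the original chain.from_iterable call should work as written once chain is imported from itertools.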

Answer 1:

Data:

In [162]: df
Out[162]:
           var                           var2
0    9122532.0  [[458182615.0], [79834910.0]]
1   79834910.0   [[458182615.0], [9122532.0]]
2  458182615.0    [[79834910.0], [9122532.0]]

Solution: use np.ravel():

In [163]: df['var2'] = df['var2'].apply(np.ravel)

In [164]: df
Out[164]:
           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
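Note that np.ravel returns a NumPy array, so each cell ends up as an ndarray rather than a Python list, even though the two print alike. A minimal sketch, assuming the cells hold real nested lists of floats, that converts back to plain lists if that matters downstream:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})

# Each cell becomes a 1-D ndarray.
flat = df["var2"].apply(np.ravel)

# Convert each ndarray back to a plain Python list.
as_lists = flat.apply(lambda arr: arr.tolist())
print(type(flat.iloc[0]), type(as_lists.iloc[0]))
```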

Answer 2:

Consider the dataframe df:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    var=[9122532.0, 79834910.0, 458182615.0],
    var2=[[[458182615.0], [79834910.0]],
          [[458182615.0], [9122532.0]],
          [[79834910.0], [9122532.0]]]
))

print(df)

           var                           var2
0    9122532.0  [[458182615.0], [79834910.0]]
1   79834910.0   [[458182615.0], [9122532.0]]
2  458182615.0    [[79834910.0], [9122532.0]]

np.concatenate
You can apply np.concatenate

df.assign(var2=df.var2.apply(np.concatenate))

           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
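Because np.concatenate joins a sequence of 1-D sequences end to end, it should also cope with inner lists of different lengths. A small sketch with made-up data (the frame ragged and its values are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the inner lists have different lengths.
ragged = pd.DataFrame({
    "var2": [
        [[1.0], [2.0, 3.0]],
        [[4.0, 5.0, 6.0], [7.0]],
    ],
})

# np.concatenate joins the inner lists of each cell into one 1-D array.
joined = ragged["var2"].apply(np.concatenate)

# Convert to plain lists for display or comparison.
joined_lists = joined.apply(lambda arr: arr.tolist())
print(joined_lists)
```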

Without apply
This requires every cell to have the same 2 x 1 shape. The reshape can be adapted to another shape, but all cells must still share one consistent shape.

df.assign(var2=np.array(df.var2.tolist()).reshape(-1, 2).tolist())

           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]
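As a sketch of adapting the reshape to another shape: passing len(df) for the row count and -1 for the rest lets NumPy infer the flattened width, so the 2 need not be hard-coded. This still assumes every cell has the same shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})

# Stack all cells into one array; here the shape is (3, 2, 1).
stacked = np.array(df["var2"].tolist())

# One row per DataFrame row; -1 lets NumPy infer the flattened width.
flat_rows = stacked.reshape(len(df), -1).tolist()

result = df.assign(var2=flat_rows)
print(result)
```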

timing
[figures: small data, large data]

Source: https://stackoverflow.com/questions/43216411/pandas-flatten-a-list-of-list-within-a-column

Tags

python

pandas