Question
I am trying to flatten a column which is a list of lists:
var var2
0 9122532.0 [[458182615.0], [79834910.0]]
1 79834910.0 [[458182615.0], [9122532.0]]
2 458182615.0 [[79834910.0], [9122532.0]]
I want:
var var2
0 9122532.0 [458182615.0, 79834910.0]
1 79834910.0 [458182615.0, 9122532.0]
2 458182615.0 [79834910.0, 9122532.0]
Applying
sample8['var2'] = sample8['var2'].apply(chain.from_iterable).apply(list)
Gives me:
var1 var2
0 9122532.0 [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 7, ...
1 79834910.0 [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 9, ...
2 458182615.0 [[, 7, 9, 8, 3, 4, 9, 1, 0, ., 0, ], [, 9, 1, ...
Answer 1:
Data:
In [162]: df
Out[162]:
var var2
0 9122532.0 [[458182615.0], [79834910.0]]
1 79834910.0 [[458182615.0], [9122532.0]]
2 458182615.0 [[79834910.0], [9122532.0]]
Solution: use np.ravel():
In [163]: df['var2'] = df['var2'].apply(np.ravel)
In [164]: df
Out[164]:
var var2
0 9122532.0 [458182615.0, 79834910.0]
1 79834910.0 [458182615.0, 9122532.0]
2 458182615.0 [79834910.0, 9122532.0]
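For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It rebuilds the frame from the question and assumes the cells really are nested Python lists (not strings). Note that np.ravel returns a NumPy array, so the .tolist() call is an addition for cases where plain lists are wanted.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})

# np.ravel turns each nested list into a flat 1-D ndarray;
# .tolist() converts that array back to a plain Python list.
df["var2"] = df["var2"].apply(lambda cell: np.ravel(cell).tolist())

flat = df["var2"].tolist()
print(df)
```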
Answer 2:
Consider the dataframe df
df = pd.DataFrame(dict(
var=[9122532.0, 79834910.0, 458182615.0],
var2=[[[458182615.0], [79834910.0]],
[[458182615.0], [9122532.0]],
[[79834910.0], [9122532.0]]]
))
print(df)
var var2
0 9122532.0 [[458182615.0], [79834910.0]]
1 79834910.0 [[458182615.0], [9122532.0]]
2 458182615.0 [[79834910.0], [9122532.0]]
np.concatenate
You can apply np.concatenate:
df.assign(var2=df.var2.apply(np.concatenate))
var var2
0 9122532.0 [458182615.0, 79834910.0]
1 79834910.0 [458182615.0, 9122532.0]
2 458182615.0 [79834910.0, 9122532.0]
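As a rough illustration, np.concatenate joins the inner lists end to end, so it should also cope with inner lists of different lengths. The frame below (named ragged) is hypothetical data made up for this sketch, not taken from the question, and the .tolist() call is added only to get plain lists back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the inner lists do not all have the same length.
ragged = pd.DataFrame({
    "var": [1.0, 2.0],
    "var2": [
        [[10.0], [20.0, 30.0]],
        [[40.0, 50.0], [60.0]],
    ],
})

# np.concatenate joins the inner sequences of each cell into one 1-D array;
# .tolist() converts the result back to a plain Python list.
result = ragged.assign(
    var2=ragged["var2"].apply(lambda cell: np.concatenate(cell).tolist())
)

flattened = result["var2"].tolist()
print(result)
```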
Without apply
This requires that every entry in var2 has the same 2 x 1 shape. It can be adapted to another shape, but the shapes must still be consistent across all rows.
df.assign(var2=np.array(df.var2.tolist()).reshape(-1, 2).tolist())
var var2
0 9122532.0 [458182615.0, 79834910.0]
1 79834910.0 [458182615.0, 9122532.0]
2 458182615.0 [79834910.0, 9122532.0]
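One possible way to adapt this to another shape is to let NumPy infer the row width instead of hard-coding 2. The sketch below uses hypothetical 3 x 1 cells (the frame named wide is invented for illustration) and still assumes every row has the same shape.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every cell has shape 3 x 1 instead of 2 x 1.
wide = pd.DataFrame({
    "var": [1.0, 2.0],
    "var2": [
        [[10.0], [20.0], [30.0]],
        [[40.0], [50.0], [60.0]],
    ],
})

# Stack all cells into one regular array of shape (rows, 3, 1).
stacked = np.array(wide["var2"].tolist())

# Keep one row per DataFrame row and let -1 infer the flattened width.
flat_rows = stacked.reshape(len(wide), -1).tolist()

result = wide.assign(var2=flat_rows)
print(result)
```

If the rows had differing shapes, the np.array call would not produce a regular array, so the apply-based variants are the safer choice in that case.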
timing
[Timing figures for small data and large data are not reproduced here.]
Source: https://stackoverflow.com/questions/43216411/pandas-flatten-a-list-of-list-within-a-column