How can I convert a column to a non-nested list when the column's elements are lists?
For example, the column looks like this:
column
[1, 2, 3]
[1, 2]
Another solution that will work is the list.extend() method.
result = []  # not named "list", which would shadow the built-in type
for row in column:
    result.extend(row)
We can concatenate lists with the + operator. Because a pandas Series applies its elements' underlying + operation when you call pd.Series.sum, we can concatenate a whole column, or Series, of lists with:
df.column.sum()
[1, 2, 3, 1, 2]
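The reduction described here can be sketched in plain Python, without pandas. In this minimal sketch, rows is a hypothetical stand-in for the column's values, and functools.reduce with operator.add plays the role of the repeated +. It illustrates the idea only and is not pandas' actual implementation.

```python
from functools import reduce
import operator

# Hypothetical stand-in for the column's values (not from the original post).
rows = [[1, 2, 3], [1, 2]]

# Fold the rows together with +, i.e. [] + rows[0] + rows[1] + ...
# The empty-list initializer keeps this working when rows is empty.
flat = reduce(operator.add, rows, [])
print(flat)  # [1, 2, 3, 1, 2]
```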
But if you're looking for performance, you can consider cytoolz.concat:
import cytoolz
list(cytoolz.concat(df.column.values.tolist()))
[1, 2, 3, 1, 2]
You can use the extend method of list to do this:
col = {'col': [[1, 2, 3], [1, 2]]}
last = []
last.extend([i for c in col['col'] for i in c])
You can use numpy.concatenate:
import numpy as np
print (np.concatenate(df['column'].values).tolist())
[1, 2, 3, 1, 2]
Or:
from itertools import chain
print (list(chain.from_iterable(df['column'])))
[1, 2, 3, 1, 2]
Another solution, thanks juanpa.arrivillaga:
print ([item for sublist in df['column'] for item in sublist])
[1, 2, 3, 1, 2]
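Both the itertools.chain and the list-comprehension approaches work on any iterable of lists, so they can be tried without a DataFrame. A minimal sketch, assuming a plain list named column_values (a hypothetical stand-in for df['column']) and a hypothetical helper named flatten:

```python
from itertools import chain

def flatten(rows):
    """Return one flat list from an iterable of lists (hypothetical helper)."""
    return list(chain.from_iterable(rows))

# Hypothetical stand-in for df['column'].
column_values = [[1, 2, 3], [1, 2]]

via_chain = flatten(column_values)
via_comprehension = [item for sublist in column_values for item in sublist]

print(via_chain)                       # [1, 2, 3, 1, 2]
print(via_chain == via_comprehension)  # True
```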
Timings:
df = pd.DataFrame({'column':[[1,2,3], [1,2]]})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
In [77]: %timeit (np.concatenate(df['column'].values).tolist())
10 loops, best of 3: 22.7 ms per loop
In [78]: %timeit (list(chain.from_iterable(df['column'])))
1000 loops, best of 3: 1.44 ms per loop
In [79]: %timeit ([item for sublist in df['column'] for item in sublist])
100 loops, best of 3: 2.31 ms per loop
In [80]: %timeit df.column.sum()
1 loop, best of 3: 1.34 s per loop
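A likely reason sum is so much slower in these timings is that every list + list builds a new list and copies both operands, so repeated concatenation re-copies the accumulated result at each step. The sketch below counts element copies under that assumption. The function names and the copy-counting model are illustrative and are not measurements from the original post.

```python
def copies_with_plus(rows):
    """Count elements copied when folding rows with +, one new list per step."""
    copied = 0
    acc_len = 0
    for row in rows:
        acc_len += len(row)  # acc + row creates a new list of this length
        copied += acc_len    # and every element is copied into it
    return copied

def copies_with_extend(rows):
    """Count elements copied when appending rows in place with extend."""
    # Each element is copied once; occasional reallocation copies are ignored.
    return sum(len(row) for row in rows)

# Same shape as the timing setup: the two example rows repeated 10000 times.
rows = [[1, 2, 3], [1, 2]] * 10000
print(copies_with_plus(rows))
print(copies_with_extend(rows))
```

In this model the first count grows roughly with the square of the number of rows, while the second grows linearly.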