I have a 2D NumPy array that I would like to put in a pandas Series (not a DataFrame):
>>> import pandas as pd
>>> import numpy as np
pd.Series(list(a))
is consistently slower than
pd.Series(a.tolist())
Tested with array sizes ranging from 500,000 to 20,000,000 rows.
a = np.ones((500000,2))
showing only 1,000,000 rows:
%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop
%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
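The comparison above uses IPython's %timeit. A rough way to repeat it outside IPython is sketched below with the standard-library timeit module. The function name compare_constructors and the default row count are illustrative choices, not part of the original post, and the absolute numbers will depend on the machine and on the NumPy and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd


def compare_constructors(n_rows=100_000, repeats=3):
    """Return the best-of-`repeats` wall time in seconds for each approach.

    `compare_constructors` is a hypothetical helper written for this
    illustration; it is not a NumPy or pandas function.
    """
    a = np.ones((n_rows, 2))
    # Series whose elements are NumPy arrays (one per row of `a`).
    t_list = min(timeit.repeat(lambda: pd.Series(list(a)), number=1, repeat=repeats))
    # Series whose elements are plain Python lists.
    t_tolist = min(timeit.repeat(lambda: pd.Series(a.tolist()), number=1, repeat=repeats))
    return t_list, t_tolist


if __name__ == "__main__":
    t_list, t_tolist = compare_constructors()
    print(f"pd.Series(list(a)):     {t_list * 1e3:.1f} ms")
    print(f"pd.Series(a.tolist()):  {t_tolist * 1e3:.1f} ms")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports.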
Well, you can use the numpy.ndarray.tolist method, like so:
>>> a = np.zeros((5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
dtype: object
EDIT:
A faster way to accomplish a similar result is to simply do pd.Series(list(a)). This will make a Series of NumPy arrays instead of Python lists, so it should be faster than a.tolist(), which returns a list of Python lists.
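To make the difference between the two results concrete, here is a small sketch that builds both Series from the same array and inspects the element types. The variable names s_arrays and s_lists are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))

# list(a) iterates over the first axis, giving one 1-D NumPy array per row.
s_arrays = pd.Series(list(a))

# a.tolist() converts the whole array into nested Python lists of floats.
s_lists = pd.Series(a.tolist())

print(type(s_arrays.iloc[0]))  # element is a numpy.ndarray
print(type(s_lists.iloc[0]))   # element is a Python list
print(s_arrays.dtype, s_lists.dtype)  # both Series have dtype object
```

Both Series have one element per row and dtype object, so the choice mostly comes down to whether the elements should stay NumPy arrays or become plain lists.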