Quickest way to convert a list of tuples to a Series

误落风尘 2021-02-19 12:59

Consider a list of tuples lst

lst = [('a', 10), ('b', 20)]

Question
What is the quickest way to convert lst to a Series?

4 answers
  • 2021-02-19 13:37

    The simplest way is to pass your list of tuples as a dictionary:

    >>> pd.Series(dict(lst))
    a   10
    b   20
    dtype: int64
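
For reference, here is a self-contained version of the dictionary approach, as a minimal sketch. The `dup` list is a made-up example, not from the question. It illustrates that a dict keeps only the last value for a repeated key, so this approach assumes the first element of each tuple is unique.

```python
import pandas as pd

lst = [('a', 10), ('b', 20)]

# dict(lst) maps each tuple's first element to its second;
# pandas uses the keys as the index and the values as the data.
s = pd.Series(dict(lst))
print(s)

# Hypothetical input with a repeated label: the dict keeps only
# the last value for 'a', so the Series has two rows, not three.
dup = [('a', 10), ('b', 20), ('a', 30)]
s_dup = pd.Series(dict(dup))
print(s_dup)
```

If repeated labels must be preserved, the zip(*) and NumPy approaches in the other answers keep every row.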
    
  • 2021-02-19 13:42

    One approach with NumPy, assuming every tuple in the list has the same length:

    arr = np.asarray(lst)
    out = pd.Series(arr[:,1], index = arr[:,0])
    

    Sample run:

    In [147]: lst = [('a', 10), ('b', 20), ('j',1000)]
    
    In [148]: arr = np.asarray(lst)
    
    In [149]: pd.Series(arr[:,1], index = arr[:,0])
    Out[149]: 
    a      10
    b      20
    j    1000
    dtype: object
    
  • 2021-02-19 13:51

    Use pd.Series with a dictionary comprehension:

    pd.Series({k: v for k, v in lst})
    
    a    10
    b    20
    dtype: int64
    
  • 2021-02-19 13:55

    There are two possible downsides to @Divakar's np.asarray(lst). First, it converts everything to strings, which then have to be converted back to numbers. Second, speed: creating arrays is relatively expensive.

    An alternative is to use the zip(*) idiom to 'transpose' the list:

    In [65]: lst = [('a', 10), ('b', 20), ('j',1000)]
    In [66]: zlst = list(zip(*lst))
    In [67]: zlst
    Out[67]: [('a', 'b', 'j'), (10, 20, 1000)]
    In [68]: out = pd.Series(zlst[1], index = zlst[0])
    In [69]: out
    Out[69]: 
    a      10
    b      20
    j    1000
    dtype: int32
    

    Note that my dtype is int, not object. For comparison, these are the values of the Series built with the np.asarray approach:

    In [79]: out.values
    Out[79]: array(['10', '20', '1000'], dtype=object)
    

    So in the array case, Pandas doesn't convert the values back to integer; it leaves them as strings.
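
The dtype difference described above can be reproduced with a short sketch. The `pd.to_numeric` call at the end is one possible way to recover numbers from the array-based Series. It is an addition here, not something the answers above use, and the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

lst = [('a', 10), ('b', 20), ('j', 1000)]

# NumPy route: mixing str and int in one array turns every element into a string.
arr = np.asarray(lst)
s_arr = pd.Series(arr[:, 1], index=arr[:, 0])
print(s_arr.tolist())       # the values are strings such as '10'

# zip(*) route: labels and values land in separate tuples, so the ints stay ints.
labels, values = zip(*lst)
s_zip = pd.Series(values, index=labels)
print(s_zip.dtype)          # an integer dtype

# One way to turn the string-valued Series back into numbers.
s_fixed = pd.to_numeric(s_arr)
print(s_fixed.dtype)
```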

    ==============

    My guess about timings was off; I don't have any feel for pandas Series creation times. Also, the sample is too small for meaningful timings:

    In [71]: %%timeit
        ...: out=pd.Series(dict(lst))
    1000 loops, best of 3: 305 µs per loop
    In [72]: %%timeit
        ...: arr=np.array(lst)
        ...: out = pd.Series(arr[:,1], index=arr[:,0])
    10000 loops, best of 3: 198 µs per loop
    In [73]: %%timeit
        ...: zlst = list(zip(*lst))
        ...: out = pd.Series(zlst[1], index=zlst[0])
        ...: 
    1000 loops, best of 3: 275 µs per loop
    

    Or forcing the integer interpretation:

    In [85]: %%timeit
        ...: arr=np.array(lst)
        ...: out = pd.Series(arr[:,1], index=arr[:,0], dtype=int)
        ...: 
        ...: 
    1000 loops, best of 3: 253 µs per loop
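
Since the three-item sample is too small for meaningful timings, a larger comparison could be sketched with the standard `timeit` module. The list size `n`, the generated labels, and the helper function names are assumptions chosen for illustration. The array variant converts the string values back with NumPy's `astype(int)` so that all three functions return integer data. Results depend on the machine and library versions, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # assumed sample size, not from the original post
lst = [(f'k{i}', i) for i in range(n)]

def via_dict():
    return pd.Series(dict(lst))

def via_array():
    arr = np.array(lst)  # every element becomes a string here
    return pd.Series(arr[:, 1].astype(int), index=arr[:, 0])

def via_zip():
    labels, values = zip(*lst)
    return pd.Series(values, index=labels)

for fn in (via_dict, via_array, via_zip):
    # best of 3 repeats, 5 calls per repeat, reported per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{fn.__name__}: {best * 1e3:.2f} ms per call')
```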
    