Accessing every 1st element of Pandas DataFrame column containing lists

青春惊慌失措 2020-11-27 06:56

I have a Pandas DataFrame with a column containing list objects:

      A
0   [1,2]
1   [3,4]
2   [8,9] 
3   [2,6]

How can I access the first element of each list?

4 Answers
  • 2020-11-27 07:36

    As always, remember that storing non-scalar objects in frames is generally disfavoured, and should really only be used as a temporary intermediate step.

    That said, you can use the .str accessor even though it's not a column of strings:

    >>> df = pd.DataFrame({"A": [[1,2],[3,4],[8,9],[2,6]]})
    >>> df["new_col"] = df["A"].str[0]
    >>> df
            A  new_col
    0  [1, 2]        1
    1  [3, 4]        3
    2  [8, 9]        8
    3  [2, 6]        2
    >>> df["new_col"]
    0    1
    1    3
    2    8
    3    2
    Name: new_col, dtype: int64
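
    One practical difference of the .str accessor is how it treats rows that are missing or too short. The sketch below uses made-up data (an empty list and a None, which are not in the question) to illustrate this; as far as I know, such rows come back as NaN rather than raising.

```python
import pandas as pd

# Hypothetical data: one well-formed list, one empty list, one missing value.
s = pd.Series([[1, 2], [], None])

# .str[0] indexes each element; the empty list and the missing value
# yield NaN instead of raising IndexError or TypeError.
first = s.str[0]
print(first)
```

    Because NaN is introduced, the resulting column may no longer have an integer dtype.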
    
  • 2020-11-27 08:01

    Use apply with x[0]:

    df['new_col'] = df.A.apply(lambda x: x[0])
    print(df)
            A  new_col
    0  [1, 2]        1
    1  [3, 4]        3
    2  [8, 9]        8
    3  [2, 6]        2
    
  • 2020-11-27 08:02

    You can use map and a lambda function

    df.loc[:, 'new_col'] = df.A.map(lambda x: x[0])
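
    If the column can contain missing values, the lambda would fail on them. A minimal sketch, assuming a hypothetical frame with one None row, is to pass na_action="ignore" so that map skips those rows:

```python
import pandas as pd

# Hypothetical column with one missing row (not part of the question's data).
df = pd.DataFrame({"A": [[1, 2], None, [8, 9]]})

# na_action="ignore" propagates missing values without calling the lambda on them.
df["new_col"] = df["A"].map(lambda x: x[0], na_action="ignore")
print(df)
```

    This only guards against missing values; an empty list would still raise IndexError inside the lambda.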
    

  • 2020-11-27 08:02

    You can use a conditional list comprehension that takes the first value of any iterable and uses None for every other item. List comprehensions are very Pythonic.

    df['new_col'] = [val[0] if hasattr(val, '__iter__') else None for val in df["A"]]
    
    >>> df
            A  new_col
    0  [1, 2]        1
    1  [3, 4]        3
    2  [8, 9]        8
    3  [2, 6]        2
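
    The `__iter__` check is fairly loose: in Python 3 a string also has `__iter__`, so its first character would be returned, and an empty list passes the check but raises IndexError on `val[0]`. A stricter variant might look like the sketch below; `first_or_none` is a hypothetical helper name and the sample data is invented for illustration.

```python
import pandas as pd


def first_or_none(val):
    """Return val[0] for a non-empty list or tuple, otherwise None.

    Hypothetical helper, not part of pandas.
    """
    if isinstance(val, (list, tuple)) and len(val) > 0:
        return val[0]
    return None


# Invented data covering the edge cases: empty list, string, missing value, tuple.
df = pd.DataFrame({"A": [[1, 2], [], "ab", None, (5, 6)]})

firsts = [first_or_none(val) for val in df["A"]]
df["new_col"] = firsts
print(firsts)
```

    Restricting the check to list and tuple is a design choice; widen the isinstance test if other sequence types should be accepted.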
    

    Timings

    df = pd.concat([df] * 10000)
    
    %timeit df['new_col'] = [val[0] if hasattr(val, '__iter__') else None for val in df["A"]]
    100 loops, best of 3: 13.2 ms per loop
    
    %timeit df["new_col"] = df["A"].str[0]
    100 loops, best of 3: 15.3 ms per loop
    
    %timeit df['new_col'] = df.A.apply(lambda x: x[0])
    100 loops, best of 3: 12.1 ms per loop
    
    %timeit df.A.map(lambda x: x[0])
    100 loops, best of 3: 11.1 ms per loop
    

    Removing the safety check that ensures an iterable:

    %timeit df['new_col'] = [val[0] for val in df["A"]]
    100 loops, best of 3: 7.38 ms per loop
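
    These numbers depend on the pandas and Python versions in use, so they are worth re-measuring locally. A rough sketch using the standard timeit module outside IPython follows; the `candidates` dict is just an illustrative way to organise the comparison, and ignore_index=True is added here only to get a clean index.

```python
import timeit

import pandas as pd

# Same data as in the question, repeated 10000 times as in the timings above.
df = pd.DataFrame({"A": [[1, 2], [3, 4], [8, 9], [2, 6]]})
df = pd.concat([df] * 10000, ignore_index=True)

# Each candidate returns the first elements without assigning them to the frame.
candidates = {
    "list comprehension": lambda: [val[0] for val in df["A"]],
    ".str[0]": lambda: df["A"].str[0],
    "apply": lambda: df["A"].apply(lambda x: x[0]),
    "map": lambda: df["A"].map(lambda x: x[0]),
}

# Check that all approaches agree before comparing their speed.
expected = [val[0] for val in df["A"]]
for name, func in candidates.items():
    assert list(func()) == expected, name

# Report the best of 3 repeats, averaged over 5 calls each.
for name, func in candidates.items():
    seconds = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```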
    