Dataframe list comprehension “zip(…)”: loop through chosen df columns efficiently with just a list of column name strings

不知归路 2021-01-21 12:22

This is just a nitpicking syntactic question...

I have a dataframe, and I want to use a list comprehension to evaluate a function that takes many of its columns as input.

I know I c

3 answers
  •  轻奢々 (original poster)
    2021-01-21 12:46

    df.apply() is almost as slow as df.iterrows(), and neither is recommended; see "How to iterate over rows in a DataFrame in Pandas" (search for "An Obvious Example" in the answer by @cs95a and look at the comparison graph). Because the fastest options (vectorization, Cython routines) are not easy to implement, the third-best option is usually the most practical one: a list comprehension.

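    # Print and return the 3rd selected column of each row.
    # zip() over a 2-D array yields 1-tuples, so *row unpacks to the row array.
    def some_func(row):
        print(row[2])
        return row[2]
    
    
    df['result_col'] = [some_func(*row) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]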
    

    or

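    # Print and return the 3rd selected column of each row.
    # Each row from zip() is a 1-tuple, so row[0] is the row array.
    def some_func(row):
        print(row[2])
        return row[2]
    
    df['result_col'] = [some_func(row[0]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]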
    

    or

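    # Print and return the 3rd selected column of each row.
    # row[0] is the row array, so row[0][2] is its 3rd value.
    def some_func(x):
        print(x)
        return x
    
    df['result_col'] = [some_func(row[0][2]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]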
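
The three variants above all work around the same detail: zip() called on a single 2-D array wraps each row in a 1-tuple. The following is a minimal sketch of that behaviour and of two alternatives that are driven only by a list of column name strings. The sample data, the list `cols`, and the three-argument `some_func` are hypothetical and chosen purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "col_1": [1, 2, 3],
    "col_2": [10, 20, 30],
    "col_3": [100, 200, 300],
})
cols = ["col_1", "col_2", "col_3"]


def some_func(a, b, c):
    """Hypothetical per-row function taking one argument per column."""
    return a + b + c


# zip() over a 2-D array iterates its rows and wraps each one in a 1-tuple.
wrapped = list(zip(df[cols].to_numpy()))
first_row = wrapped[0][0]  # the actual row array sits at index 0

# Variant 1: iterate the array's rows directly and unpack each row.
via_numpy = [some_func(*row) for row in df[cols].to_numpy()]

# Variant 2: zip the selected columns, driven only by the list of names.
via_zip = [some_func(*row) for row in zip(*(df[c] for c in cols))]

df["result_col"] = via_zip
```

Both variants pass one value per selected column to the function, so no `row[0]` indexing is needed. Which one is faster probably depends on the dtypes and the number of columns, so it is worth timing both on the real data.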
    

    See also:

    • Memory efficient way for list comprehension of pandas dataframe using multiple columns
    • list comprehension in pandas

    EDIT:

    Please use df.iloc and df.loc instead of df[[...]], see Selecting multiple columns in a pandas dataframe
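
As a rough illustration of the EDIT above, the sketch below selects the same columns once by label with df.loc and once by position with df.iloc. The sample frame and the list `cols` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "col_3": [5, 6]})
cols = ["col_1", "col_3"]

# Select all rows and the named columns by label.
by_label = df.loc[:, cols]

# Select all rows and the same columns by integer position.
by_position = df.iloc[:, [0, 2]]

# Either selection can feed the list comprehension via to_numpy().
rows = [tuple(r) for r in by_label.to_numpy()]
```

Here df.loc keeps working if the column order changes, while df.iloc depends on the positions staying fixed.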
