How to apply a function to two columns of a Pandas dataframe

名媛妹妹 2020-11-22 06:17

Suppose I have a df with columns 'ID', 'col_1', and 'col_2', and I define a function:

f = lambda x, y : my_function_expres

12 answers
  • 2020-11-22 06:38

    If you have a very large dataset, swifter gives you an easy way to do this that can also reduce execution time:

    import pandas as pd
    import swifter
    
    def fnc(m,x,c):
        return m*x+c
    
    df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})
    df["y"] = df.swifter.apply(lambda x: fnc(x.m, x.x, x.c), axis=1)
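
As a point of comparison that is not part of the answer above: this particular fnc uses only arithmetic operators, so it can probably be called on whole columns without any row-wise apply. The sketch below assumes plain pandas only (no swifter) and checks that both routes agree; the name row_wise is illustrative.

```python
import pandas as pd

def fnc(m, x, c):
    return m * x + c

df = pd.DataFrame({"m": [1, 2, 3, 4, 5, 6], "c": [1, 1, 1, 1, 1, 1], "x": [5, 3, 6, 2, 6, 1]})

# Row-wise version with plain pandas apply, mirroring the swifter call
row_wise = df.apply(lambda row: fnc(row.m, row.x, row.c), axis=1)

# Vectorized version: the arithmetic in fnc works element-wise on whole columns
df["y"] = fnc(df["m"], df["x"], df["c"])

print(df)
```

This shortcut only applies when the function body works on Series as well as on scalars; a function with branching or list slicing would still need a row-wise call.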
    
  • 2020-11-22 06:42

    I assume you don't want to change the get_sublist function and just want to use DataFrame's apply method to do the job. To get the result you want, I've written two helper functions: get_sublist_list and unlist. As the names suggest, the first wraps the sublist in a list, and the second extracts that sublist from the list. Finally, we chain two apply calls to run those functions in sequence on the df[['col_1','col_2']] DataFrame.

    import pandas as pd
    
    df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
    mylist = ['a','b','c','d','e','f']
    
    def get_sublist(sta,end):
        return mylist[sta:end+1]
    
    def get_sublist_list(cols):
        return [get_sublist(cols[0],cols[1])]
    
    def unlist(list_of_lists):
        return list_of_lists[0]
    
    df['col_3'] = df[['col_1','col_2']].apply(get_sublist_list,axis=1).apply(unlist)
    
    df
    

    If you don't wrap the get_sublist call in [], get_sublist_list returns a plain list, and apply raises ValueError: could not broadcast input array from shape (3) into shape (2), as @Ted Petrou mentioned.

  • 2020-11-22 06:44

    As written, f takes two arguments. The error message says that f is being called with only one, and the error message is correct.
    The mismatch arises because df[['col_1','col_2']] is a single DataFrame with two columns, not two separate columns.

    Change f so that it takes a single argument, keep the DataFrame above as the input to apply, and split that argument into x and y inside the function body. Then do whatever you need and return a single value.

    This signature is required because the syntax is .apply(f): apply calls f with one object at a time (a row or a column of the DataFrame), not with the two values your current f expects.

    Since you haven't provided the body of f, I can't go into more detail, but this should give you a way out without fundamentally changing your code or switching from apply to another method.
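
A minimal sketch of that single-argument approach follows. Since the question's f is truncated, the body here is an assumption: it borrows the get_sublist logic and sample data that other answers in this thread use, and the parameter name row is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def f(row):
    # apply(..., axis=1) passes one row at a time as a Series;
    # split it into the two values the original f expected
    sta, end = row['col_1'], row['col_2']
    return mylist[sta:end + 1]

df['col_3'] = df[['col_1', 'col_2']].apply(f, axis=1)

print(df)
```

Note the axis=1 argument: without it, apply would hand f one column at a time rather than one row.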

  • 2020-11-22 06:46

    A simple solution is:

    df['col_3'] = df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)
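
One thing worth keeping in mind with f(*x) is that the row is unpacked positionally, so the order of the selected columns decides which value becomes which argument. The sketch below illustrates this with a hypothetical stand-in for f (a simple subtraction, not the question's function).

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

# Hypothetical stand-in for the question's two-argument f
f = lambda x, y: y - x

# *x unpacks each row in column order: x = col_1, y = col_2
df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: f(*x), axis=1)

# Selecting the columns in the opposite order swaps the arguments
swapped = df[['col_2', 'col_1']].apply(lambda x: f(*x), axis=1)

print(df['col_3'].tolist(), swapped.tolist())
```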
    
  • 2020-11-22 06:48

    The method you are looking for is Series.combine. However, it seems some care has to be taken around datatypes. In your example, you would (as I did when testing the answer) naively call

    df['col_3'] = df.col_1.combine(df.col_2, func=get_sublist)
    

    However, this throws the error:

    ValueError: setting an array element with a sequence.
    

    My best guess is that it expects the result to have the same type as the Series calling the method (df.col_1 here). However, the following works:

    df['col_3'] = df.col_1.astype(object).combine(df.col_2, func=get_sublist)
    
    df
    
      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]
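
For contrast, here is a small sketch of Series.combine with a function that returns a scalar instead of a list. In that case the astype(object) cast should not be needed. The built-in max is used purely for illustration, and col_max is a made-up column name.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

# combine calls func(a, b) for each aligned pair of elements from the two Series
df['col_max'] = df.col_1.combine(df.col_2, func=max)

print(df)
```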
    
  • 2020-11-22 06:51

    There is a clean, one-line way of doing this in Pandas:

    df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
    

    This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

    Example with data (based on original question):

    import pandas as pd
    
    df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
    mylist = ['a', 'b', 'c', 'd', 'e', 'f']
    
    def get_sublist(sta,end):
        return mylist[sta:end+1]
    
    df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
    

    Output of print(df):

      ID  col_1  col_2      col_3
    0  1      0      1     [a, b]
    1  2      2      4  [c, d, e]
    2  3      3      5  [d, e, f]
    

    If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

    df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
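
To illustrate why the bracket form matters, the sketch below uses a made-up frame whose columns are named 'size' (which clashes with the existing Series.size attribute) and 'col 2' (which contains a space). The function f here is a hypothetical two-argument stand-in.

```python
import pandas as pd

df = pd.DataFrame({'size': [10, 20, 30], 'col 2': [1, 2, 3]})

def f(a, b):
    # Hypothetical stand-in for a user-defined two-argument function
    return a + b

# Attribute access resolves to Series.size (the number of elements in the row),
# not to the value stored in the 'size' column
attr = df.apply(lambda x: x.size, axis=1)

# Bracket access always looks up the column label
df['col_3'] = df.apply(lambda x: f(x['size'], x['col 2']), axis=1)

print(attr.tolist(), df['col_3'].tolist())
```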
    