Python pandas operations on columns

北海茫月 2021-02-06 11:52

Hi, I would like to know the best way to do operations on columns in Python using pandas.

I have a classical database which I have loaded as a dataframe, and I often have

4 answers
  • 2021-02-06 12:33

    The simplest way, in my opinion:

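    from random import randrange
    import numpy as np
    import pandas as pd
    
    # 'a' and 'b' are scalars broadcast to every row; 'c' holds 10 random floats
    df = pd.DataFrame({'a': randrange(0, 10), 'b': randrange(10, 20), 'c': np.random.randn(10)})
    
    # If c > 0.5, then c = b - a; .loc avoids unreliable chained assignment
    mask = df['c'] > 0.5
    df.loc[mask, 'c'] = df.loc[mask, 'b'] - df.loc[mask, 'a']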
    

    Tested, it works.

    a   b   c
    2  11 -0.576309
    2  11 -0.578449
    2  11 -1.085822
    2  11  9.000000
    2  11  9.000000
    2  11 -1.081405
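
The same single-condition update can also be written as one expression with `Series.mask`, which replaces values where a condition is true and keeps the others. This is only a sketch of an alternative, not the answerer's method, and the sample values are made up so that the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data (not from the answer), fixed for reproducibility
df = pd.DataFrame({
    "a": [2, 2, 2, 2],
    "b": [11, 11, 11, 11],
    "c": [-0.58, 0.75, 0.50, 1.20],
})

# Series.mask replaces values where the condition is True and keeps the rest
df["c"] = df["c"].mask(df["c"] > 0.5, df["b"] - df["a"])

print(df["c"].tolist())  # [-0.58, 9.0, 0.5, 9.0]
```

Note that 0.50 is left alone because the comparison is strict.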
    
  • 2021-02-06 12:33

    Start with:

    from random import randrange
    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'a': randrange(1, 10), 'b': randrange(10, 20), 'c': np.random.randn(10)})
        a   b   c
    0   7   12  0.475248
    1   7   12  -1.090855
    2   7   12  -1.227489
    3   7   12  0.163929
    

    End with:

    # Where c < 1, set c = b - a (12 - 7 = 5 for this frame)
    df.loc[df['c'] < 1, 'c'] = df['b'] - df['a']; df
        a   b   c
    0   7   12  5.000000
    1   7   12  5.000000
    2   7   12  5.000000
    3   7   12  5.000000
    4   7   12  1.813233
    
  • 2021-02-06 12:40

    You can just use a boolean mask with the .loc attribute of the DataFrame. (Older code used .ix for this, but .ix was deprecated in pandas 0.20 and removed in 1.0.)

    mask = df['A'] > 2
    df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']
    

    If you have a lot of branching logic, you can put it in a function and apply it row by row:

    def func(row):
        if row['A'] > 0:
            return row['B'] + row['C']
        elif row['B'] < 0:
            return row['D'] + row['A']
        else:
            return row['A']
    
    df['A'] = df.apply(func, axis=1)
    

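    apply is generally faster than an explicit for loop over rows, but it still calls func once per row in Python, so a boolean-mask assignment like the one above is usually faster still on large frames.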
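
When the branches depend only on column values, the same if / elif / else logic can also be expressed without a per-row function call. Below is a minimal sketch using `numpy.select`, which is a swapped-in technique and not part of the answer. The column names follow `func` above, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the answer)
df = pd.DataFrame({
    "A": [1, -1, -2],
    "B": [10, -5, 3],
    "C": [100, 200, 300],
    "D": [7, 8, 9],
})

# Conditions are checked in order, like if / elif; the first match wins
conditions = [df["A"] > 0, df["B"] < 0]
choices = [df["B"] + df["C"], df["D"] + df["A"]]

# default plays the role of the final else branch
df["A"] = np.select(conditions, choices, default=df["A"])

print(df["A"].tolist())  # [110, 7, -2]
```

All conditions and choices are evaluated against the original column A before the result is assigned, which matches what `func` sees for each row.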

  • 2021-02-06 12:41

    There are lots of ways of doing this, but here's the pattern I find easiest to read.

    # Assume df is a pandas DataFrame and x is a numeric threshold
    idx = df.loc[:, 'A'] > x
    df.loc[idx, 'A'] = df.loc[idx, 'C'] - df.loc[idx, 'D']
    

    Setting the remaining elements (those not greater than x) to 0 is as easy as df.loc[~idx, 'A'] = 0
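
For a concrete picture of the `idx` / `~idx` pattern, here is a small runnable sketch. The answer leaves `df` and `x` undefined, so the data and the threshold below are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data and threshold (the answer leaves both undefined)
df = pd.DataFrame({
    "A": [1, 5, 3, 8],
    "C": [10, 20, 30, 40],
    "D": [1, 2, 3, 4],
})
x = 4

# Compute the mask once, before A is modified, then reuse it
idx = df.loc[:, "A"] > x

# Rows where A > x: A = C - D
df.loc[idx, "A"] = df.loc[idx, "C"] - df.loc[idx, "D"]

# Remaining rows (A <= x): A = 0
df.loc[~idx, "A"] = 0

print(df["A"].tolist())  # [0, 18, 0, 36]
```

Computing `idx` before the first assignment matters: `~idx` then still refers to the rows that originally failed the test, even though column A has changed in between.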
