Hi, I would like to know the best way to perform operations on columns in Python using pandas.
I have a classical database which I have loaded as a DataFrame, and I often have
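You can use a boolean mask with the .loc indexer of the DataFrame. (The older .ix indexer did the same job, but it was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc.)

    mask = df['A'] > 2
    df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']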
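
To make the mask approach concrete, here is a minimal runnable sketch using .loc. The sample values are hypothetical; only the column names A, C and D come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the snippet above.
df = pd.DataFrame({
    "A": [1, 3, 5],
    "C": [10, 20, 30],
    "D": [1, 2, 3],
})

# Boolean Series: True for the rows that should be updated.
mask = df["A"] > 2

# Only the masked rows of column A are overwritten; other rows are untouched.
df.loc[mask, "A"] = df.loc[mask, "C"] - df.loc[mask, "D"]

print(df)
```

Rows where the mask is False keep their original value of A, so no explicit "else" branch is needed.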
If you have a lot of branching logic, you can put it in a function and apply that function to each row:

    def func(row):
        if row['A'] > 0:
            return row['B'] + row['C']
        elif row['B'] < 0:
            return row['D'] + row['A']
        else:
            return row['A']

    df['A'] = df.apply(func, axis=1)

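apply with axis=1 is usually faster than an explicit for loop over the rows, but it still calls the function once per row in Python, so a vectorized operation such as the boolean mask is generally faster whenever it can express the logic.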
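
If the branching can be written as column-wise conditions, one possible vectorized alternative is numpy.select. This is my own suggestion rather than part of the answer above, and the sample data is hypothetical. The sketch below checks that it gives the same result as the row-wise func on that data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering all three branches of func.
df = pd.DataFrame({
    "A": [1, -2, -3],
    "B": [5, -1, 4],
    "C": [10, 20, 30],
    "D": [100, 200, 300],
})

def func(row):
    # Same branching logic as in the answer above.
    if row["A"] > 0:
        return row["B"] + row["C"]
    elif row["B"] < 0:
        return row["D"] + row["A"]
    else:
        return row["A"]

# Row-wise version: func is called once per row.
by_apply = df.apply(func, axis=1)

# Vectorized version: conditions are checked in order and the first match
# wins, which mirrors the if / elif / else chain.
conditions = [df["A"] > 0, df["B"] < 0]
choices = [df["B"] + df["C"], df["D"] + df["A"]]
by_select = pd.Series(
    np.select(conditions, choices, default=df["A"]),
    index=df.index,
)

print(by_apply.tolist() == by_select.tolist())
```

Whether this is actually faster depends on the size of the DataFrame, so it is worth timing both versions on your own data.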