Python pandas operations on columns

北海茫月

Hi, I would like to know the best way to do operations on columns in Python using pandas.

I have a classical database which I have loaded as a dataframe, and I often have

4 answers

鱼传尺愫

2021-02-06 12:33

The simplest way, in my opinion:

from random import randrange
import pandas as pd
import numpy as np

# 'a' and 'b' are scalars, so each one is broadcast to all 10 rows
df = pd.DataFrame({'a': randrange(0, 10), 'b': randrange(10, 20), 'c': np.random.randn(10)})

# Where c > 0.5, set c = b - a (.loc avoids chained assignment)
mask = df['c'] > 0.5
df.loc[mask, 'c'] = df['b'] - df['a']

Tested, it works.

a   b   c
2  11 -0.576309
2  11 -0.578449
2  11 -1.085822
2  11  9.000000
2  11  9.000000
2  11 -1.081405

余生分开走

2021-02-06 12:33

Start with:

df = pd.DataFrame({'a':randrange(1,10),'b':randrange(10,20),'c':np.random.randn(10)})
    a   b   c
0   7   12  0.475248
1   7   12  -1.090855
2   7   12  -1.227489
3   7   12  0.163929

End with:

df.loc[df['c'] < 1, 'c'] = df['b'] - df['a']; df
    a   b   c
0   7   12  5.000000
1   7   12  5.000000
2   7   12  5.000000
3   7   12  5.000000
4   7   12  1.813233

逝去的感伤

2021-02-06 12:40

You can use a boolean mask with the .loc indexer of the DataFrame. (Older pandas versions also offered .ix, but it was deprecated in 0.20 and removed in 1.0.)

mask = df['A'] > 2
df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']
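
For anyone who wants to try the mask pattern on real numbers, here is a minimal sketch. The sample values are made up for illustration; only the column names A, C and D come from the snippet above. It also shows that the right-hand side does not strictly need the mask, because pandas aligns the assigned Series on the index.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer
df = pd.DataFrame({
    'A': [1.0, 3.0, 5.0],
    'C': [10.0, 20.0, 30.0],
    'D': [1.0, 2.0, 3.0],
})

mask = df['A'] > 2  # boolean Series: [False, True, True]

# Variant 1: mask both sides of the assignment
masked = df.copy()
masked.loc[mask, 'A'] = masked.loc[mask, 'C'] - masked.loc[mask, 'D']

# Variant 2: pass the full Series; pandas aligns it on the index,
# so only the rows selected by the mask are written
aligned = df.copy()
aligned.loc[mask, 'A'] = aligned['C'] - aligned['D']

print(masked['A'].tolist())   # [1.0, 18.0, 27.0]
print(aligned['A'].tolist())  # [1.0, 18.0, 27.0]
```

Both variants give the same result here. Masking the right-hand side as well just avoids computing values that are thrown away.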

If you have a lot of branching logic, you can write it as a function and apply it row by row:

def func(row):
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

df['A'] = df.apply(func, axis=1)

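apply with axis=1 is typically somewhat faster than an explicit for loop over the rows (for example with iterrows), but it still calls func once per row in Python, so on large frames it is much slower than a vectorized boolean mask.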
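
If the row-wise version turns out to be too slow, the same branching can usually be written with nested np.where calls. This is only a sketch on made-up data (the values are hypothetical, the column names and the branching logic are taken from func above), and it checks itself against the apply result:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the answer above
df = pd.DataFrame({
    'A': [1.0, -2.0, 0.0, -1.0],
    'B': [10.0, -5.0, 3.0, 4.0],
    'C': [0.5, 0.5, 0.5, 0.5],
    'D': [7.0, 7.0, 7.0, 7.0],
})

def func(row):
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

# Row-wise reference result, computed before A is overwritten
expected = df.apply(func, axis=1)

# Vectorized equivalent: the outer np.where is the `if`,
# the inner np.where is the `elif` / `else`
vectorized = np.where(
    df['A'] > 0,
    df['B'] + df['C'],
    np.where(df['B'] < 0, df['D'] + df['A'], df['A']),
)

df['A'] = vectorized
print(df['A'].tolist())  # [10.5, 5.0, 0.0, -1.0]
```

Note that np.where evaluates every branch for every row before selecting, so this only works when each branch is safe to compute on the whole column.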

遇见更好的自我

2021-02-06 12:41
There are lots of ways of doing this, but here's the pattern I find easiest to read.
```
# Assume df is a pandas DataFrame and x is a numeric threshold
idx = df.loc[:, 'A'] > x
df.loc[idx, 'A'] = df.loc[idx, 'C'] - df.loc[idx, 'D']
```
Setting the remaining elements (those where A is not greater than x) is as easy as df.loc[~idx, 'A'] = 0
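
Here is a small end-to-end sketch of the two assignments, assuming a threshold x = 2 and made-up sample values (both are hypothetical, only the column names come from the snippet). The mask is computed once, before A is modified, so ~idx still refers to the original values. Series.where is shown as a one-step alternative.

```python
import pandas as pd

# Hypothetical sample data and threshold
df = pd.DataFrame({
    'A': [1.0, 3.0, 5.0],
    'C': [10.0, 20.0, 30.0],
    'D': [1.0, 2.0, 3.0],
})
x = 2

# Compute the mask once, from the original values of A
idx = df['A'] > x  # [False, True, True]

# Two-step version: fill the selected rows, then zero the rest
two_step = df.copy()
two_step.loc[idx, 'A'] = two_step.loc[idx, 'C'] - two_step.loc[idx, 'D']
two_step.loc[~idx, 'A'] = 0

# One-step version: keep C - D where idx is True, use 0 elsewhere
one_step = df.copy()
one_step['A'] = (one_step['C'] - one_step['D']).where(idx, 0)

print(two_step['A'].tolist())  # [0.0, 18.0, 27.0]
print(one_step['A'].tolist())  # [0.0, 18.0, 27.0]
```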