Why isn't my Pandas 'apply' function referencing multiple columns working? [closed]

前端未结

关注

 6  1540

花落未央

相关标签:

6条回答

囚心锁ツ

2020-11-22 16:19
I have given the comparison of all three discussed above.

Using values
```
%timeit df['value'] = df['a'].values % df['c'].values
```
139 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Without values
```
%timeit df['value'] = df['a']%df['c'] 
```
216 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Apply function
```
%timeit df['Value'] = df.apply(lambda row: row['a']%row['c'], axis=1)
```
474 µs ± 5.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
0 讨论(0)
发布评论:

提交评论
- 加载中...
暗喜

2020-11-22 16:26
This is same as the previous solution but I have defined the function in df.apply itself:
```
df['Value'] = df.apply(lambda row: row['a']%row['c'], axis=1)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

慢半拍i

2020-11-22 16:27

Seems you forgot the '' of your string.

In [43]: df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)

In [44]: df
Out[44]:
                    a    b         c     Value
          0 -1.674308  foo  0.343801  0.044698
          1 -2.163236  bar -2.046438 -0.116798
          2 -0.199115  foo -0.458050 -0.199115
          3  0.918646  bar -0.007185 -0.001006
          4  1.336830  foo  0.534292  0.268245
          5  0.976844  bar -0.773630 -0.570417

BTW, in my opinion, following way is more elegant:

In [53]: def my_test2(row):
....:     return row['a'] % row['c']
....:     

In [54]: df['Value'] = df.apply(my_test2, axis=1)

0 讨论(0)

春和景丽

2020-11-22 16:29
Let's say we want to apply a function add5 to columns 'a' and 'b' of DataFrame df
```
def add5(x):
    return x+5

df[['a', 'b']].apply(add5)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
没有蜡笔的小新

2020-11-22 16:34
All of the suggestions above work, but if you want your computations to by more efficient, you should take advantage of numpy vector operations (as pointed out here).
```
import pandas as pd
import numpy as np


df = pd.DataFrame ({'a' : np.random.randn(6),
             'b' : ['foo', 'bar'] * 3,
             'c' : np.random.randn(6)})
```
Example 1: looping with pandas.apply():
```
%%timeit
def my_test2(row):
    return row['a'] % row['c']

df['Value'] = df.apply(my_test2, axis=1)
```
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached. 1000 loops, best of 3: 481 µs per loop

Example 2: vectorize using pandas.apply():
```
%%timeit
df['a'] % df['c']
```
The slowest run took 458.85 times longer than the fastest. This could mean that an intermediate result is being cached. 10000 loops, best of 3: 70.9 µs per loop

Example 3: vectorize using numpy arrays:
```
%%timeit
df['a'].values % df['c'].values
```
The slowest run took 7.98 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 6.39 µs per loop

So vectorizing using numpy arrays improved the speed by almost two orders of magnitude.
0 讨论(0)
发布评论:

提交评论
- 加载中...

太阳男子

2020-11-22 16:38

If you just want to compute (column a) % (column b), you don't need apply, just do it directly:

In [7]: df['a'] % df['c']                                                                                                                                                        
Out[7]: 
0   -1.132022                                                                                                                                                                    
1   -0.939493                                                                                                                                                                    
2    0.201931                                                                                                                                                                    
3    0.511374                                                                                                                                                                    
4   -0.694647                                                                                                                                                                    
5   -0.023486                                                                                                                                                                    
Name: a

0 讨论(0)

热议问题