Pandas function: DataFrame.apply() runs top row twice [duplicate]


一向


3 Answers

佛祖请我去吃肉

2021-02-18 17:00

I ran into the same issue today and spent a few hours searching Google for a solution. I finally came up with a workaround like this:

```
import numpy as np
import pandas as pd
import time


def foo(text):
    # Pure helper: builds a new string and touches nothing else.
    text = str(text) + ' is processed'
    return text


def func1(data):
    # Returns a scalar per row and leaves the row itself unchanged.
    print("run1")
    return foo(data['text'])


def func2(data):
    # Modifies the row in place and returns the whole row.
    print("run2")
    data['text'] = data['text'] + ' is processed'
    return data


def test_one():
    data = pd.DataFrame(columns=['text'], index=np.arange(0, 3))
    data['text'] = 'text'

    start = time.time()
    data = data.apply(func1, axis=1)
    print(time.time() - start)

    print(data)


def test_two():
    data = pd.DataFrame(columns=['text'], index=np.arange(0, 3))
    data['text'] = 'text'

    start = time.time()
    data = data.apply(func2, axis=1)
    print(time.time() - start)
    print(data)


test_one()
test_two()
```

If you run the program, you will see output like this:

```
run1
run1
run1
0.0029706954956054688
0    text is processed
1    text is processed
2    text is processed
dtype: object
run2
run2
run2
run2
0.0049877166748046875
                             text
0  text is processed is processed
1               text is processed
2               text is processed
```

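With func2, "run2" is printed four times for three rows, and because func2 modifies the row in place, row 0 ends up with the suffix applied twice. After splitting func2 into func1 and foo, the first row is processed only once.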
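
For what it's worth, the double suffix on row 0 can probably also be avoided without splitting the function, by working on a copy of the row so that a repeated call sees unmodified input. A minimal sketch, assuming the same one-column frame as above; `add_suffix` and `calls` are names made up for this illustration, and whether the first row is actually visited twice depends on the pandas version:

```python
import pandas as pd

calls = []  # records the index label of every row the function is called with


def add_suffix(row):
    calls.append(row.name)
    row = row.copy()  # work on a copy so the row passed in is never mutated
    row["text"] = row["text"] + " is processed"
    return row


df = pd.DataFrame({"text": ["text"] * 3})
result = df.apply(add_suffix, axis=1)

# The call count may exceed the row count on versions that
# evaluate the first row twice; the result should be the same either way.
print(len(calls))
print(result)
```

This does not remove the extra call, it only makes the extra call harmless, so it is not a fix if the function is expensive or has other side effects.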


太阳男子

2021-02-18 17:20
I honestly don't see any explanation of this in the provided links, but anyway: I stumbled upon the same thing in my code and did the silliest thing, i.e. short-circuited the first call. But it worked.
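```
is_first_call = True


def refill_uniform(row, st=600):
    # The flag lives at module level, so it has to be declared `global`
    # (`nonlocal` is only valid inside a nested function).
    global is_first_call
    if is_first_call:
        # Skip the extra first call and hand the row back untouched.
        is_first_call = False
        return row
    # ... here goes the code
```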
庸人自扰

2021-02-18 17:21

This is by design, as described here and here

The apply function needs to know the shape of the returned data to figure out how the results will be combined. Apply is a shortcut that intelligently applies aggregate, transform or filter. You can try breaking apart your function to avoid the duplicate calls.
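
One way to read "breaking apart" here is to keep the per-value logic in a small function with no side effects and apply it to the column rather than to whole rows. A rough sketch under that assumption; `process` is a hypothetical name, and using `Series.map` is my own substitution, not necessarily what the linked explanations had in mind:

```python
import pandas as pd


def process(text):
    # Pure function: returns a new string and modifies nothing.
    return str(text) + " is processed"


df = pd.DataFrame({"text": ["text"] * 3})

# Map the function over the single column and assign the result back.
df["text"] = df["text"].map(process)

print(df)
```

Because `process` has no side effects, an extra call on the first element would not change the result even if one happened.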
