Applying my custom function to a data frame python

最后都变了 - submitted on 2019-12-22 10:26:14

Question


I have a dataframe with a column called Signal. I want to add a new column to that dataframe and apply a custom function I've built. I'm very new at this, and I seem to be having trouble passing the values from a dataframe column into a function, so any help with my syntax errors or reasoning would be greatly appreciated!

Signal
3.98
3.78
-6.67
-17.6
-18.05
-14.48
-12.25
-13.9
-16.89
-13.3
-13.19
-18.63
-26.36
-26.23
-22.94
-23.23
-15.7

This is my simple function

def slope_test(x):
    if x >0 and x<20:
        return 'Long'
    elif x<0 and x>-20:
        return 'Short'
    else:
        return 'Flat'

I keep getting this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the code I've tried:

data['Position'] = data.apply(slope_test(data['Signal']))

and also:

data['Position'] = data['Signal'].apply(slope_test(data['Signal']))

Answer 1:


You can use numpy.select for a vectorised solution:

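import numpy as np

# inclusive='neither' excludes both endpoints (pandas >= 1.3);
# older versions spelled this inclusive=False
conditions = [df['Signal'].between(0, 20, inclusive='neither'),
              df['Signal'].between(-20, 0, inclusive='neither')]

# Labels matched to the conditions, in the same order
values = ['Long', 'Short']

# Rows matching neither condition fall back to 'Flat'
df['Cat'] = np.select(conditions, values, 'Flat')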

Explanation

You are attempting to perform operations on a series as if it were a scalar. This won't work for the reason explained in your error. In addition, your logic for pd.Series.apply is incorrect. This method takes a function as an input. Therefore, you can simply use df['Signal'].apply(slope_test).
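A minimal sketch of the difference, assuming a small illustrative frame named df built from three of the Signal values in the question: calling slope_test(df['Signal']) runs the function once on the whole Series and raises the ValueError, whereas passing the function object lets apply call it once per element.

```python
import pandas as pd

def slope_test(x):
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'

# Illustrative subset of the Signal values from the question
df = pd.DataFrame({'Signal': [3.98, -6.67, -26.36]})

# Wrong: slope_test is called immediately with the whole Series, so
# `x > 0 and x < 20` has to reduce a boolean Series to a single bool.
try:
    slope_test(df['Signal'])
    raised = False
except ValueError:
    raised = True

# Right: pass the function itself; apply calls it once per value.
result = df['Signal'].apply(slope_test)
```

Note that the working call has no parentheses after slope_test: the function is handed over, not invoked.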

But pd.Series.apply is a glorified, inefficient loop. You should utilise the vectorised functionality available with the NumPy arrays underlying your Pandas dataframe. In fact, this is a good reason for using Pandas in the first place.
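To illustrate what "vectorised" means here, the sketch below builds the conditions with comparison operators combined with & (an alternative to between) and checks that np.select agrees with the apply-based result. The frame df and its values are assumptions for illustration, including two edge values (0 and 20) that were not in the question's data.

```python
import numpy as np
import pandas as pd

def slope_test(x):
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'

df = pd.DataFrame({'Signal': [3.98, 3.78, -6.67, -17.6, -26.36, 0.0, 20.0]})
s = df['Signal']

# Element-wise comparisons produce boolean Series; & combines them
# element by element, where Python's `and` would raise the ValueError.
conditions = [(s > 0) & (s < 20), (s < 0) & (s > -20)]
vectorised = np.select(conditions, ['Long', 'Short'], default='Flat')

# The per-element loop gives the same labels
looped = s.apply(slope_test)
same = vectorised.tolist() == looped.tolist()
```

The parentheses around each comparison are needed because & binds more tightly than > and <.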




Answer 2:


Although your question is about apply, apply calls your function once per value in the Python interpreter, which is slow. You could use a vectorized approach instead. This is the first one I thought of, but I think I can improve on it:

(EDIT: No need to improve on it, I was looking for np.select which is covered in the answer by jpp so I'll leave it as-is for a demonstration of an alternative)

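import pandas as pd

df = pd.DataFrame({'a': [-5, 2, 15, -10, 22, -50]})
# Bins are (-20, 0] and (0, 20]; values outside them become NaN
df['category'] = pd.cut(df['a'], [-20, 0, 20],
                        labels=['short', 'long'])
# 'flat' must be added as a category before it can fill the NaN values
df['category'] = df['category'].cat.add_categories('flat').fillna('flat')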
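One caveat worth checking: by default pd.cut builds right-closed intervals, here (-20, 0] and (0, 20], so a value exactly on a bin edge is labelled differently from slope_test, which uses strict inequalities on both sides. The sketch below uses made-up edge values (not from the question) to show this.

```python
import pandas as pd

# Hypothetical values chosen to sit on the bin edges
edge = pd.Series([0, 20, -20, 5])

# Default right=True: bins are (-20, 0] and (0, 20]
cut = pd.cut(edge, [-20, 0, 20], labels=['short', 'long'])

# Values outside every bin are NaN; label them 'flat'
labels = cut.cat.add_categories('flat').fillna('flat').tolist()
```

Here 0 lands in 'short' and 20 in 'long', whereas slope_test would return 'Flat' for both. For data like the question's, where no value falls exactly on an edge, the two approaches agree apart from the capitalisation of the labels.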



Answer 3:


You simply need to call .apply() on the relevant column (a Series) of your dataframe and pass it your custom function.

df.Signal.apply(slope_test)

Or you can wrap the function in a lambda (which is NOT recommended in this case, since the lambda adds nothing) as below:

df.Signal.apply(lambda x: slope_test(x))

Output:

0      Long
1      Long
2     Short
3     Short
4     Short
5     Short
6     Short
7     Short
8     Short
9     Short
10    Short
11    Short
12     Flat
13     Flat
14     Flat
15     Flat
16    Short
Name: Signal, dtype: object
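Since the question asks for a new column, the result still has to be assigned back to the frame. A short sketch, assuming the frame is named df and the new column Position as in the question, with three illustrative Signal values:

```python
import pandas as pd

def slope_test(x):
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'

df = pd.DataFrame({'Signal': [3.98, -17.6, -26.36]})

# Reading an existing column works with attribute access (df.Signal),
# but a new column must be created with bracket notation.
df['Position'] = df.Signal.apply(slope_test)
```

Writing df.Position = ... for a column that does not exist yet would set a plain attribute on the object rather than add a column.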



Answer 4:


Using pandas.Series.apply(), this works for me:

Initialize DataFrame

import pandas as pd

d = [
    3.98, 3.78, -6.67, -17.6, -18.05, -14.48,
    -12.25, -13.9, -16.89, -13.3, -13.19, -18.63,
    -26.36, -26.23, -22.94, -23.23, -15.7]

df = pd.DataFrame(d)

Define the function you want to apply

def slope_test(x):
    if x >0 and x<20:
        return 'Long'
    elif x<0 and x>-20:
        return 'Short'
    else:
        return 'Flat'

Apply the function to the right column of your DataFrame

df[0].apply(slope_test)

Output:

0      Long
1      Long
2     Short
3     Short
4     Short
5     Short
6     Short
7     Short
8     Short
9     Short
10    Short
11    Short
12     Flat
13     Flat
14     Flat
15     Flat
16    Short
Name: 0, dtype: object


Source: https://stackoverflow.com/questions/51505187/applying-my-custom-function-to-a-data-frame-python
