Question
Good Morning,
I have a dataframe with two columns of integers and a Series (diff) computed as:
diff = (df["col_1"] - df["col_2"]) / (df["col_2"])
I would like to create a column of the dataframe whose values are:
- equal to 0, if (diff >= 0) & (diff <= 0.35)
- equal to 1, if (diff > 0.35)
- equal to 2, if (diff < 0) & (diff >= -0.35)
- equal to 3, if (diff < -0.35)
I tried with:
df["Class"] = np.where( (diff >= 0) & (diff <= 0.35), 0,
np.where( (diff > 0.35), 1,
np.where( (diff < 0) & (diff >= - 0.35) ), 2,
np.where( ((diff < - 0.35), 3) )))
But it reports the following error:
SystemError: <built-in function where> returned a result with an error set
How can I fix it?
Answer 1:
You can use numpy.select to specify conditions and values separately.
s = (df['col_1'] / df['col_2']) - 1
conditions = [s.between(0, 0.35), s > 0.35, s.between(-0.35, 0), s < -0.35]
values = [0, 1, 2, 3]
df['Class'] = np.select(conditions, values, np.nan)
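As a self-contained sketch of this approach, the snippet below builds a small made-up dataframe (the numbers are illustrative only, not from the question) and classifies it with np.select. For comparison it also includes the question's nested np.where with the parentheses balanced: in the original attempt the third np.where is closed right after its condition, and the fourth receives a single tuple instead of three arguments.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({
    "col_1": [100, 150, 80, 50, 135, 65],
    "col_2": [100, 100, 100, 100, 100, 100],
})
diff = (df["col_1"] - df["col_2"]) / df["col_2"]

# One condition per class, in the same order as the values list.
conditions = [
    (diff >= 0) & (diff <= 0.35),
    diff > 0.35,
    (diff < 0) & (diff >= -0.35),
    diff < -0.35,
]
values = [0, 1, 2, 3]

# Rows matching no condition (for example NaN in diff) get the default.
df["Class"] = np.select(conditions, values, default=np.nan)

# The nested np.where from the question with balanced parentheses:
# each call gets exactly (condition, value_if_true, value_if_false).
nested = np.where((diff >= 0) & (diff <= 0.35), 0,
         np.where(diff > 0.35, 1,
         np.where((diff < 0) & (diff >= -0.35), 2,
         np.where(diff < -0.35, 3, np.nan))))
```

np.select uses the first condition that matches, which is why the two between() calls above can both include 0 without conflict: a diff of exactly 0 is claimed by the first condition and gets class 0.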
Answer 2:
One can also simply use numpy.searchsorted:
diff_classes = [-0.35,0,0.35]
def getClass(x):
    return len(diff_classes) - np.searchsorted(diff_classes, x)
df["class"]=diff.apply(getClass)
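searchsorted returns the index at which x would be inserted into diff_classes to keep the list sorted, and subtracting that index from 3 turns it into a class label. The labels produced this way fall from 3 to 0 as diff rises, so the two positive ranges come out as 1 and 0, the reverse of the numbering in the question.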
edit: A little bit less readable, but it also works in one line:
df["class"] = diff.apply(lambda x: 3-np.searchsorted([-0.35,0,0.35],x))
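To reproduce the question's numbering exactly (3, 2, 0, 1 from the lowest range to the highest), the searchsorted index can be passed through a lookup array instead of being subtracted from 3. The following is a vectorized sketch under that assumption; edges, labels and the sample values are hypothetical names and data introduced here for illustration. One caveat: with side="right" a diff of exactly 0.35 lands in the top bin (class 1), whereas the question assigns it class 0.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the question's diff Series.
diff = pd.Series([0.0, 0.5, -0.2, -0.5, 0.2, -0.35])

# Bin edges in ascending order. side="right" places a value equal to an
# edge in the bin above it, so -0.35 and 0 fall where the question wants.
edges = np.array([-0.35, 0.0, 0.35])
idx = np.searchsorted(edges, diff.to_numpy(), side="right")

# Lookup table: class label for each bin, lowest bin first.
#   bin 0: diff < -0.35        -> 3
#   bin 1: -0.35 <= diff < 0   -> 2
#   bin 2: 0 <= diff < 0.35    -> 0
#   bin 3: diff >= 0.35        -> 1
labels = np.array([3, 2, 0, 1])

# Index the lookup table with the bin numbers, all rows at once.
classes = pd.Series(labels[idx], index=diff.index)
```

Calling searchsorted once on the whole array avoids the per-row Python function call that apply makes.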
Source: https://stackoverflow.com/questions/51301149/numpy-where-with-more-than-2-conditions