Question
I have looked through the other answers about operand errors and none seem to fit this example. The equation works on its own, whether I hard-code the X values or read them from the DataFrame. Using the same equation inside an np.where expression raises the operand error.
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
data= pd.read_csv('miniDF.csv')
df=pd.DataFrame(data, columns=['X','Z'])
df['y']=df['Z']*0.01
df['y']=(14.6413819224756*(df['X']**0.5)+64.4092780704338*(np.log(df['X'])**-2)
+1675.7498523727*(np.exp(-df['X']))+3.07221083927051*np.cos(df['X']))
print(df)
df['y']=np.where(df['Z']>=(14.6413819224756*(df['X']**0.5)+64.4092780704338*(np.log(df['X'])**-2)
+1675.7498523727*(np.exp(-df['X']))+3.07221083927051*np.cos(df['X']),8,9))
print(df)
The values in my DataFrame, the output from the first print(df), and the error are as follows.
X Z y
0 1.4 1 999.999293
1 2.0 2000 380.275104
2 3.0 3 159.114194
3 4.0 4 91.481930
4 5.0 5 69.767368
5 6.0 6 63.030212
6 7.0 70 59.591631
7 8.0 8 56.422723
8 9.0 9 54.673108
9 10.0 10 55.946732
Traceback (most recent call last):
File "/Users/willhutchins/Desktop/minitest.py", line 17, in <module>
df['y']=np.where(df['Z']>=(14.6413819224756*(df['X']**0.5)+64.4092780704338*(np.log(df['X'])**-2)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 1229, in wrapper
res = na_op(values, other)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 1115, in na_op
result = method(y)
ValueError: operands could not be broadcast together with shapes (10,) (3,)
Why does the equation work as a stand-alone item but not work when used in np.where?
Answer 1:
Let:
expr = 14.6413819224756*(df['X']**0.5)+64.4092780704338*(np.log(df['X'])**-2)+1675.7498523727*(np.exp(-df['X']))+3.07221083927051*np.cos(df['X'])
Then your code is equivalent to:
df['y']=np.where(df['Z']>=(expr,8,9))
The shape of df['Z'] is (10,), which means it is a one-dimensional pandas.Series with 10 rows. However, (expr,8,9) is a plain tuple with 3 items (even though expr itself is a 10-row pandas.Series). That is why the error message reads "operands could not be broadcast together with shapes (10,) (3,)": numpy doesn't know how to compare a 10-row pandas.Series with a 3-item tuple.
Check your expression again and modify it so that it does what you intend.
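To see the shape rule in isolation, here is a minimal sketch using plain NumPy arrays as stand-ins: z plays the role of df['Z'] and three plays the role of the 3-item tuple. Both names and their values are made up for illustration. It uses np.greater_equal, the function form of >=.

```python
import numpy as np

z = np.arange(10)            # shape (10,), stands in for df['Z']
three = np.array([1, 8, 9])  # shape (3,), stands in for the 3-item tuple

# Same length: compared element by element.
same_len = np.greater_equal(z, np.arange(10))

# Scalar: broadcast against every element.
vs_scalar = np.greater_equal(z, 5)

# Lengths 10 and 3 cannot be broadcast, so NumPy raises ValueError.
try:
    np.greater_equal(z, three)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(same_len.shape, vs_scalar.shape)
print(error_message)
```

A 10-element operand can be compared with another 10-element operand or with a scalar, but not with something of length 3.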
UPDATE:
According to the comment, 8 and 9 are meant to be the x and y arguments of np.where(condition,x,y). But you put them inside the parentheses after df['Z']>= by mistake, which makes the >= operator compare a pandas.Series with a tuple instead of with another pandas.Series.
Just move the last closing parenthesis so that it ends the expression before ,8,9 and the code will work:
df['y']=np.where(df['Z']>=(14.6413819224756*(df['X']**0.5)+64.4092780704338*(np.log(df['X'])**-2)
+1675.7498523727*(np.exp(-df['X']))+3.07221083927051*np.cos(df['X'])),8,9)
The result should be:
X Z y
0 1.4 1 9
1 2.0 2000 8
2 3.0 3 9
3 4.0 4 9
4 5.0 5 9
5 6.0 6 9
6 7.0 70 8
7 8.0 8 9
8 9.0 9 9
9 10.0 10 9
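One way to make this kind of misplaced parenthesis harder to write is to compute the curve first and pass a short name to np.where. The following is only a sketch: it builds the DataFrame inline from the values printed in the question instead of reading miniDF.csv, and curve and threshold are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Values copied from the question's printed DataFrame (stand-in for miniDF.csv).
df = pd.DataFrame({
    'X': [1.4, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    'Z': [1, 2000, 3, 4, 5, 6, 70, 8, 9, 10],
})

def curve(x):
    """The curve equation from the question (hypothetical helper name)."""
    return (14.6413819224756 * (x ** 0.5)
            + 64.4092780704338 * (np.log(x) ** -2)
            + 1675.7498523727 * np.exp(-x)
            + 3.07221083927051 * np.cos(x))

# Evaluate the curve once, then compare; 8 and 9 are clearly arguments of np.where.
threshold = curve(df['X'])
df['y'] = np.where(df['Z'] >= threshold, 8, 9)

print(df)
```

With the condition reduced to df['Z'] >= threshold, it is easy to see that 8 and 9 sit outside the comparison.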
UPDATE 2:
To use np.where when two conditions must both be met, that is, an "and" operation, use np.where((condition1) & (condition2),x,y). For example:
df['foo']=np.where((df['Z']>3) & (df['Z']<100),True,False)
Note that the parentheses around each condition, before and after &, are necessary, because & binds more tightly than comparison operators such as > and <. With your data you'll get:
X Z y foo
0 1.4 1 999.999293 False
1 2.0 2000 380.275104 False
2 3.0 3 159.114194 False
3 4.0 4 91.481930 True
4 5.0 5 69.767368 True
5 6.0 6 63.030212 True
6 7.0 70 59.591631 True
7 8.0 8 56.422723 True
8 9.0 9 54.673108 True
9 10.0 10 55.946732 True
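The effect of the parentheses can be checked with a small sketch. It assumes a Series z holding the Z values from the question. The names mask, foo and unparenthesized_error are illustrative only.

```python
import numpy as np
import pandas as pd

z = pd.Series([1, 2000, 3, 4, 5, 6, 70, 8, 9, 10])  # Z values from the question

# Each comparison is parenthesized, so & combines two boolean Series.
mask = (z > 3) & (z < 100)
foo = np.where(mask, True, False)

# Without parentheses, Python parses the expression as z > (3 & z) < 100,
# a chained comparison that needs the truth value of a whole Series.
try:
    z > 3 & z < 100
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = type(exc).__name__

print(foo)
print(unparenthesized_error)
```

Since mask is already boolean, np.where(mask, True, False) returns the same values as mask itself. np.where is more useful when x and y are something other than True and False.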
Source: https://stackoverflow.com/questions/63046213/complex-curve-equation-giving-error-in-np-where-usage