I am trying to fit some data to a curve in Python using scipy.optimize.curve_fit. I am running into the error ValueError: array must not contain infs or NaNs.
Your function has a negative power (x^-alpha), which is the same as (1/x)^(alpha). If x is ever 0, your function will return inf and your curve fit operation will break. I'm surprised a warning/error isn't thrown earlier informing you of a divide by 0.
BTW why are you multiplying and dividing by 1?
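To see the failure mode in isolation, here is a minimal sketch. The array `x` and the exponent `alpha` are made-up values, not the asker's data. By default NumPy reports the division by zero as a RuntimeWarning instead of raising, which is probably why nothing stops earlier:

```python
import numpy as np

# Made-up sample values; the first x is exactly 0.
x = np.array([0.0, 1.0, 2.0, 4.0])
alpha = 1.5

# Silence the RuntimeWarning NumPy would otherwise emit for 0 ** negative.
with np.errstate(divide="ignore"):
    y = x ** (-alpha)

print(np.isfinite(y))  # marks which entries are usable for fitting

# One possible workaround: drop the non-positive x values before fitting.
mask = x > 0
x_safe = x[mask]
```

Whether dropping those points is acceptable depends on the data; shifting x or changing the model are other options.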
I was able to reproduce this error in Python 2.7 like so:
import numpy as np
from sklearn.decomposition import FastICA
X = load_data.load("stuff")  # sets X to a 2D numpy array containing
                             # large positive and negative numbers
ica = FastICA(whiten=False)
print(np.isnan(X).any())  # prints False
print(np.isinf(X).any())  # prints False
ica.fit(X)  # raises the error shown below
This always produces the error:
/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py:58: RuntimeWarning: invalid value encountered in sqrt
return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
Traceback (most recent call last):
File "main.py", line 43, in <module>
ica()
File "main.py", line 18, in ica
ica.fit(X)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 523, in fit
self._fit(X, compute_sources=False)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 479, in _fit
compute_sources=compute_sources, return_n_iter=True)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 335, in fastica
W, n_iter = _ica_par(X1, **kwargs)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 108, in _ica_par
- g_wtx[:, np.newaxis] * W)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 55, in _sym_decorrelation
s, u = linalg.eigh(np.dot(W, W.T))
File "/usr/lib64/python2.7/site-packages/scipy/linalg/decomp.py", line 297, in eigh
a1 = asarray_chkfinite(a)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 613, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Normalizing X before fitting makes the error go away:
import numpy as np
from sklearn.decomposition import FastICA
X = load_data.load("stuff")  # sets X to a 2D numpy array containing
                             # large positive and negative numbers
ica = FastICA(whiten=False)
# Column-wise normalization (zero mean, unit standard deviation per column).
# It rescales the 2D array from very large and very small numbers to
# reasonably sized numbers, roughly between -1 and 1.
X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
print(np.isnan(X).any())  # prints False
print(np.isinf(X).any())  # prints False
ica.fit(X)  # works correctly
I found the eureka moment here: sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"
What I think is happening is that numpy is being fed gigantic numbers and very tiny numbers, and inside its tiny brain it's creating NaNs and Infs. So it's a bug in sklearn. The workaround is to flatten (normalize) the input data you pass to the algorithm so that there are no very large or very small numbers.
Bad sklearn! NO biscuit!
Why it is failing
It is not your input arrays that contain nans or infs. Rather, evaluating your objective function at some X points, for some values of the parameters, produces nans or infs. In other words, the array of values func(x,alpha,beta,b) contains nans or infs for some x, alpha, beta and b tried during the optimization routine.
scipy.optimize.curve_fit uses the Levenberg-Marquardt algorithm by default for unconstrained problems. It is also called damped least squares optimization. It is an iterative procedure, and a new estimate of the optimal function parameters is computed at each iteration. At some point during the optimization, the algorithm explores a region of the parameter space where your function is not defined.
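One way to find out which parameter values trigger the problem is to wrap the model so that it raises as soon as it produces a non-finite value. This is only a sketch: `shifted_power` is a hypothetical stand-in for the asker's func, and `make_checked` is a helper name invented here.

```python
import numpy as np

def shifted_power(x, alpha, b):
    # Hypothetical model: gives nan when x + b < 0 and alpha is fractional.
    with np.errstate(all="ignore"):
        return (x + b) ** (-alpha)

def make_checked(model):
    """Wrap a model so that non-finite output raises and names the parameters."""
    def checked(x, *params):
        y = model(x, *params)
        if not np.all(np.isfinite(y)):
            raise FloatingPointError(
                "non-finite model output for params=%r" % (params,))
        return y
    return checked

checked_model = make_checked(shifted_power)
x = np.array([1.0, 2.0, 3.0])

ok = checked_model(x, 0.5, 1.0)      # x + b > 0 everywhere, so finite
try:
    checked_model(x, 0.5, -1.5)      # x + b < 0 at x = 1, so nan
    failed = False
except FloatingPointError as exc:
    failed = True
    message = str(exc)               # names the offending parameters
```

If you pass such a wrapper to curve_fit, give p0 explicitly: with a `*params` signature curve_fit cannot work out the number of parameters by itself.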
How to fix
1/Initial guess
The initial guess for the parameters is decisive for convergence. If the initial guess is far from the optimal solution, the algorithm is more likely to explore regions where the objective function is undefined. So if you have a better idea of what your optimal parameters are and feed that to the algorithm as the initial guess, the error might be avoided.
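As an illustration, curve_fit takes the initial guess through its p0 argument. The sketch below uses a hypothetical two-parameter model `power_law` and synthetic data, and gets a rough guess from a straight-line fit in log-log space. That trick assumes x and y are strictly positive and is just one possible way to obtain a starting point.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, alpha):
    # Hypothetical model, not the asker's function.
    return a * x ** (-alpha)

# Synthetic data with known parameters a=3.0, alpha=1.5 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = power_law(x, 3.0, 1.5) + rng.normal(scale=0.01, size=x.size)

# Rough initial guess: log(y) = log(a) - alpha * log(x) is a straight line.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
p0 = [np.exp(intercept), -slope]

# Refine the guess with the nonlinear least-squares fit.
popt, pcov = curve_fit(power_law, x, y, p0=p0)
print(popt)
```

Without p0, curve_fit starts every parameter at 1, which may or may not be close enough for a given model.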
2/Model
You could also modify your model so that it does not return nans. For those parameter values params where the original function func is not defined, you want the objective function to take huge values, or in other words you want func(params) to be far from the Y values to be fitted.
At points where your objective function is not defined, you may return a big float, for instance AVG(Y)*10e5, with AVG the average (so that the value is sure to be much bigger than the average of the Y values to be fitted).
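A minimal sketch of that idea follows. The model `(x + b) ** (-alpha)`, the helper name `make_safe_model` and the sample arrays are assumptions made for illustration; only the AVG(Y)*10e5 penalty comes from the suggestion above.

```python
import numpy as np

def make_safe_model(penalty):
    """Build a model that returns `penalty` wherever the raw model is undefined."""
    def safe_model(x, alpha, b):
        # Hypothetical raw model; gives nan or inf for some (alpha, b).
        with np.errstate(all="ignore"):
            y = (x + b) ** (-alpha)
        # Replace nan and inf entries by the large penalty value.
        return np.where(np.isfinite(y), y, penalty)
    return safe_model

# Made-up data to be fitted.
x = np.array([1.0, 2.0, 3.0])
y_data = np.array([0.6, 0.5, 0.45])

penalty = np.mean(y_data) * 10e5      # AVG(Y)*10e5
safe_model = make_safe_model(penalty)

# With b = -1.5 the raw model is nan at x = 1; the safe model stays finite.
out = safe_model(x, 0.5, -1.5)
```

The function safe_model keeps an explicit (x, alpha, b) signature, so it can be passed to curve_fit directly. One caveat: a constant penalty is flat, so the optimizer gets no slope information inside the undefined region and may still stall there.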
Link
You could have a look at this post: Fitting data to an equation in python vs gnuplot