scipy.optimize.curve_fit() - array must not contain infs or NaNs

孤城傲影 2021-02-05 15:34

I am trying to fit some data to a curve in Python using scipy.optimize.curve_fit. I am running into the error ValueError: array must not contain infs or NaNs.

3 Answers
  • 2021-02-05 16:20

    Your function has a negative power (x^-alpha), which is the same as (1/x)^alpha. If x is ever 0, your function returns inf and your curve-fit operation breaks; I'm surprised a divide-by-zero warning or error isn't thrown earlier.
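
    A minimal sketch of that failure mode (alpha = 1.5 is an arbitrary value, not one taken from the question):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0])
    alpha = 1.5
    y = x ** -alpha              # numpy warns about a divide by zero at x = 0
    print(y)                     # [inf 1. 0.35355339]
    print(np.isfinite(y).all())  # False -- this is what later trips the
                                 # "array must not contain infs or NaNs" check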

    BTW why are you multiplying and dividing by 1?

  • 2021-02-05 16:28

    I was able to reproduce this error in Python 2.7 like so:

    import numpy as np
    from sklearn.decomposition import FastICA

    X = load_data.load("stuff")    # this sets X to a 2d numpy array containing
                                   # large positive and negative numbers
    ica = FastICA(whiten=False)

    print(np.isnan(X).any())   # this prints False
    print(np.isinf(X).any())   # this prints False

    ica.fit(X)                 # this raises the error


    This always produces the following error:

    /usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py:58: RuntimeWarning: invalid value encountered in sqrt
      return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
    Traceback (most recent call last):
      File "main.py", line 43, in <module>
        ica()
      File "main.py", line 18, in ica
        ica.fit(X)
      File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 523, in fit
        self._fit(X, compute_sources=False)
      File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 479, in _fit
        compute_sources=compute_sources, return_n_iter=True)
      File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 335, in fastica
        W, n_iter = _ica_par(X1, **kwargs)
      File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 108, in _ica_par
        - g_wtx[:, np.newaxis] * W)
      File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 55, in _sym_decorrelation
        s, u = linalg.eigh(np.dot(W, W.T))
      File "/usr/lib64/python2.7/site-packages/scipy/linalg/decomp.py", line 297, in eigh
        a1 = asarray_chkfinite(a)
      File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 613, in asarray_chkfinite
        "array must not contain infs or NaNs")
    ValueError: array must not contain infs or NaNs
    

    Solution:

    import numpy as np
    from sklearn.decomposition import FastICA

    X = load_data.load("stuff")    # this sets X to a 2d numpy array containing
                                   # large positive and negative numbers
    ica = FastICA(whiten=False)

    # column-wise standardization: rescales the very large and very small
    # values into reasonably sized numbers, roughly between -1 and 1
    X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

    print(np.isnan(X).any())   # this prints False
    print(np.isinf(X).any())   # this prints False

    ica.fit(X)                 # this works correctly
    

    Why does that normalization step fix the error?

    I found the eureka moment here: sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"

    What I think is happening is that numpy is being fed gigantic numbers alongside very tiny ones, and the intermediate computations overflow or underflow into NaNs and Infs. So it's effectively a robustness bug in sklearn. The workaround is to rescale your input data so the algorithm never sees very large or very small numbers.

    Bad sklearn! NO biscuit!
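
    For illustration (made-up numbers, not the OP's data), here is how perfectly finite input can still blow up inside an intermediate product like the W @ W.T that feeds eigh in the traceback above:

    import numpy as np

    W = np.array([[1e200, -1e200],
                  [1e200,  1e200]])
    print(np.isinf(W).any())    # False -- the input itself looks fine
    P = W @ W.T                 # products of 1e200 overflow float64
    print(np.isinf(P).any())    # True
    print(np.isnan(P).any())    # True (inf - inf)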

  • 2021-02-05 16:29

    Why it is failing

    It is not that your input arrays contain NaNs or Infs; rather, evaluating your objective function at some x points, for some values of the parameters, produces NaNs or Infs. In other words, the array of values func(x, alpha, beta, b) contains NaNs or Infs for some x, alpha, beta and b encountered during the optimization routine.

    scipy.optimize's curve fitting uses the Levenberg-Marquardt algorithm, also called damped least squares. It is an iterative procedure: a new estimate of the optimal parameters is computed at each iteration. At some point during the optimization the algorithm explores a region of parameter space where your function is not defined, and that is where the NaNs or Infs come from.
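
    One way to see this happening (a sketch using the func(x, alpha, beta, b) signature named above, with a made-up negative-power model standing in for yours): wrap the model and report any parameter values that produce non-finite output.

    import numpy as np

    def func(x, alpha, beta, b):
        # placeholder model with a negative power, just for illustration
        return beta * x ** (-alpha) + b

    def checked_func(x, alpha, beta, b):
        y = func(x, alpha, beta, b)
        if not np.all(np.isfinite(y)):
            print("non-finite output at alpha=%r beta=%r b=%r" % (alpha, beta, b))
        return y

    # pass checked_func instead of func to curve_fit, e.g.
    # popt, pcov = curve_fit(checked_func, xdata, ydata)
    # and the offending parameter values will be printed as they occur.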

    How to fix

    1/Initial guess

    The initial guess for the parameters is decisive for convergence. If the initial guess is far from the optimal solution, you are more likely to explore regions where the objective function is undefined. So if you have a reasonable idea of what your optimal parameters are, feed that initial guess to the algorithm and the error may well be avoided.
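
    A self-contained sketch (synthetic data and a made-up negative-power model, not the question's): a sensible p0 keeps the solver away from parameter regions where x ** -alpha is undefined or explodes.

    import numpy as np
    from scipy.optimize import curve_fit

    def func(x, alpha, beta, b):
        return beta * x ** (-alpha) + b

    xdata = np.linspace(0.1, 10.0, 50)    # strictly positive x, so no division by 0
    ydata = func(xdata, 1.2, 3.0, 0.5)

    popt, pcov = curve_fit(func, xdata, ydata, p0=(1.0, 1.0, 0.0))
    print(popt)                           # lands near (1.2, 3.0, 0.5)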

    2/Model

    Also, you could modify your model so that it does not return NaNs. For parameter values params where the original function func is not defined, you want the objective function to take huge values, or in other words for func(params) to be far from the Y values being fitted.

    Concretely, at points where your objective function is not defined you may return a big float, for instance AVG(Y)*10e5 with AVG the average of the Y values, so that it is guaranteed to be much bigger than the data being fitted.
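
    A hedged sketch of that idea (the model and the AVG(Y)*10e5 penalty follow the wording above; none of this is a built-in scipy facility):

    import numpy as np

    def safe_func(x, alpha, beta, b, y_mean):
        # evaluate the illustrative negative-power model, silencing numpy warnings
        with np.errstate(divide="ignore", invalid="ignore"):
            y = beta * x ** (-alpha) + b
        # wherever the model is undefined, return something far from the data
        penalty = np.abs(y_mean) * 10e5
        return np.where(np.isfinite(y), y, penalty)

    # usage sketch: freeze y_mean and hand the rest to curve_fit, e.g.
    # curve_fit(lambda x, a, be, c: safe_func(x, a, be, c, ydata.mean()),
    #           xdata, ydata)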

    Link

    You could have a look at this post: Fitting data to an equation in python vs gnuplot
