ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

问题

For some reason I cannot get this block of code to run properly anymore:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

I'm not sure why I'm getting this error on such a simple example. Here are my current versions:

scipy.__version__
'1.5.0'
sklearn.__version__
'0.23.1'

I'm running this on 64-bit Windows 10 Enterprise and Python 3.7.3. I've tried uninstalling and reinstalling scipy and scikit-learn. I've tried earlier version of scipy. I've tried uninstalling and reinstalling Python and none of these solved the issue.

Update: So it appears to be tied to matplotlib too. I was running this example previously in Pycharm, but I've moved to running it directly from the PowerShell. So if I run this code outside of Pycharm I do not get an error

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

However if I plot the data during it I get an error:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Plot data
plt.scatter(x, y)
plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

 ** On entry to DLASCLS parameter number  4 had an illegal value
Traceback (most recent call last):
  File ".\run.py", line 18, in <module>
    lm.fit(x.reshape(-1, 1), y)
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

But if I comment out the line plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red') it works fine.

回答1:

It seems it only happens when you print the figure using matplotlib, else you can run the fit algorithm as many times as you like.

However if you change the data type from float64 to float32 (Grzesik answer), strangely enough the error disappears. Feels like a bug to me Why would changing the data type affect the interaction between matplotlib and the lapack_function within sklearn?

More a question than an answer, but it is a bit scary to find these unexpected interactions across functions and data types.

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


def main(print_matplotlib=False,dtype=np.float64):
    x = np.linspace(-3,3,100).astype(dtype)
    print(x.dtype)
    y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
    x = x.reshape((-1,1))

    reg=LinearRegression().fit(x,y)
    print(reg.intercept_,reg.coef_)
    
    yh = reg.predict(x)
    
    if print_matplotlib:
        plt.scatter(x,y)
        plt.plot(x,yh)
        plt.show()

No plotting

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)  
    pass

float64
0.5957165420019624 [0.91960601]
float64
0.5957165420019624 [0.91960601]

Plotting dtype = np.float64

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    pass

float64
0.5957165420019624 [0.91960601]

Plot 1

float64
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-52593a548324> in <module>
      3     main(print_matplotlib = True)
      4     np.random.seed(64)
----> 5     main(print_matplotlib = True)
      6 
      7     pass

<ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
     11     x = x.reshape((-1,1))
     12 
---> 13     reg=LinearRegression().fit(x,y)
     14     print(reg.intercept_,reg.coef_)
     15 

~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
    545         else:
    546             self.coef_, self._residues, self.rank_, self.singular_ = \
--> 547                 linalg.lstsq(X, y)
    548             self.coef_ = self.coef_.T
    549 

~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
   1249         if info < 0:
   1250             raise ValueError('illegal value in %d-th argument of internal %s'
-> 1251                              % (-info, lapack_driver))
   1252         resids = np.asarray([], dtype=x.dtype)
   1253         if m > n:

ValueError: illegal value in 4-th argument of internal None

Plotting dtype=np.float32

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    pass

Output 2

回答2:

As of numpy 1.19.1 and sklearn v0.23.2, I found that polyfit(deg=1) and LinearRegression().fit() gave unexpected errors without any good reason. No, data didn't have any NaN or Inf value. I eventually used scipy.stats.linregress().

slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))

回答3:

First check for nan,inf values. and also try normalize=True

lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()

But these didn't work for me. Also, my data didn't have any nan or inf values. But while experimenting, I found that running the same code second time works. hence I did this

try: 
    lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()
except:
    lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()

I don't know why this work, but this solved the problem for me. So trying to run the same code twice did the trick for me.

回答4:

You miss plt.show() in your code. Put it after this line:

plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')
plt.show()

回答5:

I would suggest you to use the parameter normalize=True in your code to avoid this.

LinearRegression(fit_intercept=True,
                 normalize=True,
                 copy_X=True,
                 n_jobs=None)

This resolved the error for me.

回答6:

In your code change it:

lm.fit(x.reshape(-1, 1), y)

on:

lm.fit(x.reshape(-1, 1).astype(np.float32), y)

回答7:

This appears to be caused by a bug in Windows (update 2004?).

The posted problem https://github.com/scipy/scipy/issues/12893
is a duplicate of https://github.com/scipy/scipy/issues/12747 and
is caused by https://github.com/numpy/numpy/issues/16744

It is related to whether Numpy can interface with a particular Basic Linear Algebra Subprograms (BLAS).

The most popular workarounds are to install Numpy using conda or to use a non-Windows (e.g. GNU/Linux OS). conda bundles the Intel Math Kernel Library (MKL) which does not have the issue. Non-Windows systems don't have Windows's problems. Supposedly Microsoft will provide a patch sometime around January 2021.

If this issue affects you, as it does many others, please remember that for Numpy, as well as Python and many other Free packages, the license clearly states,

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"

Please be mindful of that (i.e. be polite and respectful) in any comments toward the developers of these systems.

回答8:

In scipy/linalg/basic.py, there is line 1031 lstsq function. the argument lapack_driver in lstsq is set to None. line 1162 if driver is None, driver is set to 'gelsd' I think 'gelsd' is the problem. If you change driver = 'gelsy', the code is working well.

来源：https://stackoverflow.com/questions/62561902/valueerror-illegal-value-in-4-th-argument-of-internal-none-when-running-sklearn

标签

python

python-3.x

scikit-learn

scipy