Question
I'm using the class below to compute p-values when fitting a logistic regression model, but I get the error LinAlgError: Singular matrix.
import numpy as np
from sklearn import linear_model
import scipy.stats as stat

class LogisticRegression_with_p_values:
    def __init__(self, *args, **kwargs):
        self.model = linear_model.LogisticRegression(*args, **kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)
        # Fisher information matrix built from the fitted decision function
        denom = (2.0 * (1.0 + np.cosh(self.model.decision_function(X))))
        denom = np.tile(denom, (X.shape[1],1)).T
        F_ij = np.dot((X/denom).T,X)
        # np.linalg.inv raises LinAlgError if F_ij is singular
        Cramer_Rao = np.linalg.inv(F_ij)
        sigma_estimates = np.sqrt(np.diagonal(Cramer_Rao))
        z_scores = self.model.coef_[0] / sigma_estimates
        # Two-sided p-values from the standard normal distribution
        p_values = [stat.norm.sf(abs(x)) * 2 for x in z_scores]
        self.coef_ = self.model.coef_
        self.intercept_ = self.model.intercept_
        self.p_values = p_values

reg = LogisticRegression_with_p_values()
reg.fit(inputs_train, loan_data_targets_train)
Answer 1:
Error: fitting the model with the p-value class (reg = LogisticRegression_with_p_values(), then reg.fit(...)) throws LinAlgError: Singular matrix.
Step 1: Run the code below and look for any missing values (gaps) along the green diagonal of the heatmap.
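import matplotlib.pyplot as plt
import seaborn as sns
corr = inputs_train.corr()
kot = corr[corr>=.9]  # keep correlations >= 0.9; everything else becomes NaN
plt.figure(figsize=(18,10))
sns.heatmap(kot, cmap="Greens")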
I was working on a Lending Club analysis and ran into this error. Using the heatmap above, I found a missing value on the green diagonal, so to investigate further I ran the code below to check whether the column's correlation with itself is nan:
inputs_train['term:36'].corr(inputs_train['term:36'])
Output: nan
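To see why a nan self-correlation goes hand in hand with the Singular matrix error, here is a minimal sketch on made-up data. The toy frame and the 'grade:A' column are assumptions for illustration, not the Lending Club data. An all-zero column has zero variance, so its correlation is undefined, and it puts a zero row and column into X.T @ X, which then cannot be inverted. The sketch leaves out the 1/denom weights from the question's F_ij; they rescale rows but cannot rescue a zero column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs: 'term:36' is all zeros, like the broken dummy column.
toy = pd.DataFrame({
    "grade:A": [1, 0, 1, 0, 1, 0],
    "term:36": [0, 0, 0, 0, 0, 0],
})

# Zero variance means the Pearson correlation is undefined
# (NumPy may emit a RuntimeWarning here).
self_corr = toy["term:36"].corr(toy["term:36"])

# Unweighted stand-in for F_ij from the question.
X = toy.to_numpy(dtype=float)
F = X.T @ X
rank = np.linalg.matrix_rank(F)

# Try the same inversion the question's fit() performs.
try:
    np.linalg.inv(F)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

print(self_corr, rank, inv_failed)
```

The rank is smaller than the number of columns, which is exactly the condition under which np.linalg.inv gives up.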
Next step:
When the variable 'term_int' was created, it was never converted from string to numeric. To verify this, check that type(df_inputs_prepr_train['term_int'][0]) returns str.
So when we write:
df_inputs_prepr_train[df_inputs_prepr_train['term_int']==36]['term_int']
the output has zero rows, because 'term_int' still holds the string '36' and not the number 36.
So when we use the code:
df_inputs_prepr_train['term:36']=np.where((df_inputs_prepr_train['term_int']==36),1,0)
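it stores only zeros in 'term:36'. An all-zero column has zero variance (hence the nan correlation) and makes the matrix F_ij singular, which is why np.linalg.inv fails.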
Action to take:
`df_inputs_prepr_train['term_int']=pd.to_numeric(df_inputs_prepr_train['term_int'])`
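The effect of the conversion can be checked on a small made-up column. The values below are assumptions for illustration; the real 'term_int' comes from the preprocessing step.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_inputs_prepr_train, with 'term_int' stored as strings.
df = pd.DataFrame({"term_int": pd.Series(["36", "60", "36", "60"], dtype=object)})

# Comparing strings with the integer 36 never matches, so the dummy is all zeros.
before = np.where(df["term_int"] == 36, 1, 0)

# After converting to numeric, the same comparison works as intended.
df["term_int"] = pd.to_numeric(df["term_int"])
after = np.where(df["term_int"] == 36, 1, 0)

print(before.tolist())
print(after.tolist())
```

Note that the dummy column has to be rebuilt after the conversion; converting 'term_int' alone does not change a 'term:36' column that was already computed.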
Cross-check: if you run the heatmap again, you won't see any missing values along the green diagonal.
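As a quicker alternative to reading the heatmap, constant columns can also be listed directly. This is only a sketch and not part of the original workflow: find_constant_columns is a hypothetical helper name, and the toy frame stands in for inputs_train.

```python
import pandas as pd

def find_constant_columns(frame: pd.DataFrame) -> list:
    """Return names of columns holding at most one distinct non-null value."""
    counts = frame.nunique()
    return counts[counts <= 1].index.tolist()

# Hypothetical toy inputs: 'term:36' is constant, 'term:60' is not.
toy_inputs = pd.DataFrame({
    "term:36": [0, 0, 0, 0],
    "term:60": [1, 0, 1, 0],
})
constant_cols = find_constant_columns(toy_inputs)
print(constant_cols)
```

Any column this reports is a candidate cause of the singular matrix and is worth tracing back to the step that created it.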
Source: https://stackoverflow.com/questions/61744118/getting-linalgerror-singular-matrix-error