Question
I know this question has been asked here before, but I couldn't find a correct answer. The answer to the previous post suggests using Statistics.chiSqTest(data), which performs a goodness-of-fit test (Pearson's chi-square test), not the Wald chi-square test for the significance of coefficients.
I am trying to build the parameter estimate table for logistic regression in Spark. I can get the coefficients and the intercept, but I couldn't find a Spark API that returns the standard errors of the coefficients. Coefficient standard errors are available for the linear regression model as part of its model summary, but the logistic regression model summary doesn't provide them. Part of my sample code follows.
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training) // Assuming training is my training dataset
val trainingSummary = lrModel.summary
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary] // provides the summary information of the fitted model
Is there any way to calculate the standard errors of the coefficients, or to get the variance-covariance matrix of the coefficients, from which the standard errors can be derived?
Answer 1:
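You need to use the GLM estimator (GeneralizedLinearRegression) with the binomial family and the logit link instead of LogisticRegression. Its training summary exposes coefficientStandardErrors, tValues and pValues.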
https://spark.apache.org/docs/2.1.1/ml-classification-regression.html#generalized-linear-regression
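A minimal sketch of that approach, assuming `training` is the same DataFrame as in the question, with a 0/1 `label` column and a `features` vector column. The regularization is set to zero here as an assumption, because GeneralizedLinearRegression supports only an L2 penalty (it has no elastic-net parameter) and a penalized fit would not give the usual maximum-likelihood standard errors. The row names `x1`, `x2`, ... are hypothetical placeholders for your feature names.

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression

// Logistic regression expressed as a GLM: binomial family, logit link.
val glr = new GeneralizedLinearRegression()
  .setFamily("binomial")
  .setLink("logit")
  .setMaxIter(10)
  .setRegParam(0.0) // assumption: unpenalized fit

val glrModel = glr.fit(training) // `training` comes from the question
val summary = glrModel.summary

// When fitIntercept is true (the default), the summary arrays list the
// feature coefficients first and the intercept as the LAST element,
// so the estimates are assembled in the same order.
val estimates = glrModel.coefficients.toArray :+ glrModel.intercept
val stdErrors = summary.coefficientStandardErrors
val pValues = summary.pValues

// Hypothetical row labels; replace with the real feature names.
val names = (1 to glrModel.coefficients.size).map(i => s"x$i") :+ "(Intercept)"

// Parameter estimate table: estimate, standard error, Wald chi-square, p-value.
names.indices.foreach { i =>
  val waldChiSq = math.pow(estimates(i) / stdErrors(i), 2)
  println(f"${names(i)}%-12s ${estimates(i)}%12.6f ${stdErrors(i)}%12.6f $waldChiSq%12.4f ${pValues(i)}%10.4f")
}
```

The Wald chi-square statistic is computed by hand as the squared ratio of the estimate to its standard error. As far as I know, the summary's p-values for the binomial family come from the standard normal distribution applied to that ratio, which matches a chi-square test with one degree of freedom, but it is worth checking this against the Spark version you are running.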
Source: https://stackoverflow.com/questions/48482245/calculating-standard-error-of-coefficients-for-logistic-regression-in-spark