logistic-regression

ROC curve and cut-off point in Python

放肆的年华 submitted on 2020-01-19 04:18:47
Question: I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve: from sklearn import metrics; fpr, tpr, thresholds = metrics.roc_curve(Y_test, p). I know metrics.roc_auc_score gives the area under the ROC curve. Can anyone tell me what command will find the optimal cut-off point (threshold value)? Answer 1: Though it's late to answer, I thought it might be helpful. You can do this using the epi package in R (here!), however I could not find similar
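
A minimal sketch of one common answer, Youden's J statistic, which picks the threshold maximizing tpr - fpr; it assumes Y_test and p are the labels and predicted scores from the question:

```python
import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(Y_test, p)

# Youden's J statistic: choose the threshold that maximizes tpr - fpr
best = np.argmax(tpr - fpr)
optimal_threshold = thresholds[best]
print(f"cut-off {optimal_threshold:.4f} (tpr={tpr[best]:.3f}, fpr={fpr[best]:.3f})")
```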

How to get the p-value for logistic regression in Spark MLlib using Java

拟墨画扇 submitted on 2020-01-16 10:55:44
Question: How can I get the p-value for logistic regression in Spark MLlib using Java? And how can I find the probability of the classified class? The following is the code I have tried: SparkConf sparkConf = new SparkConf().setAppName("GRP").setMaster("local[*]"); SparkContext ctx = new SparkContext(sparkConf); LabeledPoint pos = new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0)); String path = "dataSetnew.txt"; JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(ctx, path).toJavaRDD(); JavaRDD
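
The RDD-based MLlib API does not expose coefficient p-values. One workaround, sketched here in PySpark rather than the question's Java (the DataFrame-based API is analogous across languages), is to fit logistic regression as a binomial GLM with Spark ML's GeneralizedLinearRegression, whose training summary reports a p-value per coefficient; train_df is an assumed DataFrame with the usual features/label columns:

```python
from pyspark.ml.regression import GeneralizedLinearRegression

# Logistic regression expressed as a binomial GLM; the summary carries inference stats.
glr = GeneralizedLinearRegression(family="binomial", link="logit",
                                  featuresCol="features", labelCol="label")
model = glr.fit(train_df)
print(model.summary.pValues)  # one p-value per coefficient (plus intercept, if fitted)
```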

Loop that will run a logistic regression across all independent variables and present AUC and

a 夏天 submitted on 2020-01-15 12:24:06
Question: I would like to take the dependent variable of a logistic regression (in my data set it's dat$admit) and regress it against each available variable, one regression per independent variable. The output I want back is a list of each regression's summary: coefficient, p-value, and AUC. Using the data set submitted below there should be 3 regressions. Here is a sample data set (where admit is the logistic regression dependent variable): >dat <- read.table(text = " female
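
The question is in R; below is a hedged Python analogue of the requested loop (the names dat and admit are carried over from the question), fitting one univariate logistic regression per predictor and collecting coefficient, p-value, and AUC:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def univariate_logits(dat: pd.DataFrame, outcome: str = "admit") -> pd.DataFrame:
    # One logistic regression per predictor; collect coefficient, p-value, AUC.
    results = []
    for col in dat.columns.drop(outcome):
        X = sm.add_constant(dat[[col]])
        fit = sm.Logit(dat[outcome], X).fit(disp=0)
        results.append({"predictor": col,
                        "coef": fit.params[col],
                        "p_value": fit.pvalues[col],
                        "auc": roc_auc_score(dat[outcome], fit.predict(X))})
    return pd.DataFrame(results)
```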

How to get the probability per instance in classification models in spark.mllib

雨燕双飞 submitted on 2020-01-09 11:56:32
Question: I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. Using these packages I produce classification models, but these models only predict a specific class per instance. In Weka, we can get the exact probability of each instance belonging to each class. How can we do that with these packages? In LogisticRegressionModel we can set the threshold, so I've created a function that check the results for each point on a
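
For the logistic regression case, clearing the model's threshold makes predict return the class-1 probability instead of a hard 0/1 label; a minimal PySpark sketch (the Scala API is analogous), assuming an RDD points of LabeledPoint:

```python
from pyspark.mllib.classification import LogisticRegressionWithSGD

model = LogisticRegressionWithSGD.train(points)
model.clearThreshold()  # predict() now returns probabilities, not labels
probs = points.map(lambda lp: (lp.label, model.predict(lp.features)))
```

The RDD-based RandomForest has no such switch; the DataFrame-based spark.ml classifiers expose a probability column instead.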

Different Robust Standard Errors of Logit Regression in Stata and R

雨燕双飞 submitted on 2020-01-09 06:36:12
Question: I am trying to replicate a logit regression from Stata in R. In Stata I use the option "robust" to get robust standard errors (heteroscedasticity-consistent standard errors). I am able to replicate exactly the same coefficients as Stata, but I am not able to get the same robust standard errors with the "sandwich" package. I have tried some OLS linear regression examples; the sandwich estimators of R and Stata seem to give me the same robust standard errors for OLS. Does anybody know
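
Not the R/Stata code from the question, but a hedged Python sketch of the underlying issue: the sandwich estimator comes in several finite-sample flavors (e.g. HC0 vs. HC1), and mismatches between Stata's robust option and R's sandwich package usually trace back to which flavor and small-sample correction each applies. The data here is simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Simulated logit data; a stand-in for the question's real data set.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = (rng.random(500) < 1 / (1 + np.exp(-(X @ np.array([0.5, 1.0, -1.0]))))).astype(int)

# Same coefficients every time; the standard errors depend on the sandwich flavor.
for cov in ("nonrobust", "HC0", "HC1"):
    fit = sm.Logit(y, X).fit(disp=0, cov_type=cov)
    print(cov, np.round(fit.bse, 4))
```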

How to correctly get the weights using Spark for a synthetic dataset?

六月ゝ 毕业季﹏ submitted on 2020-01-07 03:15:14
Question: I'm running LogisticRegressionWithSGD on Spark on a synthetic dataset. I've calculated the error with vanilla gradient descent in MATLAB and in R, and it's ~5%; I also got weights similar to the ones used in the model that generated y. The dataset was generated using this example. While I am able to get a very close error rate at the end with different step-size tuning, the weights for individual features aren't the same; in fact, they vary a lot. I tried LBFGS for Spark and it's able to predict both
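
A hedged PySpark sketch of the usual remedies: SGD's recovered weights are sensitive to the step size and to unscaled features, so standardizing the inputs and/or switching to the LBFGS optimizer typically yields weights much closer to the generating model (points is an assumed RDD of LabeledPoint):

```python
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint

# Standardize features so the optimizer is not dominated by scale differences.
scaler = StandardScaler(withMean=True, withStd=True).fit(points.map(lambda lp: lp.features))
scaled = points.map(lambda lp: LabeledPoint(lp.label, scaler.transform(lp.features)))

# LBFGS converges to the MLE without step-size tuning.
model = LogisticRegressionWithLBFGS.train(scaled, intercept=True)
print(model.weights, model.intercept)
```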

Error: Please use column names for `x` when using caret() for logistic regression

折月煮酒 submitted on 2020-01-06 06:53:24
Question: I'd like to build a logistic regression model using the caret package. This is my code: library(caret); df <- data.frame(response = sample(0:1, 200, replace=TRUE), predictor = rnorm(200,10,45)); outcomeName <- "response"; predictors <- names(df)[!(names(df) %in% outcomeName)]; index <- createDataPartition(df$response, p=0.75, list=FALSE); trainSet <- df[ index,]; testSet <- df[-index,]; model_glm <- train(trainSet[,outcomeName], trainSet[,predictors], method='glm', family="binomial", data = trainSet)

PySpark 2.2.0: the concept behind the rawPrediction field of a logistic regression model

时间秒杀一切 submitted on 2020-01-05 04:25:31
Question: I was trying to understand the output generated by a logistic regression model in PySpark. Could anyone please explain the concept behind the rawPrediction field produced by a logistic regression model? Thanks. Answer 1: In older versions of the Spark javadocs (e.g. 1.5.x), there used to be the following explanation: The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger =
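
For binary logistic regression specifically, rawPrediction holds the pre-sigmoid margin m = w·x + b as the vector [-m, m], and probability is its logistic transform; a minimal PySpark check (df is an assumed training DataFrame with the default features/label columns):

```python
import math
from pyspark.ml.classification import LogisticRegression

model = LogisticRegression(featuresCol="features", labelCol="label").fit(df)
row = model.transform(df).select("rawPrediction", "probability").first()

# For binary logistic regression: probability[1] == sigmoid(rawPrediction[1])
margin = row["rawPrediction"][1]
print(row["probability"][1], 1.0 / (1.0 + math.exp(-margin)))
```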