logistic-regression

ROC curve and cut-off point in Python

放肆的年华 submitted on 2020-01-19 04:18:47
Question: I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve: from sklearn import metrics; fpr, tpr, thresholds = metrics.roc_curve(Y_test, p). I know metrics.roc_auc_score gives the area under the ROC curve. Can anyone tell me what command will find the optimal cut-off point (threshold value)? Answer 1: Though it's late to answer, I thought it might be helpful. You can do this using the epi package in R (here!), however I could not find similar
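
A minimal sketch of one common answer, Youden's J statistic, which picks the threshold maximizing tpr - fpr; it assumes Y_test and p are the labels and predicted scores from the question:

```python
import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(Y_test, p)

# Youden's J statistic: choose the threshold that maximizes tpr - fpr
best = np.argmax(tpr - fpr)
optimal_threshold = thresholds[best]
print(f"cut-off {optimal_threshold:.4f} (tpr={tpr[best]:.3f}, fpr={fpr[best]:.3f})")
```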

How to get the p-value for logistic regression in Spark MLlib using Java

拟墨画扇 submitted on 2020-01-16 10:55:44
Question: How can I get the p-value for logistic regression in Spark MLlib using Java? And how can I find the probability of the classified class? The following is the code I have tried: SparkConf sparkConf = new SparkConf().setAppName("GRP").setMaster("local[*]"); SparkContext ctx = new SparkContext(sparkConf); LabeledPoint pos = new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0)); String path = "dataSetnew.txt"; JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(ctx, path).toJavaRDD(); JavaRDD
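
The RDD-based MLlib API does not expose coefficient p-values. One workaround, sketched here in PySpark rather than the question's Java (the DataFrame-based API is analogous across languages), is to fit logistic regression as a binomial GLM with Spark ML's GeneralizedLinearRegression, whose training summary reports a p-value per coefficient; train_df is an assumed DataFrame with the usual features/label columns:

```python
from pyspark.ml.regression import GeneralizedLinearRegression

# Logistic regression expressed as a binomial GLM; the summary carries inference stats.
glr = GeneralizedLinearRegression(family="binomial", link="logit",
                                  featuresCol="features", labelCol="label")
model = glr.fit(train_df)
print(model.summary.pValues)  # one p-value per coefficient (plus intercept, if fitted)
```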

Loop that will run a logistic regression across all independent variables and present AUC and

a 夏天 submitted on 2020-01-15 12:24:06
Question: I would like to take the dependent variable of a logistic regression (in my data set it's dat$admit) and regress it against each available variable, one regression per independent variable. The output I want back is a list of each regression's summary: coefficient, p-value, and AUC. Using the data set submitted below there should be 3 regressions. Here is a sample data set (where admit is the logistic regression dependent variable): >dat <- read.table(text = " female
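
The question is in R; below is a hedged Python analogue of the requested loop (the names dat and admit are carried over from the question), fitting one univariate logistic regression per predictor and collecting coefficient, p-value, and AUC:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def univariate_logits(dat: pd.DataFrame, outcome: str = "admit") -> pd.DataFrame:
    # One logistic regression per predictor; collect coefficient, p-value, AUC.
    results = []
    for col in dat.columns.drop(outcome):
        X = sm.add_constant(dat[[col]])
        fit = sm.Logit(dat[outcome], X).fit(disp=0)
        results.append({"predictor": col,
                        "coef": fit.params[col],
                        "p_value": fit.pvalues[col],
                        "auc": roc_auc_score(dat[outcome], fit.predict(X))})
    return pd.DataFrame(results)
```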

How to get the probability per instance in classification models in spark.mllib

雨燕双飞 submitted on 2020-01-09 11:56:32
Question: I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. Using these packages I produce classification models, but these models only predict a specific class per instance. In Weka, we can get the exact probability of each instance belonging to each class. How can we do that with these packages? In LogisticRegressionModel we can set the threshold, so I've created a function that check the results for each point on a
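
For the logistic regression case, clearing the model's threshold makes predict return the class-1 probability instead of a hard 0/1 label; a minimal PySpark sketch (the Scala API is analogous), assuming an RDD points of LabeledPoint:

```python
from pyspark.mllib.classification import LogisticRegressionWithSGD

model = LogisticRegressionWithSGD.train(points)
model.clearThreshold()  # predict() now returns probabilities, not labels
probs = points.map(lambda lp: (lp.label, model.predict(lp.features)))
```

The RDD-based RandomForest has no such switch; the DataFrame-based spark.ml classifiers expose a probability column instead.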

Different Robust Standard Errors of Logit Regression in Stata and R

雨燕双飞 submitted on 2020-01-09 06:36:12
Question: I am trying to replicate a logit regression from Stata in R. In Stata I use the option "robust" to get robust standard errors (heteroscedasticity-consistent standard errors). I am able to replicate exactly the same coefficients as Stata, but I am not able to get the same robust standard errors with the "sandwich" package. I have tried some OLS linear regression examples; the sandwich estimators of R and Stata seem to give me the same robust standard errors for OLS. Does anybody know
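
Not the R/Stata code from the question, but a hedged Python sketch of the underlying issue: the sandwich estimator comes in several finite-sample flavors (e.g. HC0 vs. HC1), and mismatches between Stata's robust option and R's sandwich package usually trace back to which flavor and small-sample correction each applies. The data here is simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Simulated logit data; a stand-in for the question's real data set.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = (rng.random(500) < 1 / (1 + np.exp(-(X @ np.array([0.5, 1.0, -1.0]))))).astype(int)

# Same coefficients every time; the standard errors depend on the sandwich flavor.
for cov in ("nonrobust", "HC0", "HC1"):
    fit = sm.Logit(y, X).fit(disp=0, cov_type=cov)
    print(cov, np.round(fit.bse, 4))
```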

How to correctly get the weights using Spark for a synthetic dataset?

六月ゝ 毕业季﹏ submitted on 2020-01-07 03:15:14
Question: I'm running LogisticRegressionWithSGD on Spark on a synthetic dataset. I've calculated the error with vanilla gradient descent in MATLAB and in R, and it's ~5%; I also got weights similar to the ones used in the model that generated y. The dataset was generated using this example. While I am able to get a very close error rate at the end with different step-size tuning, the weights for individual features aren't the same; in fact, they vary a lot. I tried LBFGS for Spark and it's able to predict both
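
A hedged PySpark sketch of the usual remedies: SGD's recovered weights are sensitive to the step size and to unscaled features, so standardizing the inputs and/or switching to the LBFGS optimizer typically yields weights much closer to the generating model (points is an assumed RDD of LabeledPoint):

```python
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint

# Standardize features so the optimizer is not dominated by scale differences.
scaler = StandardScaler(withMean=True, withStd=True).fit(points.map(lambda lp: lp.features))
scaled = points.map(lambda lp: LabeledPoint(lp.label, scaler.transform(lp.features)))

# LBFGS converges to the MLE without step-size tuning.
model = LogisticRegressionWithLBFGS.train(scaled, intercept=True)
print(model.weights, model.intercept)
```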

Error: Please use column names for `x` when using caret() for logistic regression

折月煮酒 submitted on 2020-01-06 06:53:24
Question: I'd like to build a logistic regression model using the caret package. This is my code: library(caret); df <- data.frame(response = sample(0:1, 200, replace=TRUE), predictor = rnorm(200,10,45)); outcomeName <- "response"; predictors <- names(df)[!(names(df) %in% outcomeName)]; index <- createDataPartition(df$response, p=0.75, list=FALSE); trainSet <- df[ index,]; testSet <- df[-index,]; model_glm <- train(trainSet[,outcomeName], trainSet[,predictors], method='glm', family="binomial", data = trainSet)

PySpark 2.2.0: the concept behind the rawPrediction field of a logistic regression model

时间秒杀一切 submitted on 2020-01-05 04:25:31
Question: I was trying to understand the output generated by a logistic regression model in PySpark. Could anyone please explain the concept behind the rawPrediction field produced by a logistic regression model? Thanks. Answer 1: In older versions of the Spark javadocs (e.g. 1.5.x), there used to be the following explanation: The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger =
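
For binary logistic regression specifically, rawPrediction holds the pre-sigmoid margin m = w·x + b as the vector [-m, m], and probability is its logistic transform; a minimal PySpark check (df is an assumed training DataFrame with the default features/label columns):

```python
import math
from pyspark.ml.classification import LogisticRegression

model = LogisticRegression(featuresCol="features", labelCol="label").fit(df)
row = model.transform(df).select("rawPrediction", "probability").first()

# For binary logistic regression: probability[1] == sigmoid(rawPrediction[1])
margin = row["rawPrediction"][1]
print(row["probability"][1], 1.0 / (1.0 + math.exp(-margin)))
```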