logistic-regression

Calculating Standard Error of Coefficients for Logistic Regression in Spark

亡梦爱人 · Submitted 2019-12-22 18:19:01
Question: I know this question has been asked previously here, but I couldn't find the correct answer. The answer in the previous post suggests using Statistics.chiSqTest(data), which provides a goodness-of-fit test (Pearson's chi-square test), not the Wald chi-square test for the significance of coefficients. I was trying to build the parameter-estimate table for logistic regression in Spark. I was able to get the coefficients and intercepts, but I couldn't find the Spark API to get the
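A sketch of what the full answer involves: spark.ml's `LogisticRegressionModel` does not expose standard errors, but `GeneralizedLinearRegression` with `family="binomial"` does, via `summary.coefficientStandardErrors`. The underlying Wald computation can also be done by hand: the covariance of the estimates is the inverse of the Fisher information X'WX, with W = diag(p(1-p)). Below is a local NumPy/scikit-learn illustration on synthetic data (the data and model here are stand-ins, not from the question); Wald standard errors assume an unregularized fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic toy data (hypothetical stand-in for the asker's Spark DataFrame).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(0.5 + X @ true_beta)))).astype(int)

# Fit an effectively unregularized model; Wald SEs assume no penalty.
clf = LogisticRegression(C=1e8, solver="lbfgs", max_iter=1000).fit(X, y)

# Design matrix with an intercept column, matching [intercept, coefficients].
Xd = np.hstack([np.ones((X.shape[0], 1)), X])
p = clf.predict_proba(X)[:, 1]
W = p * (1 - p)                                 # IRLS weights p(1-p)
cov = np.linalg.inv(Xd.T @ (Xd * W[:, None]))   # inverse Fisher information
se = np.sqrt(np.diag(cov))                      # Wald standard errors
z = np.concatenate([clf.intercept_, clf.coef_.ravel()]) / se  # Wald z-statistics
```

The Wald chi-square statistic for each coefficient is then z², compared against a chi-square distribution with one degree of freedom.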

CNTK c# logistic regression w and b variable values

僤鯓⒐⒋嵵緔 · Submitted 2019-12-22 10:58:13
Question: I know CNTK for C# is kind of new, but I hope someone can help me out. I was following this logistic regression example in Python: https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_101_LogisticRegression.ipynb to run this C# example: https://github.com/Microsoft/CNTK/blob/master/Examples/TrainingCSharp/Common/LogisticRegression.cs I changed a few lines to display the result, and the code runs without errors, but I would like to get the values of the weight matrix and bias vector, so I

Why is the AUC so different between logistic regression in sklearn and R?

…衆ロ難τιáo~ · Submitted 2019-12-22 09:47:39
Question: I use the same dataset to train a logistic regression model in both R and Python's sklearn. The dataset is imbalanced, and I find that the AUC is quite different. This is the Python code: model_logistic = linear_model.LogisticRegression() #auc 0.623 model_logistic.fit(train_x, train_y) pred_logistic = model_logistic.predict(test_x) #mean:0.0235 var:0.023 print "logistic auc: ", sklearn.metrics.roc_auc_score(test_y, pred_logistic) This is the R code: glm_fit <- glm(label ~ watch_cnt_7 + bid_cnt
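The usual culprit in this comparison: the Python snippet passes hard 0/1 predictions from `model.predict()` to `roc_auc_score`, while R's `predict(glm_fit, type="response")` returns probabilities. AUC is a ranking metric, so it should be fed scores, not thresholded labels. A sketch on synthetic data (hypothetical, not the asker's dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic binary data with one informative feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Wrong for AUC: hard 0/1 labels collapse the ranking to a single threshold.
auc_from_labels = roc_auc_score(y, model.predict(X))

# Right for AUC: continuous scores for the positive class.
auc_from_scores = roc_auc_score(y, model.predict_proba(X)[:, 1])
```

Comparing `auc_from_scores` against the R result (with probabilities on both sides) usually closes most of the gap; any remainder typically comes from regularization, since sklearn's `LogisticRegression` applies an L2 penalty by default while `glm` does not.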

Python : How to use Multinomial Logistic Regression using SKlearn

风格不统一 · Submitted 2019-12-22 04:36:08
Question: I have a test dataset and a train dataset as below. I have provided a sample with a minimal number of records, but my data has more than 1,000 records. Here E is my target variable, which I need to predict using an algorithm. It has only four categories, 1, 2, 3, 4, and can take only one of these values.
Training Dataset:
A B C D E
1 20 30 1 1
2 22 12 33 2
3 45 65 77 3
12 43 55 65 4
11 25 30 1 1
22 23 19 31 2
31 41 11 70 3
1 48 23 60 4
Test Dataset:
A B C D E
11 21 12 11 1 2 3 4 5 6 7 8 99 87 65 34 11 21 24
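A minimal sketch of the multinomial fit with scikit-learn, using the training rows from the question (recent scikit-learn versions use the multinomial formulation by default with the `lbfgs` solver; older versions needed `multi_class="multinomial"`). The two test rows below are illustrative, since the question's test table is truncated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data from the question: columns A-D are features, E is the target.
train = np.array([
    [1, 20, 30, 1, 1],
    [2, 22, 12, 33, 2],
    [3, 45, 65, 77, 3],
    [12, 43, 55, 65, 4],
    [11, 25, 30, 1, 1],
    [22, 23, 19, 31, 2],
    [31, 41, 11, 70, 3],
    [1, 48, 23, 60, 4],
])
X_train, y_train = train[:, :4], train[:, 4]

clf = LogisticRegression(max_iter=1000)  # multinomial by default with lbfgs
clf.fit(X_train, y_train)

X_test = np.array([[11, 21, 12, 11], [2, 3, 4, 5]])  # illustrative test rows
pred = clf.predict(X_test)            # each prediction is one of 1, 2, 3, 4
proba = clf.predict_proba(X_test)     # per-class probabilities, columns ordered by clf.classes_
```

With so few rows, scaling the features (e.g. `StandardScaler`) before fitting is advisable in practice.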

Python SKLearn: Logistic Regression Probabilities

橙三吉。 · Submitted 2019-12-21 20:32:49
Question: I am using the Python sklearn module to perform logistic regression. I have a dependent-variable vector Y (taking values from one of M classes) and an independent-variable matrix X (with N features). My code is LR = LogisticRegression() LR.fit(X, np.resize(Y, (len(Y)))) My question is: what do LR.coef_ and LR.intercept_ represent? I initially thought they held the values intercept(i) and coef(i,j) such that log(p(1)/(1-p(1))) = intercept(1) + coef(1,1)*X1 + ... + coef(1,N)*XN ... log(p(M)/(1-p(M))) =
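A sketch of the relationship, on synthetic data (hypothetical, not the asker's): `coef_` has shape (M, N) with one row of weights per class and `intercept_` has shape (M,), but under the multinomial formulation (the default with the `lbfgs` solver) the per-class scores are turned into probabilities by a softmax, so each row sets the *relative* log-odds of its class rather than an independent log(p_i/(1-p_i)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with M = 3 classes and N = 3 features.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = rng.integers(0, 3, size=150)

LR = LogisticRegression(max_iter=1000).fit(X, y)

# Per-class linear scores; this reproduces LR.decision_function(X).
scores = X @ LR.coef_.T + LR.intercept_

# Softmax over the scores reproduces LR.predict_proba(X).
exp_s = np.exp(scores - scores.max(axis=1, keepdims=True))
proba = exp_s / exp_s.sum(axis=1, keepdims=True)
```

In the binary case `coef_` collapses to a single row (shape (1, N)) and the score is the log-odds of the positive class, with `predict_proba` given by the sigmoid of that score.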

Logistic regression returns error but runs okay on reduced dataset

浪尽此生 · Submitted 2019-12-21 20:17:12
Question: I would really appreciate your input on this! I am working on a logistic regression, but it is not working for some reason: mod1 <- glm(survive ~ reLDM2 + yr + yr2 + reLDM2:yr + reLDM2:yr2 + NestAge0, family = binomial(link = logexp(NSSH1$exposure)), data = NSSH1, control = list(maxit = 50)) When I run the same model with less data it works! But with the complete dataset I get an error and warning messages: Error: inner loop 1; cannot correct step size In addition: Warning messages: 1: step size truncated due to

Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function

╄→гoц情女王★ · Submitted 2019-12-21 16:53:31
Question: I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow. In general, for a given input s, the value of the logistic loss is: log(1 + exp(s)) and its slope is: exp(s)./(1 + exp(s)) = 1./(1 + exp(-s)) In my algorithm, s = X*beta, where X is a matrix with N data points and P features per data point (i.e. size(X)=[N,P])
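The standard fix is to never evaluate `exp` on a large positive argument: for the loss, use log(1 + exp(s)) = s + log1p(exp(-s)) when s > 0, and for the sigmoid, pick whichever of the two algebraically equal forms only exponentiates a non-positive number. The question is in MATLAB, but the trick translates directly; here is a NumPy sketch (`np.logaddexp(0, s)` implements the stable loss):

```python
import numpy as np

def logistic_loss(s):
    # log(1 + exp(s)) via logaddexp(0, s): for large s this evaluates to
    # s + log1p(exp(-s)), so exp() never overflows.
    return np.logaddexp(0, s)

def logistic_loss_grad(s):
    # Sigmoid 1/(1 + exp(-s)), split by sign so exp() is only ever called
    # on non-positive arguments (which underflow to 0, never overflow).
    out = np.empty_like(s, dtype=float)
    pos = s >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-s[pos]))
    es = np.exp(s[~pos])
    out[~pos] = es / (1.0 + es)
    return out

s = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
vals = logistic_loss(s)        # finite everywhere; vals[-1] == 1000.0
grads = logistic_loss_grad(s)  # finite everywhere, in [0, 1]
```

The naive `np.log(1 + np.exp(1000.0))` overflows to inf; the stable version returns 1000.0, since log(1 + e^1000) differs from 1000 by less than machine precision.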

How can I get the relative importance of features of a logistic regression for a particular prediction?

放肆的年华 · Submitted 2019-12-21 05:11:27
Question: I am using a logistic regression (in scikit-learn) for a binary classification problem, and am interested in being able to explain each individual prediction. More precisely, I'm interested in predicting the probability of the positive class and having a measure of the importance of each feature for that prediction. Using the coefficients (betas) as a measure of importance is generally a bad idea, as answered here, but I have yet to find a good alternative. So far the best I have found are the
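For a linear model there is a natural per-prediction decomposition: the log-odds is intercept + Σ coef_j * x_j, so the term coef_j * x_j is feature j's additive contribution *for that particular observation* (this is essentially what SHAP reduces to for linear models). A sketch on synthetic data (hypothetical, not the asker's); note the contributions are scale-dependent, so standardizing features first makes them comparable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (stand-in for the asker's problem).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = (X[:, 0] - X[:, 2] + 0.3 * rng.normal(size=100) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

x = X[0]                                       # one observation to explain
contrib = clf.coef_.ravel() * x                # per-feature log-odds contributions
log_odds = clf.intercept_[0] + contrib.sum()   # equals decision_function for this row
p_pos = 1.0 / (1.0 + np.exp(-log_odds))        # predicted positive-class probability
```

Ranking `contrib` by absolute value then gives the features that pushed this prediction most strongly toward (positive sign) or away from (negative sign) the positive class.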

What do columns ‘rawPrediction’ and ‘probability’ of DataFrame mean in Spark MLlib?

£可爱£侵袭症+ · Submitted 2019-12-20 12:23:37
Question: After I trained a LogisticRegressionModel, I transformed the test data DataFrame with it and got the prediction DataFrame. When I call prediction.show(), the output column names are: [label | features | rawPrediction | probability | prediction]. I know what label and features mean, but how should I understand rawPrediction, probability, and prediction?
Answer 1: rawPrediction is typically the direct probability/confidence calculation. From the Spark docs: Raw prediction for each possible label. The meaning of