logistic-regression

How to perform logistic regression using Vowpal Wabbit on a very imbalanced dataset

夙愿已清 submitted on 2019-12-17 17:25:27
Question: I am trying to use Vowpal Wabbit for logistic regression, and I am not sure whether this is the right syntax. For training, I run:
./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1 0.05
For testing, I run:
./vw -d ~/libsvm-3.18_test/matlab/new_data_test.txt --binary -t -i lr.vw -p predictions.txt -r raw_score.txt
Here is a snippet from my training data:
-1:1.00038 | 110:0.30103 262:0.90309 689:1.20412 1103:0.477121 1286:1.5563 2663:0.30103
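With --loss_function logistic and -r, the raw scores Vowpal Wabbit writes out are log-odds, not probabilities (and --binary clamps predictions to -1/+1); passing --link=logistic at test time makes -p emit probabilities directly. A minimal Python sketch of the raw-score conversion — the score values below are illustrative, not taken from the poster's model:

```python
import math

def raw_score_to_probability(raw_score):
    """Map a Vowpal Wabbit raw score (a log-odds value) to a
    probability via the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# A raw score of 0 sits exactly on the decision boundary (p = 0.5);
# large positive scores approach 1, large negative scores approach 0.
p_mid = raw_score_to_probability(0.0)
p_pos = raw_score_to_probability(3.0)
p_neg = raw_score_to_probability(-5.0)
```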

Indexing a CSV running into inconsistent number of samples for logistic regression

南笙酒味 submitted on 2019-12-14 03:27:09
Question: I'm currently indexing a CSV with the values below and running into the error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 514]
It is treating the data as 1 row with 514 columns. Does this mean I have called a specific parameter wrong, or is it due to my removing NaNs (which most of the data would default to)?
"Classification","DGMLEN","IPLEN","TTL","IP"
"1","0.000000","192.168.1.5","185.60.216.35","TLSv1.2"
"2","0.000160","192.168.1.5","185.60.216.35","TCP"
"3","0
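That error message usually means the first dimensions of X and y disagree: a column of 514 values that ends up with shape (1, 514) looks to scikit-learn like one sample with 514 features. A sketch of the fix with NumPy, using dummy values in place of the poster's CSV column:

```python
import numpy as np

# 514 values read from one CSV column; as a single row this looks
# like 1 sample with 514 features to scikit-learn.
values = np.arange(514, dtype=float)
X_wrong = values.reshape(1, -1)   # shape (1, 514): 1 sample
X_right = values.reshape(-1, 1)   # shape (514, 1): 514 samples, 1 feature
```

With X_right, a target vector y of length 514 passes scikit-learn's consistency check.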

How to predict probability in logistic regression in SAS?

泪湿孤枕 submitted on 2019-12-14 03:15:40
Question: I am very new to SAS and am trying to predict probabilities using logistic regression in SAS. I got the code below from the SAS Support web site:
data vaso;
   length Response $12;
   input Volume Rate Response @@;
   LogVolume=log(Volume);
   LogRate=log(Rate);
   datalines;
3.70 0.825 constrict    3.50 1.09 constrict
1.25 2.50 constrict     0.75 1.50 constrict
0.80 3.20 constrict     0.70 3.50 constrict
0.60 0.75 no_constrict  1.10 1.70 no_constrict
0.90 0.75 no_constrict  0.90 0.45 no_constrict
0.80 0.57 no_constrict  0.55 2
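In SAS itself the predicted probabilities come from PROC LOGISTIC's OUTPUT statement (an OUT= data set with a P= variable). As a cross-check, here is a sketch that fits the same model in Python on the eleven complete observations visible in the snippet; scikit-learn regularizes by default, so its coefficients will not match SAS exactly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The eleven complete (volume, rate, response) observations visible
# in the snippet of the vaso example.
data = [
    (3.70, 0.825, "constrict"), (3.50, 1.09, "constrict"),
    (1.25, 2.50, "constrict"), (0.75, 1.50, "constrict"),
    (0.80, 3.20, "constrict"), (0.70, 3.50, "constrict"),
    (0.60, 0.75, "no_constrict"), (1.10, 1.70, "no_constrict"),
    (0.90, 0.75, "no_constrict"), (0.90, 0.45, "no_constrict"),
    (0.80, 0.57, "no_constrict"),
]
X = np.log([[v, r] for v, r, _ in data])   # LogVolume, LogRate
y = np.array([resp == "constrict" for _, _, resp in data])

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]       # P(constrict) for each row
```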

Unable to evaluate score using decision_function() in Logistic Regression

ⅰ亾dé卋堺 submitted on 2019-12-14 02:36:17
Question: I'm doing the Univ. of Washington assignment where I have to predict the score of sample_test_matrix (last few lines) using decision_function() in LogisticRegression. But the error I'm getting is:
ValueError: X has 145 features per sample; expecting 113092
Here is the code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
products = pd.read_csv('amazon_baby.csv')
def remove_punct (text) :
    import string
    text = str(text)
    for i in string
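The 145-vs-113092 mismatch almost always means the test text was vectorized with a fresh vectorizer fitted on the test set, producing a much smaller vocabulary than the one the model was trained on. The fitted vectorizer must be reused so train and test matrices share the same columns. A sketch with a tiny invented corpus (the document strings are made up, not from amazon_baby.csv):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = ["great baby product", "terrible product broke",
              "great quality", "broke after one day"]
train_labels = [1, 0, 1, 0]
test_docs = ["great product", "terrible quality"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the vocabulary
X_test = vectorizer.transform(test_docs)        # reuses it: same width

clf = LogisticRegression().fit(X_train, train_labels)
scores = clf.decision_function(X_test)          # widths match, no error
```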

R Logistic Regression Missing Coefficients

早过忘川 submitted on 2019-12-14 02:19:30
Question: I am trying to assess the odds of people staying in a program given their backgrounds, following these instructions. One of the variables I am looking at is age, which I split into five groups. I have run a test using the formula:
mylogit15 <- glm(Stay_in_Progams ~ Age.Group + Prior_Experience, data = mydata, family = "binomial")
The results of the test are clear enough, except that I am missing the first and third age groups. This is what they look like:
Coefficients:
             Estimate  Std. Error  z-value  Pr
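In R, the first level of a factor is the reference category: it gets no coefficient of its own because it is absorbed into the intercept, while a level that appears as NA usually signals collinearity or an empty cell in the data. A Python sketch of the same dropped-reference behavior, with made-up age labels:

```python
import pandas as pd

age_group = pd.Series(["18-25", "26-35", "36-45", "18-25", "26-35"],
                      dtype="category")

# One-hot encoding with the first level dropped: the omitted level
# becomes the reference category, absorbed into the intercept, so no
# coefficient is reported for it -- the same thing R's glm() does
# with the first factor level.
dummies = pd.get_dummies(age_group, drop_first=True)
```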

The tuning parameter in “glm” vs “rf”

给你一囗甜甜゛ submitted on 2019-12-13 07:43:25
Question: I am trying to build a classification model using method = "glm" in train. When I use method = "rpart" it works fine, but when I switch to method = "glm" it gives me an error saying:
The tuning parameter grid should have columns parameter
I tried using cpGrid = data.frame(.0001) and also cpGrid = data.frame(expand.grid(.cp = seq(.0001, .09, .001))), but both throw an error. Below is my initial code:
numFolds = trainControl(method = "cv", number = 10, repeats = 3)
cpGrid = expand.grid(.cp =
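The cp grid belongs to rpart; a plain GLM has no complexity parameter to tune, so with method = "glm" the tuneGrid argument should be dropped entirely and caret will just cross-validate the one model. The scikit-learn analogue of that "no grid, just cross-validate" setup, on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# A GLM-style logistic regression has no tuning grid, so it is
# cross-validated directly -- no parameter grid at all.
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
```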

Comparing two vectors (predicted/expected)

跟風遠走 submitted on 2019-12-13 06:46:55
Question: I am trying to do something close to shallow bootstrapping, but I am struggling with a data type. Here is the script:
library(languageR)
data(dative)
sub1 <- dative[grepl("S10|S11", dative$Speaker),]
mod_sub1 <- glm(RealizationOfRecipient ~ Verb + SemanticClass + LengthOfRecipient + AnimacyOfRec + DefinOfRec + PronomOfRec + LengthOfTheme + AnimacyOfTheme + DefinOfTheme + PronomOfTheme + AccessOfRec + AccessOfTheme, family = 'binomial', data = sub1)
comp_sub1 <- dative[!grepl("S10|S11", dative$Speaker),]
expected_compsub1 <- comp
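Whatever the data-type mismatch turns out to be, the predicted/expected comparison itself reduces to tallying (expected, predicted) pairs. A stdlib-only Python sketch with invented labels (the dative outcome in languageR has levels "NP" and "PP", used here purely for illustration):

```python
from collections import Counter

expected  = ["PP", "NP", "PP", "PP", "NP", "NP"]
predicted = ["PP", "NP", "NP", "PP", "NP", "PP"]

# Tally each (expected, predicted) cell of the confusion matrix and
# compute simple accuracy from the matching pairs.
confusion = Counter(zip(expected, predicted))
accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
```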

find the intersection of abline with fitted curve

折月煮酒 submitted on 2019-12-13 04:39:57
Question: I plotted a logistic curve and its fit using the following code:
data: L50
str(L50)
'data.frame': 10 obs. of 3 variables:
 $ Length.Class: int 50 60 70 80 90 100 110 120 130 140
 $ Total.Ind   : int 9 20 18 8 4 4 1 0 1 2
 $ Mature.Ind  : int 0 0 6 5 3 2 1 0 1 2
plot(L50$Mature.Ind/L50$Total.Ind ~ L50$Length.Class, data = L50, pch = 20, xlab = "Length class(cm)", ylab = "Proportion of mature individuals")
glm.out <- glm(cbind(L50$Mature.Ind, L50$Total.Ind - L50$Mature.Ind) ~ L50$Length.Class, family = binomial(logit
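For a logistic fit there is a closed form for where the curve crosses a horizontal line at p = 0.5 (the usual L50, length at 50% maturity): the linear predictor b0 + b1*x is zero there, so x = -b0/b1. A sketch with illustrative coefficients — the values of b0 and b1 below are invented, not taken from the poster's glm.out:

```python
import math

# Logistic curve p(x) = 1 / (1 + exp(-(b0 + b1 * x))).
# It crosses p = 0.5 exactly where b0 + b1 * x = 0.
b0, b1 = -6.0, 0.08          # illustrative coefficients only

x50 = -b0 / b1               # length class at 50% maturity
p_at_x50 = 1.0 / (1.0 + math.exp(-(b0 + b1 * x50)))
```

In R the same intersection would be -coef(glm.out)[1] / coef(glm.out)[2].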

predict.glmnet: Some Factors Have Only One Level in New Data

白昼怎懂夜的黑 submitted on 2019-12-13 04:25:17
Question: I've trained an elastic net model in R using glmnet and would like to use it to make predictions on a new data set. But I'm having trouble producing the matrix to use as an argument to the predict() method, because some of my factor variables (dummy variables indicating the presence of comorbidities) in the new data set have only one level (the comorbidities were never observed), which means I can't use model.matrix(RESPONSE ~ ., new_data) because it gives me the (expected) Error in
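The underlying fix is to build the new-data matrix against the training columns rather than against the levels that happen to appear in the new data, so never-observed levels become all-zero columns instead of vanishing. A Python sketch of that alignment (the column name and levels are invented):

```python
import pandas as pd

train = pd.DataFrame({"comorbidity": ["yes", "no", "yes", "no"]})
new = pd.DataFrame({"comorbidity": ["no", "no"]})  # only one level seen

train_mat = pd.get_dummies(train)
# Rebuild the new data with exactly the training columns; a level
# never observed in the new data becomes an all-zero column instead
# of dropping out of the design matrix.
new_mat = pd.get_dummies(new).reindex(columns=train_mat.columns,
                                      fill_value=0)
```

In R the analogous trick is to set each factor's levels in new_data to the levels from the training data before calling model.matrix().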

How to apply class weights in linear classifier for binary classification?

拥有回忆 submitted on 2019-12-13 03:09:52
Question: This is the linear classifier that I am using to perform binary classification; here is a code snippet:
my_optimizer = tf.train.AdagradOptimizer(learning_rate = learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
# Create a linear classifier object
linear_classifier = tf.estimator.LinearClassifier(
    feature_columns = feature_columns,
    optimizer = my_optimizer
)
linear_classifier.train(input_fn = training_input_fn, steps = steps)
The dataset is imbalanced,
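In a tf.estimator pipeline, class weights are supplied per example through the estimator's weight_column (a feature holding each example's weight). A stdlib sketch of computing "balanced" inverse-frequency weights — the same heuristic as scikit-learn's class_weight="balanced"; the label list is made up:

```python
from collections import Counter

labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: 8 vs 2

# Balanced per-class weights: n_samples / (n_classes * count(class)).
# The rare class gets a proportionally larger weight, so each class
# contributes equally to the loss.
counts = Counter(labels)
n, k = len(labels), len(counts)
class_weight = {c: n / (k * counts[c]) for c in counts}

# Per-example weights, as they would be fed through a weight_column.
example_weights = [class_weight[y] for y in labels]
```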