logistic-regression

How to perform logistic regression using Vowpal Wabbit on a very imbalanced dataset

夙愿已清 submitted on 2019-12-17 17:25:27
Question: I am trying to use Vowpal Wabbit for logistic regression, and I am not sure whether this is the right syntax. For training, I run:
./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1 0.05
For testing, I run:
./vw -d ~/libsvm-3.18_test/matlab/new_data_test.txt --binary -t -i lr.vw -p predictions.txt -r raw_score.txt
Here is a snippet from my training data:
-1:1.00038 | 110:0.30103 262:0.90309 689:1.20412 1103:0.477121 1286:1.5563 2663:0.30103
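With --loss_function logistic and -r, the raw scores Vowpal Wabbit writes out are log-odds, not probabilities (and --binary clamps predictions to -1/+1); passing --link=logistic at test time makes -p emit probabilities directly. A minimal Python sketch of the raw-score conversion — the score values below are illustrative, not taken from the poster's model:

```python
import math

def raw_score_to_probability(raw_score):
    """Map a Vowpal Wabbit raw score (a log-odds value) to a
    probability via the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# A raw score of 0 sits exactly on the decision boundary (p = 0.5);
# large positive scores approach 1, large negative scores approach 0.
p_mid = raw_score_to_probability(0.0)
p_pos = raw_score_to_probability(3.0)
p_neg = raw_score_to_probability(-5.0)
```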

Indexing a CSV running into inconsistent number of samples for logistic regression

南笙酒味 submitted on 2019-12-14 03:27:09
Question: I'm currently indexing a CSV with the values below and running into the error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 514]
It is treating the data as 1 row with 514 columns. Does this mean I have called a specific parameter wrong, or is it due to my removing NaNs (which most of the data would default to)?
"Classification","DGMLEN","IPLEN","TTL","IP"
"1","0.000000","192.168.1.5","185.60.216.35","TLSv1.2"
"2","0.000160","192.168.1.5","185.60.216.35","TCP"
"3","0
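That error message usually means the first dimensions of X and y disagree: a column of 514 values that ends up with shape (1, 514) looks to scikit-learn like one sample with 514 features. A sketch of the fix with NumPy, using dummy values in place of the poster's CSV column:

```python
import numpy as np

# 514 values read from one CSV column; as a single row this looks
# like 1 sample with 514 features to scikit-learn.
values = np.arange(514, dtype=float)
X_wrong = values.reshape(1, -1)   # shape (1, 514): 1 sample
X_right = values.reshape(-1, 1)   # shape (514, 1): 514 samples, 1 feature
```

With X_right, a target vector y of length 514 passes scikit-learn's consistency check.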

How to predict probability in logistic regression in SAS?

泪湿孤枕 submitted on 2019-12-14 03:15:40
Question: I am very new to SAS and am trying to predict probabilities using logistic regression in SAS. I got the code below from the SAS Support web site:
data vaso;
   length Response $12;
   input Volume Rate Response @@;
   LogVolume=log(Volume);
   LogRate=log(Rate);
   datalines;
3.70 0.825 constrict    3.50 1.09 constrict
1.25 2.50 constrict     0.75 1.50 constrict
0.80 3.20 constrict     0.70 3.50 constrict
0.60 0.75 no_constrict  1.10 1.70 no_constrict
0.90 0.75 no_constrict  0.90 0.45 no_constrict
0.80 0.57 no_constrict  0.55 2
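In SAS itself the predicted probabilities come from PROC LOGISTIC's OUTPUT statement (an OUT= data set with a P= variable). As a cross-check, here is a sketch that fits the same model in Python on the eleven complete observations visible in the snippet; scikit-learn regularizes by default, so its coefficients will not match SAS exactly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The eleven complete (volume, rate, response) observations visible
# in the snippet of the vaso example.
data = [
    (3.70, 0.825, "constrict"), (3.50, 1.09, "constrict"),
    (1.25, 2.50, "constrict"), (0.75, 1.50, "constrict"),
    (0.80, 3.20, "constrict"), (0.70, 3.50, "constrict"),
    (0.60, 0.75, "no_constrict"), (1.10, 1.70, "no_constrict"),
    (0.90, 0.75, "no_constrict"), (0.90, 0.45, "no_constrict"),
    (0.80, 0.57, "no_constrict"),
]
X = np.log([[v, r] for v, r, _ in data])   # LogVolume, LogRate
y = np.array([resp == "constrict" for _, _, resp in data])

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]       # P(constrict) for each row
```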

Unable to evaluate score using decision_function() in Logistic Regression

ⅰ亾dé卋堺 submitted on 2019-12-14 02:36:17
Question: I'm doing the Univ. of Washington assignment where I have to predict the score of sample_test_matrix (last few lines) using decision_function() in LogisticRegression. But the error I'm getting is:
ValueError: X has 145 features per sample; expecting 113092
Here is the code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
products = pd.read_csv('amazon_baby.csv')
def remove_punct (text) :
    import string
    text = str(text)
    for i in string
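The 145-vs-113092 mismatch almost always means the test text was vectorized with a fresh vectorizer fitted on the test set, producing a much smaller vocabulary than the one the model was trained on. The fitted vectorizer must be reused so train and test matrices share the same columns. A sketch with a tiny invented corpus (the document strings are made up, not from amazon_baby.csv):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = ["great baby product", "terrible product broke",
              "great quality", "broke after one day"]
train_labels = [1, 0, 1, 0]
test_docs = ["great product", "terrible quality"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the vocabulary
X_test = vectorizer.transform(test_docs)        # reuses it: same width

clf = LogisticRegression().fit(X_train, train_labels)
scores = clf.decision_function(X_test)          # widths match, no error
```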

R Logistic Regression Missing Coefficients

早过忘川 submitted on 2019-12-14 02:19:30
Question: I am trying to assess the odds of people staying in a program given their backgrounds, following these instructions. One of the variables I am looking at is age, which I split into five groups. I have run a test using the formula:
mylogit15 <- glm(Stay_in_Progams ~ Age.Group + Prior_Experience, data = mydata, family = "binomial")
The results of the test are clear enough, except that I am missing the first and third age groups. This is what they look like:
Coefficients:
             Estimate  Std. Error  z-value  Pr
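In R, the first level of a factor is the reference category: it gets no coefficient of its own because it is absorbed into the intercept, while a level that appears as NA usually signals collinearity or an empty cell in the data. A Python sketch of the same dropped-reference behavior, with made-up age labels:

```python
import pandas as pd

age_group = pd.Series(["18-25", "26-35", "36-45", "18-25", "26-35"],
                      dtype="category")

# One-hot encoding with the first level dropped: the omitted level
# becomes the reference category, absorbed into the intercept, so no
# coefficient is reported for it -- the same thing R's glm() does
# with the first factor level.
dummies = pd.get_dummies(age_group, drop_first=True)
```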

The tuning parameter in “glm” vs “rf”

给你一囗甜甜゛ submitted on 2019-12-13 07:43:25
Question: I am trying to build a classification model using method = "glm" in train. When I use method = "rpart" it works fine, but when I switch to method = "glm" it gives me an error saying:
The tuning parameter grid should have columns parameter
I tried using cpGrid = data.frame(.0001) and also cpGrid = data.frame(expand.grid(.cp = seq(.0001, .09, .001))), but both throw an error. Below is my initial code:
numFolds = trainControl(method = "cv", number = 10, repeats = 3)
cpGrid = expand.grid(.cp =
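The cp grid belongs to rpart; a plain GLM has no complexity parameter to tune, so with method = "glm" the tuneGrid argument should be dropped entirely and caret will just cross-validate the one model. The scikit-learn analogue of that "no grid, just cross-validate" setup, on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# A GLM-style logistic regression has no tuning grid, so it is
# cross-validated directly -- no parameter grid at all.
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
```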

Comparing two vectors (predicted/expected)

跟風遠走 submitted on 2019-12-13 06:46:55
Question: I am trying to do something close to shallow bootstrapping, but I am struggling with a data type. Here is the script:
library(languageR)
data(dative)
sub1 <- dative[grepl("S10|S11", dative$Speaker),]
mod_sub1 <- glm(RealizationOfRecipient ~ Verb + SemanticClass + LengthOfRecipient + AnimacyOfRec + DefinOfRec + PronomOfRec + LengthOfTheme + AnimacyOfTheme + DefinOfTheme + PronomOfTheme + AccessOfRec + AccessOfTheme, family = 'binomial', data = sub1)
comp_sub1 <- dative[!grepl("S10|S11", dative$Speaker),]
expected_compsub1 <- comp
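Whatever the data-type mismatch turns out to be, the predicted/expected comparison itself reduces to tallying (expected, predicted) pairs. A stdlib-only Python sketch with invented labels (the dative outcome in languageR has levels "NP" and "PP", used here purely for illustration):

```python
from collections import Counter

expected  = ["PP", "NP", "PP", "PP", "NP", "NP"]
predicted = ["PP", "NP", "NP", "PP", "NP", "PP"]

# Tally each (expected, predicted) cell of the confusion matrix and
# compute simple accuracy from the matching pairs.
confusion = Counter(zip(expected, predicted))
accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
```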

find the intersection of abline with fitted curve

折月煮酒 submitted on 2019-12-13 04:39:57
Question: I plotted a logistic curve and its fit using the following code:
data: L50
str(L50)
'data.frame': 10 obs. of 3 variables:
 $ Length.Class: int 50 60 70 80 90 100 110 120 130 140
 $ Total.Ind   : int 9 20 18 8 4 4 1 0 1 2
 $ Mature.Ind  : int 0 0 6 5 3 2 1 0 1 2
plot(L50$Mature.Ind/L50$Total.Ind ~ L50$Length.Class, data = L50, pch = 20, xlab = "Length class(cm)", ylab = "Proportion of mature individuals")
glm.out <- glm(cbind(L50$Mature.Ind, L50$Total.Ind - L50$Mature.Ind) ~ L50$Length.Class, family = binomial(logit
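For a logistic fit there is a closed form for where the curve crosses a horizontal line at p = 0.5 (the usual L50, length at 50% maturity): the linear predictor b0 + b1*x is zero there, so x = -b0/b1. A sketch with illustrative coefficients — the values of b0 and b1 below are invented, not taken from the poster's glm.out:

```python
import math

# Logistic curve p(x) = 1 / (1 + exp(-(b0 + b1 * x))).
# It crosses p = 0.5 exactly where b0 + b1 * x = 0.
b0, b1 = -6.0, 0.08          # illustrative coefficients only

x50 = -b0 / b1               # length class at 50% maturity
p_at_x50 = 1.0 / (1.0 + math.exp(-(b0 + b1 * x50)))
```

In R the same intersection would be -coef(glm.out)[1] / coef(glm.out)[2].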

predict.glmnet: Some Factors Have Only One Level in New Data

白昼怎懂夜的黑 submitted on 2019-12-13 04:25:17
Question: I've trained an elastic net model in R using glmnet and would like to use it to make predictions on a new data set. But I'm having trouble producing the matrix to use as an argument to the predict() method, because some of my factor variables (dummy variables indicating the presence of comorbidities) in the new data set have only one level (the comorbidities were never observed), which means I can't use model.matrix(RESPONSE ~ ., new_data) because it gives me the (expected) Error in
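The underlying fix is to build the new-data matrix against the training columns rather than against the levels that happen to appear in the new data, so never-observed levels become all-zero columns instead of vanishing. A Python sketch of that alignment (the column name and levels are invented):

```python
import pandas as pd

train = pd.DataFrame({"comorbidity": ["yes", "no", "yes", "no"]})
new = pd.DataFrame({"comorbidity": ["no", "no"]})  # only one level seen

train_mat = pd.get_dummies(train)
# Rebuild the new data with exactly the training columns; a level
# never observed in the new data becomes an all-zero column instead
# of dropping out of the design matrix.
new_mat = pd.get_dummies(new).reindex(columns=train_mat.columns,
                                      fill_value=0)
```

In R the analogous trick is to set each factor's levels in new_data to the levels from the training data before calling model.matrix().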

How to apply class weights in linear classifier for binary classification?

拥有回忆 submitted on 2019-12-13 03:09:52
Question: This is the linear classifier that I am using to perform binary classification; here is a code snippet:
my_optimizer = tf.train.AdagradOptimizer(learning_rate = learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
# Create a linear classifier object
linear_classifier = tf.estimator.LinearClassifier(
    feature_columns = feature_columns,
    optimizer = my_optimizer
)
linear_classifier.train(input_fn = training_input_fn, steps = steps)
The dataset is imbalanced,
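In a tf.estimator pipeline, class weights are supplied per example through the estimator's weight_column (a feature holding each example's weight). A stdlib sketch of computing "balanced" inverse-frequency weights — the same heuristic as scikit-learn's class_weight="balanced"; the label list is made up:

```python
from collections import Counter

labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: 8 vs 2

# Balanced per-class weights: n_samples / (n_classes * count(class)).
# The rare class gets a proportionally larger weight, so each class
# contributes equally to the loss.
counts = Counter(labels)
n, k = len(labels), len(counts)
class_weight = {c: n / (k * counts[c]) for c in counts}

# Per-example weights, as they would be fed through a weight_column.
example_weights = [class_weight[y] for y in labels]
```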