logistic-regression

Java implementation of fminunc in octave

此生再无相见时 submitted on 2020-01-04 18:32:11
Question: I am trying to find a Java version of Octave's fminunc (unconstrained function minimization) routine. The goal is to use it for logistic regression. Currently I am using a home-brewed version of gradient descent for cost minimization, and I would like to use an already existing library (in Java) to do that for me. This is related to my effort to port Octave code we have from the Coursera Machine Learning course to Java. Answer 1: Ah, here are a few things you can check
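For reference, the home-brewed approach the asker describes can be sketched in a few lines (the objective function and learning rate below are illustrative, not from the original post); on the Java side, Apache Commons Math's `org.apache.commons.math3.optim` package offers fminunc-like unconstrained optimizers.

```python
def minimize_gd(grad, x0, lr=0.1, iters=1000):
    """Plain unconstrained gradient descent -- a minimal stand-in for
    what Octave's fminunc automates (line search and stopping criteria
    are omitted in this sketch)."""
    x = x0
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
xmin = minimize_gd(lambda x: 2 * (x - 3), x0=0.0)
```

A real fminunc replacement would add a convergence tolerance and an adaptive step size; this sketch only shows the core update rule.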

How to use different scaling approaches in weka

陌路散爱 submitted on 2020-01-03 04:54:11
Question: I am using logistic regression with my data in weka. Now I want to try different scaling approaches to improve my results, such as min/max, zero mean/unit variance, length, etc. Is there any option in weka for scaling? Answer 1: Weka includes methods for data preprocessing: weka.filters.unsupervised.attribute.Normalize weka.filters.unsupervised.attribute.Standardize In Java: Instances train_data = ... Instances test_data = ... Standardize filter = new Standardize(); filter.setInputFormat
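As a rough numeric illustration of what those two filters compute (a sketch with made-up data, not weka's actual implementation):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Normalize: rescale each attribute to [0, 1] (min/max scaling)
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardize: zero mean / unit variance per attribute
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

Note that, as in weka, the scaling parameters should be learned from the training data only and then applied unchanged to the test data.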

Statsmodels logistic regression convergence problems

大憨熊 submitted on 2020-01-02 13:37:12
Question: I'm trying to run a logistic regression in statsmodels on a large design matrix (~200 columns). The features include a number of interactions, categorical features, and semi-sparse (70%) integer features. Although my design matrix is not actually ill-conditioned, it seems to be somewhat close (according to numpy.linalg.matrix_rank, it is full-rank with tol=1e-3 but not with tol=1e-2). As a result, I'm struggling to get logistic regression to converge with any of the methods in statsmodels.
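The tolerance-dependent rank the asker reports is easy to reproduce with a synthetic near-duplicate column (illustrative data, not the asker's design matrix); a common workaround in statsmodels for this situation is a penalized fit such as `Logit.fit_regularized`, which tolerates near-collinearity better than the unpenalized solvers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# append a column that is almost a copy of the first one
X = np.column_stack([X, X[:, 0] + 1e-4 * rng.normal(size=100)])

rank_tight = np.linalg.matrix_rank(X)            # default (tiny) tolerance
rank_loose = np.linalg.matrix_rank(X, tol=1e-2)  # looser tolerance: rank drops
```

The `tol` argument is a threshold on singular values, so a column that is "almost" redundant contributes a small singular value that counts toward the rank at a tight tolerance but not at a loose one.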

Logistic Regression Tuning Parameter Grid in R Caret Package?

偶尔善良 submitted on 2020-01-02 04:33:06
Question: I am trying to fit a logistic regression model in R using the caret package. I have done the following: model <- train(dec_var ~., data=vars, method="glm", family="binomial", trControl=ctrl, tuneGrid=expand.grid(C=c(0.001, 0.01, 0.1, 1, 10, 100, 1000))) However, I am unsure what the tuning parameter should be for this model and I am having a difficult time finding it. I assumed it is C because C is the parameter used in sklearn. Currently, I am getting the following error - Error: The
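The likely root of the confusion: caret's method="glm" fits an unpenalized model and exposes no tuning parameters at all, while sklearn's C is the inverse regularization strength of a penalized fit (in caret, a penalized logistic fit would come from something like method="glmnet", which tunes alpha and lambda instead). A sketch of what C controls, using a closed-form ridge fit as a stand-in for a regularized learner (synthetic data, illustrative only):

```python
import numpy as np

def ridge_weights(X, y, C):
    """Closed-form ridge fit: C is the inverse regularization strength,
    as in sklearn; an unregularized glm has no such tuning parameter."""
    lam = 1.0 / C
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

grid = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
fits = {C: ridge_weights(X, y, C) for C in grid}
```

Small C means heavy shrinkage (coefficients pulled toward zero); large C approaches the unpenalized fit, which is what a plain glm gives you with nothing to tune.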

LC50 / LD50 confidence intervals from multiple regression glm with interaction

纵然是瞬间 submitted on 2020-01-01 05:47:11
Question: I have a quasibinomial glm with two continuous explanatory variables (let's say "LogPesticide" and "LogFood") and an interaction. I would like to calculate the LC50 of the pesticide, with confidence intervals, at different amounts of food (e.g. the minimum and maximum food value). How can this be achieved? Example: First I generate a data set. mydata <- data.frame( LogPesticide = rep(log(c(0, 0.1, 0.2, 0.4, 0.8, 1.6) + 0.05), 4), LogFood = rep(log(c(1, 2, 4, 8)), each = 6) ) set.seed(seed=16)
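The point estimate itself follows from the fitted coefficients: at a fixed LogFood, the LC50 is the dose at which the linear predictor is zero (p = 0.5). A sketch, assuming the model form logit(p) = b0 + b1·LogPesticide + b2·LogFood + b3·LogPesticide·LogFood (the coefficient values below are illustrative, not fitted values from the post); confidence intervals around this ratio of coefficients are typically obtained via the delta method, or in R via MASS::dose.p for simpler single-predictor fits.

```python
def log_lc50(b0, b_pest, b_food, b_int, log_food):
    # solve b0 + b_pest*d + b_food*f + b_int*d*f = 0 for the dose d,
    # at a fixed food level f
    return -(b0 + b_food * log_food) / (b_pest + b_int * log_food)

# illustrative coefficients (hypothetical, not from the quasibinomial fit)
d50 = log_lc50(b0=2.0, b_pest=-1.0, b_food=0.5, b_int=0.0, log_food=0.0)
```

Because of the interaction term, both the numerator and the denominator depend on LogFood, which is why the LC50 (and its interval) changes with the amount of food.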

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels

爱⌒轻易说出口 submitted on 2019-12-30 11:06:53
Question: I did a logistic regression: EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial") Now I want to predict everwrk for each level of r_maritl. r_maritl has the following levels: levels(NH11$r_maritl) "0 Under 14 years" "1 Married - spouse in household" "2 Married - spouse not in household" "3 Married - spouse in household unknown" "4 Widowed" "5 Divorced" "6 Separated" "7 Never married" "8 Living with partner" "9 Unknown marital status" So I did: predEW <- with(NH11,
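The error in the title means the prediction data contains factor levels that were absent from the fitted model (levels with no estimated coefficient). A small sketch of encoding against a fixed training vocabulary shows why unseen levels must be handled explicitly (level names here are abbreviated illustrations, not the NH11 levels):

```python
train_levels = ["Married", "Widowed", "Divorced"]

def one_hot(value, levels):
    # encode against the training vocabulary; an unseen level has no
    # column (and no coefficient), so it must be rejected or remapped
    if value not in levels:
        raise ValueError(f"factor has new level: {value!r}")
    return [1 if value == lv else 0 for lv in levels]

row = one_hot("Widowed", train_levels)
```

The R-side fix is the same idea: make sure the factor levels of newdata match the levels seen at fit time, e.g. by dropping unused levels or setting the levels attribute explicitly before calling predict.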

Applying Cost Functions in R

会有一股神秘感。 submitted on 2019-12-30 10:47:25
Question: I am in the beginning stages of machine learning in R, and I find it hard to believe that there are no packages for solving the cost function for different types of regression algorithms. For example, if I want to solve the cost function for a logistic regression, the manual way would be as below: https://www.r-bloggers.com/logistic-regression-with-r-step-by-step-implementation-part-2/ # Implement Sigmoid function sigmoid <- function(z) { g <- 1/(1+exp(-z)) return(g) } # Cost Function cost <-
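The two functions from that snippet translate directly; a pure-stdlib Python sketch of the same formulas (the cost definition here is the standard logistic cross-entropy, filled in since the R snippet is cut off):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    # logistic cross-entropy: J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m
```

With theta = 0 the model predicts h = 0.5 everywhere, so the cost reduces to log(2) ≈ 0.693, a useful sanity check before optimizing.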

Why did PCA reduce the performance of Logistic Regression?

牧云@^-^@ submitted on 2019-12-30 07:18:08
Question: I performed logistic regression on a binary classification problem with data of 50000 x 370 dimensions. I got an accuracy of about 90%. But when I did PCA + logistic regression on the data, my accuracy dropped to 10%. I was very shocked to see this result. Can anybody explain what could have gone wrong? Answer 1: There is no guarantee that PCA will ever help, or not harm, the learning process. In particular, if you use PCA to reduce the number of dimensions, you are removing information from your data, thus everything
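Two points are worth separating here: 10% accuracy on a binary problem is far below the 50% a coin flip achieves, which usually indicates a pipeline bug (e.g., labels misaligned after the transform, or the test data not projected with the training PCA) rather than PCA alone; and PCA genuinely can discard the signal, because it keeps high-variance directions, not discriminative ones. A synthetic sketch of the second point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)
noise = rng.normal(scale=10.0, size=n)    # high variance, class-independent
signal = y + 0.05 * rng.normal(size=n)    # low variance, fully class-informative
X = np.column_stack([noise, signal])

# PCA keeps the directions of largest variance: here the top principal
# component aligns with the noisy feature, so projecting down to one
# component throws the class signal away
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
top_pc = Vt[0]
```

Standardizing features before PCA mitigates, but does not eliminate, this variance-versus-relevance mismatch.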

Detecting multicollinearity, or columns that are linear combinations, while modelling in Python: LinAlgError

拜拜、爱过 submitted on 2019-12-30 04:35:13
Question: I am modelling data for a logit model with 34 predictor variables, and it keeps throwing the singular matrix error, as below: Traceback (most recent call last): File "<pyshell#1116>", line 1, in <module> test_scores = smf.Logit(m['event'], train_cols, missing='drop').fit() File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 1186, in fit disp=disp, callback=callback, **kwargs) File "/usr/local/lib/python2.7/site
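A singular-matrix error from Logit means some columns of the design matrix are exact linear combinations of others, so the normal-equations matrix cannot be inverted. A sketch of a greedy rank check that flags which columns to drop before fitting (synthetic data; `independent_columns` is an illustrative helper, not a statsmodels function):

```python
import numpy as np

def independent_columns(X, tol=1e-8):
    # greedily keep columns that raise the matrix rank; any column that
    # is a linear combination of earlier ones gets dropped
    keep = []
    for j in range(X.shape[1]):
        if np.linalg.matrix_rank(X[:, keep + [j]], tol=tol) == len(keep) + 1:
            keep.append(j)
    return keep

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X = np.column_stack([X, X[:, 0] + X[:, 1]])   # exact linear combination

cols = independent_columns(X)
```

Typical real-world culprits are dummy variables that sum to a constant alongside an intercept (the dummy-variable trap) or duplicated/derived features.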

Logistic regression - defining reference level in R

隐身守侯 submitted on 2019-12-30 00:36:51
Question: I am going nuts trying to figure this out. How can I, in R, define the reference level to use in a binary logistic regression? What about multinomial logistic regression? Right now my code is: logistic.train.model3 <- glm(class~ x+y+z, family=binomial(link=logit), data=auth, na.action = na.exclude) My response variable is "YES" and "NO". I want to predict the probability of someone responding "YES". I DO NOT want to recode the variable to 0 / 1. Is there a way I can tell the model to
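In R the usual answer is relevel(class, ref = "NO"), since glm with family = binomial models the probability of the second factor level. The same idea in a Python sketch, using treatment coding where the chosen reference level becomes the all-zeros row (an illustrative helper, not a full model fit):

```python
def treatment_code(values, reference):
    # treatment (dummy) coding: the reference level maps to all zeros,
    # mirroring R's relevel(factor, ref = reference)
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return rows, levels

rows, levels = treatment_code(["NO", "YES", "YES", "NO"], reference="NO")
```

With "NO" as the reference, the remaining "YES" column is the modeled event, so fitted coefficients describe the log-odds of "YES" relative to "NO" without ever recoding the variable to 0/1 by hand.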