logistic-regression

How to model all combinations of independent variables in R?

末鹿安然 submitted on 2019-12-24 18:31:34
Question: I have a small data set with 4 independent variables (call them a, b, c, d) and 1 dependent variable. Since there are only a few independent variables, I want to explore all combinations of them. There are only 15 possible models (a, b, c, d, a+b, a+c, a+d, b+c, b+d, c+d, a+b+c, a+b+d, a+c+d, b+c+d, a+b+c+d). Building all of the models by hand is time-consuming, so I want to automate it. Is that possible in R? glm(dep ~ a, family = "binomial", data = data) glm(dep ~ b + c, family = "binomial", data =
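The enumeration the question describes is mechanical and easy to script. A minimal sketch of the idea in Python (the question itself is about R's glm; the variable names and the formula-string format here are illustrative):

```python
from itertools import combinations

def all_formulas(dep, indeps):
    """Build one model formula per non-empty subset of predictors."""
    formulas = []
    for k in range(1, len(indeps) + 1):
        for subset in combinations(indeps, k):
            formulas.append(f"{dep} ~ {' + '.join(subset)}")
    return formulas

formulas = all_formulas("dep", ["a", "b", "c", "d"])
print(len(formulas))  # 2^4 - 1 = 15 non-empty subsets
```

In R itself the same enumeration can be done with `combn` and `reformulate`, looping the resulting formulas through `glm` with `lapply`.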

drc: Error in drm when used interaction terms

流过昼夜 submitted on 2019-12-24 17:11:48
Question: I want to fit a log-logistic regression to the following data with the drc R package, for the combination of Temp and Variety. However, my code throws the following error: Error in Temp:Variety : NA/NaN argument Code: df2 <- structure(list(Temp = c(15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 20L, 20L, 20L, 20L, 20L, 20L, 25L, 25L, 25L, 25L, 30L, 30L, 30L, 30L, 35L, 35L, 35L, 35L, 40L, 40L, 40L, 40L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 20L, 20L, 20L, 20L, 20L, 20L, 25L, 25L, 25L, 25L, 30L,

Comparing MSE loss and cross-entropy loss in terms of convergence

半世苍凉 submitted on 2019-12-24 10:38:40
Question: For a very simple classification problem where I have a target vector [0,0,0,....0] and a prediction vector [0,0.1,0.2,....1], would cross-entropy loss converge better/faster than MSE loss, or the other way around? When I plot them, it seems to me that MSE loss has a lower error margin. Why would that be? Or, for example, when I have the target as [1,1,1,1....1] I get the following: Answer 1: You sound a little confused... Comparing the values of MSE & cross-entropy loss and saying that one is lower than the other is like
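The answer's point can be checked numerically: the two losses live on different scales, so their raw values are not comparable. A small sketch with the question's vectors (target all zeros, predictions sweeping 0 to 1); note that per-example squared error is bounded by 1, while cross-entropy diverges on a confidently wrong prediction:

```python
import numpy as np

# Target all zeros; predicted probabilities sweep from 0 to 1
y_true = np.zeros(11)
y_pred = np.linspace(0, 1, 11)

eps = 1e-12  # avoid log(0) at the endpoints
mse = np.mean((y_true - y_pred) ** 2)
bce = np.mean(-(y_true * np.log(y_pred + eps)
                + (1 - y_true) * np.log(1 - y_pred + eps)))

print(mse, bce)  # different scales: cross-entropy blows up at y_pred = 1
```

Here MSE is about 0.35 while the mean cross-entropy exceeds 3, dominated by the single prediction of 1.0 against a target of 0. That scale difference, not convergence quality, is what the plot shows.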

Elegantly convert rate summary rows into long binary-response rows?

孤人 submitted on 2019-12-24 06:46:08
Question: Background: I am running a little A/B test with 2x2 factors (foreground black vs. background white, off-color vs. normal color), and Analytics reports the number of hits for each of the 4 conditions and the rate at which they 'converted' (a binary variable, which I define as spending at least 40 seconds on the page). It's easy enough to do a little editing and get a nice R dataframe: rates <- read.csv(stdin(),header=TRUE) Black,White,N,Rate TRUE,FALSE,512,0.2344 FALSE,TRUE,529,0.2098 TRUE,TRUE
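The expansion itself (one summary row per condition → N binary-response rows) can be sketched in pandas; the question is about R, and the last two rows of counts below are made up because the original data is truncated:

```python
import pandas as pd

# Summary table as in the question; the last two N/Rate pairs are invented
rates = pd.DataFrame({
    "Black": [True, False, True, False],
    "White": [False, True, True, False],
    "N":     [512, 529, 521, 525],
    "Rate":  [0.2344, 0.2098, 0.2211, 0.2035],
})

def expand(row):
    """Turn one summary row into N rows of a binary Converted column."""
    hits = round(row.N * row.Rate)  # number of conversions in this cell
    return pd.DataFrame({
        "Black": row.Black,
        "White": row.White,
        "Converted": [True] * hits + [False] * (row.N - hits),
    })

long_df = pd.concat([expand(r) for r in rates.itertuples()],
                    ignore_index=True)
print(len(long_df))  # one row per original hit
```

Each cell's conversion rate is preserved up to the rounding of N * Rate; the long form is then suitable for a binomial glm with one row per observation.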

Probability of predictions using Spark LogisticRegressionWithLBFGS for multiclass classification

落爺英雄遲暮 submitted on 2019-12-24 00:45:43
Question: I am using LogisticRegressionWithLBFGS() to train a model with multiple classes. The mllib documentation says that clearThreshold() can be used only if the classification is binary. Is there a similar way, for multiclass classification, to output the probability of each class for a given input? Answer 1: There are two ways to accomplish this. One is to create a method that assumes the responsibility of predictPoint in LogisticRegression.scala
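The gist of the answer: predictPoint only returns the arg-max class, but the class margins w_k·x + b_k can be pushed through a softmax to recover probabilities. A numpy sketch, under the assumption that the per-class weights and intercepts have been extracted from the model (the parameter values and feature vector here are made up; note mllib's multinomial model actually stores K-1 weight sets against a reference class, so the extraction step differs in detail):

```python
import numpy as np

# Hypothetical extracted parameters: 3 classes, 4 features
W = np.array([[ 0.2, -0.5,  1.0,  0.0],
              [ 0.7,  0.1, -0.3,  0.4],
              [-0.9,  0.4, -0.7, -0.4]])
b = np.array([0.1, -0.2, 0.1])
x = np.array([1.0, 2.0, 0.5, -1.0])

margins = W @ x + b                  # one raw score per class
margins -= margins.max()             # shift for numerical stability
probs = np.exp(margins) / np.exp(margins).sum()
print(probs)  # sums to 1; arg-max matches predictPoint's class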

Array like input for Sklearn LogisticRegressionCV

社会主义新天地 submitted on 2019-12-24 00:26:23
Question: Originally I read the data from a .csv file, but here I build the dataframe from lists so the problem can be reproduced. The aim is to train a logistic regression model with cross-validation using LogisticRegressionCV. indeps = ['M', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'F'] dep = [1.0, 1.0,
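The usual cause of "array-like input" errors here is passing a 1-D list of strings: scikit-learn estimators need a 2-D numeric array of shape (n_samples, n_features). A sketch with a shortened, made-up variant of the question's data (the dep values below are invented, since the original list is truncated):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Shortened, made-up version of the question's data
indeps = ['M', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'F',
          'F', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'M']
dep = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
       1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# Encode the strings numerically, then reshape the single
# feature into a column: shape (n_samples, 1)
X = (np.array(indeps) == 'M').astype(float).reshape(-1, 1)
y = np.array(dep)

clf = LogisticRegressionCV(cv=3).fit(X, y)
print(clf.predict(X[:2]))
```

With several categorical features, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would replace the manual encoding.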

How to find beta values in Logistic Regression model with sklearn

若如初见. submitted on 2019-12-23 16:43:45
Question: Based on the logistic regression function, I'm trying to extract the following values from my model in scikit-learn: β0 and β1, where β0 is the intercept and β1 is the regression coefficient (as per Wikipedia). Now, I think I can get β0 via model.intercept_, but I've been struggling to get β1. Any ideas? Answer 1: You can access the coefficients of the features using model.coef_. It gives a list of values that correspond to beta1, beta2, and so on. The size of the list depends on the amount of
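A minimal sketch of the answer, on made-up toy data with two features, showing where β0 and the per-feature β values live on a fitted scikit-learn model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data invented for illustration: two features, binary target
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0],
              [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
beta0 = model.intercept_[0]   # the intercept β0
betas = model.coef_[0]        # β1, β2, ... one entry per feature
print(beta0, betas)
```

Both attributes are 2-D/1-D arrays indexed by class first, which is why the `[0]` is needed for a binary model.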

Unable to run logistic regression due to “perfect separation error”

我与影子孤独终老i submitted on 2019-12-23 08:49:27
Question: I'm a beginner at data analysis in Python and have been having trouble with this particular assignment. I've searched quite widely but have not been able to identify what's wrong. I imported a file, set it up as a dataframe, and cleaned the data. However, when I try to fit my model to the data, I get a "Perfect separation detected, results not available" error. Here is the code: from scipy import stats import numpy as np import pandas as pd import collections import matplotlib.pyplot
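Background on the error: perfect separation means some predictor splits the classes exactly, so the unpenalized maximum-likelihood coefficients diverge to infinity and statsmodels refuses to report results. Common fixes are dropping or binning the offending variable, or adding regularization. A sketch of the regularization route using scikit-learn (which applies an L2 penalty by default) on deliberately separated toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Perfectly separated toy data: x < 3 is always class 0, x >= 3 class 1.
# Unpenalized maximum likelihood diverges here, which is what
# statsmodels' error is reporting; an L2 penalty keeps β finite.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(C=1.0).fit(X, y)  # C controls penalty strength
print(model.coef_[0][0])  # finite slope despite perfect separation
```

Within statsmodels itself, `Logit(...).fit_regularized()` is the analogous workaround.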

What is the Search/Prediction Time Complexity of Logistic Regression?

久未见 submitted on 2019-12-22 18:36:33
Question: I am looking into the time complexities of machine learning algorithms and cannot find the time complexity of logistic regression for predicting a new input. I have read that for classification it is O(c*d), c being the number of classes and d the number of dimensions, and I know that for linear regression the search/prediction time complexity is O(d). Could you explain the search/prediction time complexity of logistic regression? Thank you in advance. Example For The
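The short answer, made concrete: binary logistic regression predicts with a single d-term dot product plus a sigmoid, so it is O(d), the same as linear regression; multiclass (softmax or one-vs-rest) needs one dot product per class, giving O(c·d). A numpy sketch with made-up sizes and weights:

```python
import numpy as np

d, c = 5, 3  # feature dimension and class count (made-up sizes)
x = np.arange(d, dtype=float)

# Binary: one dot product over d weights, then a sigmoid -> O(d)
w, b = np.ones(d), 0.5
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Multiclass (softmax): one dot product per class -> O(c * d)
W, bvec = np.ones((c, d)), np.zeros(c)
z = W @ x + bvec
probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
print(p, probs)
```

Training cost is a separate question (it depends on the optimizer and number of iterations); the O(d) and O(c·d) figures apply to prediction only.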

Error when training a logistic regression model on Apache Spark (SPARK-5063)

て烟熏妆下的殇ゞ submitted on 2019-12-22 18:30:43
Question: I am trying to build a logistic regression model with Apache Spark. Here is the code: parsedData = raw_data.map(mapper) # mapper is a function that generates a pair of label and feature vector as a LabeledPoint object featureVectors = parsedData.map(lambda point: point.features) # get feature vectors from parsed data scaler = StandardScaler(True, True).fit(featureVectors) # this creates a standardization model to scale the features scaledData = parsedData.map(lambda lp: LabeledPoint(lp.label,