Question
So I have some data that is structured similarly to the following:
          | Works | DoesNotWork |
---------------------------------
Unmarried | 130   | 235         |
Married   | 10    | 95          |
I'm trying to use logistic regression to predict Work Status from Marriage Status, but I don't think I understand how to do this in R. For example, if my data looks like the following:
MarriageStatus | WorkStatus|
-----------------------------
Married | No |
Married | No |
Married | Yes |
Unmarried | No |
Unmarried | Yes |
Unmarried | Yes |
I understand that I could do the following:
log_model <- glm(WorkStatus ~ MarriageStatus, data=MarriageDF, family=binomial(logit))
When the data is summarized, I just don't understand how to do this. Do I need to expand the data into a non-summarized form, encoding Married/Unmarried as 0/1 and Working/Not Working as 0/1?
Given only the first summary DF, how would I write the logistic regression glm function? Something like this?
log_summary_model <- glm(Works ~ DoesNotWork, data=summaryDF, family=binomial(logit))
But that doesn't make sense, as I'd be splitting the response (dependent) variable across both sides of the formula.
I'm not sure if I'm overcomplicating this. Any help would be greatly appreciated, thanks!
Answer 1:
You need to reshape the contingency table into a data frame with one row per combination of work and marriage status plus a count column. A logit model can then be fitted using the frequency count as a weight variable:
mod <- glm(works ~ marriage, df, family = binomial, weights = freq)
summary(mod)
Call:
glm(formula = works ~ marriage, family = binomial, data = df,
    weights = freq)

Deviance Residuals:
      1        2        3        4
 16.383    6.858  -14.386   -4.361

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.5921     0.1093  -5.416 6.08e-08 ***
marriage     -1.6592     0.3500  -4.741 2.12e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 572.51 on 3 degrees of freedom
Residual deviance: 541.40 on 2 degrees of freedom
AIC: 545.4

Number of Fisher Scoring iterations: 5
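With a single binary predictor, the two coefficients can be checked by hand from the counts, which may help when reading the output above. This is a small sketch, assuming (as in the data for this answer) that works = 1 means working and marriage = 1 means married. The variable names are illustrative only.

```r
# Counts taken from the summary table in the question.
works_unmarried <- 130; not_unmarried <- 235
works_married   <- 10;  not_married   <- 95

# Log-odds of working within each group.
logodds_unmarried <- log(works_unmarried / not_unmarried)
logodds_married   <- log(works_married / not_married)

# Intercept = log-odds for the baseline group (marriage = 0);
# slope = difference in log-odds between married and unmarried.
intercept <- logodds_unmarried
slope     <- logodds_married - logodds_unmarried
round(c(intercept = intercept, slope = slope), 4)

# Back on the probability scale: share working in each group.
p_unmarried <- plogis(intercept)          # equals 130 / 365
p_married   <- plogis(intercept + slope)  # equals 10 / 105

# Odds ratio of working for married versus unmarried.
odds_ratio <- exp(slope)
```

The rounded values should agree with the Estimate column, and exp(slope) gives the odds ratio implied by the marriage coefficient.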
Data:
df <- read.table(text = "works marriage freq
1 0 130
1 1 10
0 0 235
0 1 95", header = TRUE)
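As an alternative to the weights approach, glm also accepts a two-column matrix of successes and failures as the response for a binomial family, so the summary table can be used more or less as it stands. Below is a minimal sketch, assuming the table is stored in a data frame called summaryDF (the name used in the question) with a MarriageStatus column added for the row labels. The column layout is an assumption, not something given in the thread.

```r
# Summary table from the question, one row per marriage group.
# "Unmarried" is set as the reference level so the coefficients line up
# with the marriage = 0/1 coding used in the weighted model.
summaryDF <- data.frame(
  MarriageStatus = factor(c("Unmarried", "Married"),
                          levels = c("Unmarried", "Married")),
  Works       = c(130, 10),
  DoesNotWork = c(235, 95)
)

# Two-column response: cbind(number of successes, number of failures).
log_summary_model <- glm(cbind(Works, DoesNotWork) ~ MarriageStatus,
                         data = summaryDF, family = binomial)

coef(summary(log_summary_model))
```

The estimates and standard errors should agree with the weighted fit. The deviances and degrees of freedom will differ, because this version treats the data as two binomial observations rather than four weighted rows.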
Answer 2:
This should do it for you.
library(dplyr)
library(tibble)

# Load data
MarriageDF <- tribble(
  ~MarriageStatus, ~WorkStatus,
  'Married',       'No',
  'Married',       'No',
  'Married',       'Yes',
  'Unmarried',     'No',
  'Unmarried',     'Yes',
  'Unmarried',     'Yes') %>%
  # convert WorkStatus to 0/1 (No = 0, Yes = 1)
  mutate(WorkStatus = as.numeric(as.factor(WorkStatus)) - 1)

log_model <- glm(WorkStatus ~ MarriageStatus, data = MarriageDF, family = 'binomial')
summary(log_model)
Edit: I believe I was answering a previous version of the question.
Yes, for a formula like the one above you need to 'expand' the data, or format it so that it is tidy (one observation per row). I don't believe the Works ~ DoesNotWork formula from the question can work on the first table as it stands.
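If the one-row-per-observation layout is wanted, the counts can be expanded by repeating each row according to its frequency. This is a rough sketch in base R, assuming a count table with columns works, marriage and freq like the one in Answer 1. The names expanded and mod_long are made up for illustration.

```r
# Count table: one row per combination of work and marriage status.
df <- data.frame(
  works    = c(1, 1, 0, 0),
  marriage = c(0, 1, 0, 1),
  freq     = c(130, 10, 235, 95)
)

# Repeat each row index freq times, then drop the count column.
expanded <- df[rep(seq_len(nrow(df)), times = df$freq), c("works", "marriage")]
rownames(expanded) <- NULL

# One row per person, so an ordinary unweighted fit applies.
mod_long <- glm(works ~ marriage, data = expanded, family = binomial)
coef(mod_long)
```

The expanded frame has as many rows as the counts sum to, and the coefficients should match those from the summarized data. For large tables the expansion mostly costs memory without changing the estimates.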
Source: https://stackoverflow.com/questions/52574496/how-to-do-logistic-regression-on-summary-data-in-r