When using stargazer to create a LaTeX table on a logistic regression object the standard behaviour is to output logit-values of each model. Is it possible to get exp(logit)
There are pieces of the right answer across the various posts, but none of them seem to put it all together. Assuming the following:
glm_out <- glm(Y ~ X, data=DT, family = "binomial")
For a logistic regression, the regression coefficient (b1) is the estimated increase in the log odds of Y per unit increase in X. So, to get the odds-ratio, we just use the exp function:
OR <- exp(coef(glm_out))
# pass in coef directly
stargazer(glm_out, coef = list(OR), t.auto=F, p.auto=F)
# or, use the apply.coef option
stargazer(glm_out, apply.coef = exp, t.auto=F, p.auto=F)
You cannot simply use apply.se = exp
to get the Std. Error for the Odds Ratio
Instead, you have to use the function: Std.Error.OR = OR * SE(coef)
# define a helper function to extract SE from glm output
se.coef <- function(glm.output){sqrt(diag(vcov(glm.output)))}
# or, you can use the arm package
se.coef <- arm::se.coef
#Get the odds ratio
OR <- exp(coef(glm_out))
# Then, we can get the `StdErr.OR` by multiplying the two:
Std.Error.OR <- OR * se.coef(glm_out)
So, to get it into stargazer, we use the following:
# using Std Errors
stargazer(glm_out, coef=list(OR), se = list(Std.Error.OR), t.auto=F, p.auto=F)
Confidence intervals in an odds-ratio setting are not symmetric. So, we cannot just do ±1.96*SE(OR) to get the CI. Instead, we can compute it from the original log odds exp(coef ± 1.96*SE)
.
# Based on normal distribution to compute Wald CIs:
# we use confint.default to obtain the conventional confidence intervals
# then, use the exp function to get the confidence intervals
CI.OR <- as.matrix(exp(confint.default(glm_out)))
So, to get it into stargazer, we use the following:
# using ci.custom
stargazer(glm_out, coef=list(OR), ci.custom = list(CI.OR), t.auto=F, p.auto=F, ci = T)
# using apply.ci
stargazer(glm_out, apply.coef = exp, apply.ci = exp, t.auto=F, p.auto=F, ci = T)
Do not use the Confidence Intervals of Odds Ratios to compute significance (see note and reference at the bottom). Instead, you can do it using the log odds:
z <- coef(glm_out)/se.coef(glm_out)
And, use that to get the p.values for significance tests:
pvalue <- 2*pnorm(abs(coef(glm_out)/se.coef(glm_out)), lower.tail = F)
(source: https://data.princeton.edu/wws509/r/c3s1)
See this link for more detailed discussion on statistical testing: https://stats.stackexchange.com/questions/144603/why-do-my-p-values-differ-between-logistic-regression-output-chi-squared-test
It is important to note however, that unlike the p value, the 95% CI does not report a measure’s statistical significance. In practice, the 95% CI is often used as a proxy for the presence of statistical significance if it does not overlap the null value (e.g. OR=1). Nevertheless, it would be inappropriate to interpret an OR with 95% CI that spans the null value as indicating evidence for lack of association between the exposure and outcome. source: Explaining Odds Ratios
So, the issue is that you want to display the (non-log) odds ratio, but keep the test statistics based on the underlying linear model. By default, when you use one of the "apply" methods, such as apply.coef = exp
, stargazer will recalculate the t statistics and p values. We don't want that. Also, the standard errors are in the log basis, but we can't just exponentiate them. My preferred approach is to:
In code, this is:
stargazer(model, apply.coef=exp, t.auto=F, p.auto=F, report = "vct*")
As per symbiotic comment in 2014, more recent versions of ''stargazer'' have the options ''apply.*'' for ''coef'' ''se'' ''t'' ''p'' and ''ci'' allowing the direct transformation of these statistics.
apply.coef a function that will be applied to the coefficients.
apply.se a function that will be applied to the standard errors.
apply.t a function that will be applied to the test statistics.
apply.p a function that will be applied to the p-values.
apply.ci a function that will be applied to the lower and upper bounds of the confidence intervals.
Meaning you can directly use...
stargazer(model,
apply.coef = exp,
apply.se = exp)
EDIT : I have noticed however that simply exponentiating the CIs does not give what you would expect.
EDIT : You can obtain the correct CIs using the method described here.
stargazer
allows you to substitute a lot of things, dependent variable labels, covariate labels and so forth. To substitute those you need to supply a vector of variable labels, this is done to have publishable row names, instead of variable names from R
by default.
So in order to have odds ratios, you need to supply a vector of odds ratios to stargazer
. How do you obtain that vector? Very easily, actually. Let's say that your model is called model
, then your code is:
coef.vector <- exp(model$coef)
stargazer(model,coef=list(coef.vector))
If you have multiple models in your table, then the list should be expanded, e.g. coef=list(coef.vector1,coef.vector2,...)
, where all vectors in the list would be derived from similar exponentiation as above.