Question
I'm relatively new to R. How can I use the 'survey' package (http://r-survey.r-forge.r-project.org/survey/) to analyze a multiple-response question in a weighted sample? The tricky part is that more than one response can be ticked, so the responses are stored across several columns.
Example:
I have survey data from 500 respondents who were drawn randomly from across 10 districts. Let's say the main question, stored in column H1_AreYouHappy, was: 'Are you happy?' - Yes / No / Don't know
The respondent is asked a follow-up question: 'WHY are you (un)happy?' This is a multiple choice question and more than one response box can be ticked, so responses are stored in separate columns, for example:
H1Yes_Why1 (0/1, i.e. box ticked or not ticked) - 'Because of the economy';
H1Yes_Why2 (0/1) - 'Because I'm healthy';
H1Yes_Why3 (0/1) - 'Because of my social life'.
Here is my fake data set
districts <- c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender')
myDataFrame <- data.frame(H1_AreYouHappy = sample(c('Yes','No','Dont Know'), 500, replace=TRUE),
                          H1Yes_Why1 = sample(0:1, 500, replace=TRUE),
                          H1Yes_Why2 = sample(0:1, 500, replace=TRUE),
                          H1Yes_Why3 = sample(0:1, 500, replace=TRUE),
                          District = sample(districts, 500, replace=TRUE), stringsAsFactors=TRUE)
I am using the R 'survey' package to apply post-stratification weights according to the de facto population size of each district:
library(survey)
# Create an unweighted survey object
mySurvey.unweighted <- svydesign(ids=~1, data=myDataFrame)
# Choose which variable contains the sample distribution to be weighted by
sample.distribution <- list(~District)
# Specify (from Census data) how often each level occurs in the population
population.distribution <- data.frame(District = c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender'),
freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269, 0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))
# Apply the weights
mySurvey.rake <- rake(design = mySurvey.unweighted, sample.margins=sample.distribution, population.margins=list(population.distribution))
# Calculate the weighted mean for the main question
svymean(~H1_AreYouHappy, mySurvey.rake)
# How can I calculate the WEIGHTED means for the multiple choice - multiple response follow-up question?
How can I calculate the WEIGHTED means for the multiple choice question (i.e. across the 0/1 response columns)?
If I wanted it unweighted, I could just use this function, which calculates the frequencies across all columns whose names match my prefix 'H1Yes_Why':
multipleResponseFrequencies = function(data, question.prefix) {
  # Find the columns with the questions
  a = grep(question.prefix, names(data))
  # Find the total number of responses
  b = sum(data[, a] != 0)
  # Find the totals for each question
  d = colSums(data[, a] != 0)
  # Find the number of respondents who ticked at least one box
  e = sum(rowSums(data[, a]) != 0)
  # d and b combined into one vector: the overall frequencies
  f = as.numeric(c(d, b))
  result <- data.frame(question = c(names(d), "Total"),
                       freq = f,
                       percent = (f/b)*100,
                       percentofcases = (f/e)*100)
  result
}
multipleResponseFrequencies(myDataFrame, 'H1Yes_Why')
Any help would be greatly appreciated.
Answer 1:
I think you want:
svyratio( ~ H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 , ~ as.numeric( H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 ) , mySurvey.rake)
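To make clear what that ratio estimates, here is a minimal base-R sketch on a hypothetical six-row data frame (`toy`, with a made-up weight column `w` standing in for the raked weights; neither comes from the question). It computes the weighted share of all ticks, which is the point estimate the svyratio call above targets, and the weighted share of respondents ticking each box, which is what svymean on the 0/1 columns would give. The survey calls at the end are only a cross-check and run only if the package is installed.

```r
# Hypothetical toy data: 6 respondents, 3 tick-box columns, and an
# assumed weight column `w` (a stand-in for weights(mySurvey.rake)).
toy <- data.frame(
  H1Yes_Why1 = c(1, 0, 1, 1, 0, 0),
  H1Yes_Why2 = c(0, 1, 1, 0, 0, 1),
  H1Yes_Why3 = c(1, 1, 0, 0, 0, 1),
  w          = c(2, 1, 1, 3, 2, 1)
)
why <- grep("^H1Yes_Why", names(toy), value = TRUE)

# Weighted number of ticks per option: sum over respondents of w * x
wtd_ticks <- colSums(as.matrix(toy[why]) * toy$w)

# "Percent of responses": each option's share of all weighted ticks
share_of_responses <- wtd_ticks / sum(wtd_ticks)

# "Percent of cases": weighted share of respondents ticking each option
share_of_cases <- wtd_ticks / sum(toy$w)

print(round(100 * cbind(share_of_responses, share_of_cases), 1))

# Optional cross-check with the survey package, which also gives
# standard errors that the hand calculation above does not.
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~w, data = toy)
  print(survey::svymean(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3, des))
  print(survey::svyratio(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3,
                         ~as.numeric(H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3),
                         des))
}
```

On the real design, the "percent of cases" figures would presumably come from svymean(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3, mySurvey.rake), and subset(mySurvey.rake, H1_AreYouHappy == 'Yes') could restrict the estimate to respondents who answered 'Yes'.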
Source: https://stackoverflow.com/questions/38675151/how-to-use-the-r-survey-package-to-analyze-multiple-response-questions-in-a-weig