How to use R for multiple select questions?

攒了一身酷 2021-02-04 19:03

I am trying to figure out how to analyze multiple-select/multiple-response (i.e., "select all that apply") questions in a survey I recently conducted.

SPSS has nice c

3 answers
  • 2021-02-04 19:13

    I have not found anything quite as convenient as the multiple response sets in SPSS. However, you can create groups fairly easily from common column names and then use apply() or one of its relatives to iterate over each group. Here is one approach using adply() from the plyr package:

    library(plyr)
    set.seed(1)
    # Fake data with three "like" questions. 0 = not selected, 1 = selected
    dat <- data.frame(resp = 1:10,
                      like1 = sample(0:1, 10, TRUE),
                      like2 = sample(0:1, 10, TRUE),
                      like3 = sample(0:1, 10, TRUE)
                      )
    
    adply(dat[grepl("like", colnames(dat))], 2, function(x)
      data.frame(Count = as.data.frame(table(x))[2,2], 
            Perc = as.data.frame(prop.table(table(x)))[2,2]))
    #-----
         X1 Count Perc
    1 like1     6  0.6
    2 like2     5  0.5
    3 like3     3  0.3
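
One caveat with the approach above: taking row 2 of `table(x)` yields NA for any item where every respondent gave the same answer, because the table then has only one row. Below is a minimal base R sketch that avoids this, assuming the items are coded 0/1 and share the "like" prefix. The data are made up and written out by hand so the result does not depend on the random number generator.

```r
# Hypothetical data: 10 respondents, three 0/1 "like" items
dat <- data.frame(resp  = 1:10,
                  like1 = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
                  like2 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
                  like3 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))

# Keep only the columns whose names start with "like"
likes <- dat[grepl("^like", colnames(dat))]

# Count and proportion of respondents selecting each item;
# comparing to 1 first means an all-zero item gives 0 rather than NA
summary_tbl <- data.frame(item  = names(likes),
                          Count = unname(colSums(likes == 1)),
                          Perc  = unname(colMeans(likes == 1)))
summary_tbl
# Count is 6, 5, 3 and Perc is 0.6, 0.5, 0.3 for like1, like2, like3
```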
    
  • 2021-02-04 19:24

    I recently wrote a quick function to deal with these. You can easily modify it to add the proportion of total responses too.

    set.seed(1)
    dat <- data.frame(resp = 1:10,
                      like1 = sample(0:1, 10, TRUE),
                      like2 = sample(0:1, 10, TRUE),
                      like3 = sample(0:1, 10, TRUE))
    

    The function:

    multi.freq.table = function(data, sep="", dropzero=FALSE, clean=TRUE) {
      # Takes boolean multiple-response data and tabulates it according
      #   to the possible combinations of each variable.
      #
      # See: http://stackoverflow.com/q/11348391/1270695
    
      counts = data.frame(table(data))
      N = ncol(counts)
      counts$Combn = apply(counts[-N] == 1, 1, 
                           function(x) paste(names(counts[-N])[x],
                                             collapse=sep))
      # Optionally drop combinations that never occur
      if (isTRUE(dropzero)) {
        counts = counts[counts$Freq != 0, ]
      }
      if (isTRUE(clean)) {
        counts = data.frame(Combn = counts$Combn, Freq = counts$Freq)
      } 
      counts
    }
    

    Apply the function:

    multi.freq.table(dat[-1], sep="-")
    #               Combn Freq
    # 1                      1
    # 2             like1    2
    # 3             like2    2
    # 4       like1-like2    2
    # 5             like3    1
    # 6       like1-like3    1
    # 7       like2-like3    0
    # 8 like1-like2-like3    1
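
The answer mentions that proportions are easy to add. Here is a minimal sketch of one way to do that (not the author's function): label each respondent with the combination they selected, tabulate the labels, and divide by the number of respondents. The data are hypothetical and hard-coded. Unlike `multi.freq.table`, this version lists only combinations that actually occur.

```r
# Hypothetical data: 10 respondents, three 0/1 "like" items
dat <- data.frame(resp  = 1:10,
                  like1 = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
                  like2 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
                  like3 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))
likes <- dat[-1]

# One label per respondent, e.g. "like1-like2"; "" means nothing was selected
combn_label <- apply(likes == 1, 1,
                     function(sel) paste(names(likes)[sel], collapse = "-"))

# Frequency of each observed combination, plus its share of respondents
freq <- as.data.frame(table(Combn = combn_label), stringsAsFactors = FALSE)
freq$Prop <- freq$Freq / sum(freq$Freq)
freq
```

With these made-up data, "like1-like2" is chosen by 3 of the 10 respondents, so its `Prop` is 0.3.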
    

    Hope this helps! Otherwise, show some examples of desired output or describe some features, and I'll see what can be added.

    Update

    After looking at the output of SPSS for this online, it seems like the following should do it for you. This is easy enough to wrap into a function if you need to use it a lot.

    data.frame(Freq = colSums(dat[-1]),
               Pct.of.Resp = (colSums(dat[-1])/sum(dat[-1]))*100,
               Pct.of.Cases = (colSums(dat[-1])/nrow(dat[-1]))*100)
    #       Freq Pct.of.Resp Pct.of.Cases
    # like1    6    42.85714           60
    # like2    5    35.71429           50
    # like3    3    21.42857           30
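
As the answer suggests, this is easy to wrap in a function. A minimal sketch follows, where `mr_summary` is a hypothetical name and the input is assumed to contain only the 0/1 item columns. The data are made up and hard-coded.

```r
# Summarise a set of 0/1 multiple-response items (hypothetical helper)
mr_summary <- function(items) {
  freq <- colSums(items)
  data.frame(Freq         = freq,
             Pct.of.Resp  = freq / sum(freq) * 100,    # share of all selections
             Pct.of.Cases = freq / nrow(items) * 100)  # share of all rows
}

# Hypothetical data: 10 respondents, three 0/1 "like" items
dat <- data.frame(resp  = 1:10,
                  like1 = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
                  like2 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
                  like3 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))

res <- mr_summary(dat[-1])
res
```

Here `Pct.of.Cases` divides by all rows, as in the snippet above. If respondents who selected nothing should be excluded from the case base, replace `nrow(items)` with `sum(rowSums(items) > 0)`.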
    
  • 2021-02-04 19:25
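    # Tabulate multiple-response items whose column names share a prefix.
    # Returns a named list with one data frame per prefix.
    multfreqtable = function(data, question.prefix) {
      z = length(question.prefix)
      temp = vector("list", z)
    
      for (i in seq_len(z)) {
        # Columns for this question; drop = FALSE keeps a data frame
        # even when only one column matches
        a = grep(question.prefix[i], names(data))
        items = data[, a, drop = FALSE]
        b = sum(items != 0)             # total number of responses
        d = colSums(items != 0)         # responses per item
        e = sum(rowSums(items) != 0)    # cases with at least one response
        f = as.numeric(c(d, b))
        temp[[i]] = data.frame(question = c(sub(question.prefix[i], 
                                                "", names(d)), "Total"),
                               freq = f,
                               percent_response = (f/b)*100,
                               percent_cases = (f/e)*100 )
        names(temp)[i] = question.prefix[i]
      }
      temp
    }
    
    # Example call: data_set is your data frame, "Banner" the shared column prefix
    multfreqtable(data_set, "Banner")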
    

    This function does a good job of reporting counts, percentages based on the number of cases, and percentages based on the number of responses, which makes it well suited to analyzing multiple-response questions.
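
To make the two percentages concrete, here is a small worked sketch of the same arithmetic on made-up data. The `Banner_*` column names and values are hypothetical, and respondent 3 selected nothing, so the number of cases is smaller than the number of rows.

```r
# Hypothetical data: 5 respondents, three 0/1 items sharing the "Banner" prefix
banner <- data.frame(Banner_A = c(1, 1, 0, 0, 1),
                     Banner_B = c(0, 1, 0, 1, 1),
                     Banner_C = c(0, 0, 0, 1, 0))

selected    <- banner != 0
freq        <- colSums(selected)            # selections per item: 3, 3, 1
n_responses <- sum(selected)                # all selections: 7
n_cases     <- sum(rowSums(selected) > 0)   # respondents with >= 1 selection: 4

pct_response <- freq / n_responses * 100    # sums to 100 across items
pct_cases    <- freq / n_cases * 100        # can sum to more than 100
```

Because a respondent can select several items, `pct_cases` adds up to more than 100 here (75 + 75 + 25), while `pct_response` always adds up to 100.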
