How to clean and re-code check-all-that-apply responses in R survey data?

太阳男子 2021-01-14 15:46

I've got survey data with some multiple-response questions like this:

HS18 Why is it difficult to get medical care in South Africa? (Select all that apply)

1 answer
  • 2021-01-14 16:21

    My best thought for analyzing multi-select questions like this is to convert the possible answers into indicator variables: take all of your possible answers (1 to 8 in this example) and create data columns named HS18.1, HS18.2, etc. (You can optionally include something more in the column name, but that's completely between you and the PI.)

    Your sample data here looks like it includes data that is not legal: 0, 888, and 999 are not listed in the options. It's possible/likely that these include DK/NR responses, but I can't be certain. As such:

    1. Your data cleaning should take care of these anomalies before this step, which converts each respondent's list of zero or more selections into indicator variables.

    2. My code below arbitrarily ignores this fact, so you will lose data: any code outside 1 to 8 is dropped from the indicator columns. This is obviously not "A Good Thing™" in the long run. More robust checks are warranted (and not difficult). (I've added an HS18.other column to indicate that something was lost.)

    The code:

    ss <- '888 1 6 4 5 8 2 3,5 4,6 3,6 3,4 3 4,5,6 7 999 4,5 2,6 4,8 7,8 1,6 1,2,3 5,7,8 4,5,6,7 1,4 0 5,6,7 5,6 2,3 1,4,6,7 1,4,5'
    ## split on spaces (one element per respondent), then on commas (one element per selection)
    dat <- strsplit(strsplit(ss, ' ')[[1]], ',')
    lvls <- as.character(1:8)
    ## lvls <- sort(unique(unlist(dat))) # alternative method
    ## one logical column per option: TRUE where the respondent selected it
    ret <- structure(lapply(lvls, function(lvl) sapply(dat, function(xx) lvl %in% xx)),
                     .Names = paste0('HS18.', lvls),
                     row.names = c(NA, -length(dat)), class = 'data.frame')
    ## flag respondents with any code outside lvls (here 0, 888, 999)
    ret$HS18.other <- sapply(dat, function(xx) !all(xx %in% lvls))
    ret <- 1 * ret ## convert from TRUE/FALSE to 1/0
    head(ret)
    ##   HS18.1 HS18.2 HS18.3 HS18.4 HS18.5 HS18.6 HS18.7 HS18.8 HS18.other
    ## 1      0      0      0      0      0      0      0      0          1
    ## 2      1      0      0      0      0      0      0      0          0
    ## 3      0      0      0      0      0      1      0      0          0
    ## 4      0      0      0      1      0      0      0      0          0
    ## 5      0      0      0      0      1      0      0      0          0
    ## 6      0      0      0      0      0      0      0      1          0
    

    The resulting data.frame can be attached to whatever other data you have with cbind() (or converted to a matrix with as.matrix()).
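
    As a rough illustration of that last step, here is a minimal sketch that binds indicator columns onto an existing data.frame. The survey data.frame and its id and raw columns are hypothetical stand-ins for your real data, and the sample string is shortened to eight respondents:

    ```r
    ## shortened sample: 8 respondents, the first with an out-of-range code
    ss <- '888 1 6 4 5 8 2 3,5'
    raw <- strsplit(ss, ' ')[[1]]
    dat <- strsplit(raw, ',')
    lvls <- as.character(1:8)

    ## logical matrix: one row per respondent, one column per option
    ind <- sapply(lvls, function(lvl) sapply(dat, function(xx) lvl %in% xx))
    colnames(ind) <- paste0('HS18.', lvls)
    ind <- as.data.frame(1 * ind) ## TRUE/FALSE to 1/0

    ## hypothetical existing data: an id plus the raw response string
    survey <- data.frame(id = seq_along(dat), raw = raw, stringsAsFactors = FALSE)

    ## rows must be in the same order in both objects for cbind() to be valid
    combined <- cbind(survey, ind)
    ```

    cbind() matches rows purely by position, so this only makes sense if the indicators were built from the same rows, in the same order, as the data they are attached to.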

    (I use 1 and 0 instead of TRUE and FALSE because you said the PI will not be using R; this can easily be changed to a character string or something that makes more sense to them.)
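
    One possible shape for the "more robust checks" mentioned above, as a sketch only: list every code that is not a listed option, and find which respondents used one, before any recoding happens. I am assuming nothing about what 0, 888, and 999 mean; that has to come from the codebook or the PI.

    ```r
    ss <- '888 1 6 4 5 8 2 3,5 4,6 3,6 3,4 3 4,5,6 7 999 4,5 2,6 4,8 7,8 1,6 1,2,3 5,7,8 4,5,6,7 1,4 0 5,6,7 5,6 2,3 1,4,6,7 1,4,5'
    dat <- strsplit(strsplit(ss, ' ')[[1]], ',')
    lvls <- as.character(1:8)

    ## every distinct code that is not one of the listed options
    all_codes <- unlist(dat)
    unexpected <- setdiff(all_codes, lvls)

    ## how often each unexpected code occurs
    unexpected_counts <- table(all_codes[!all_codes %in% lvls])

    ## positions of the respondents who gave at least one unexpected code
    bad_rows <- which(sapply(dat, function(xx) !all(xx %in% lvls)))

    if (length(unexpected) > 0) {
      warning('codes outside the listed options: ',
              paste(unexpected, collapse = ', '))
    }
    ```

    With the codes and the affected rows in hand, you can decide per code whether it should become NA, its own indicator column, or a correction to the raw data.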
