Split a column of concatenated comma-delimited data and recode output as factors

上瘾入骨i 2020-11-27 08:09

I am trying to clean up some data that has been incorrectly entered. The question for the variable allows for multiple responses out of five choices, numbered as 1 to 5. The

2 Answers
  • 2020-11-27 08:24

    You just need to write a function and use apply. First, some dummy data:

    ##Make sure you're not using factors
    dd = data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5", 
                             "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"), 
                         stringsAsFactors=FALSE)
    

    Next, create a function that takes one row and converts it into a 0/1 indicator vector:

    make_row = function(i, ncol=5) {
      ##Could make the default NA if needed
      m = numeric(ncol)
      v = as.numeric(strsplit(i, ",")[[1]])
      m[v] = 1
      return(m)
    }
    

    Then apply the function to each row and transpose the result, since apply returns one column per input row:

    t(apply(dd, 1, make_row))
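
    A variant of the same idea, offered as a sketch rather than the answerer's code: calling `vapply` on the `V1` column directly avoids `apply`, which first coerces the whole data frame to a matrix and would pull in any other columns it contains. Splitting on `",\\s*"` also drops the spaces after the commas. Because the question title asks for factors, the last lines recode each 0/1 column as a factor. The helper name `to_indicator`, the `V1_` column prefix and the "No"/"Yes" labels are illustrative assumptions, not part of the original answer.

    ```r
    # Dummy data from the answer above (V1 kept as character)
    dd <- data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                            "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
                     stringsAsFactors = FALSE)

    # Hypothetical helper: turn one "1, 2, 3"-style string into a 0/1 vector
    to_indicator <- function(s, ncol = 5) {
      v <- as.integer(strsplit(s, ",\\s*")[[1]])  # split on comma plus optional spaces
      m <- integer(ncol)                           # every choice starts at 0
      m[v] <- 1L                                   # mark the chosen options
      m
    }

    # vapply returns one column per input string, so transpose to one row per respondent
    ind <- t(vapply(dd$V1, to_indicator, integer(5), USE.NAMES = FALSE))
    colnames(ind) <- paste0("V1_", 1:5)

    # Recode each 0/1 column as a factor; explicit levels keep both levels in every column
    ind_df <- as.data.frame(ind)
    ind_df[] <- lapply(ind_df, factor, levels = c(0, 1), labels = c("No", "Yes"))
    str(ind_df)
    ```

    Passing `levels = c(0, 1)` keeps both levels even in a column that happens to be all 0 or all 1, which matters when the factors are later tabulated.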
    
  • 2020-11-27 08:36

    A long time later, I finally got around to creating a package ("splitstackshape") that handles this kind of data efficiently. So, for the convenience of others (and some self-promotion, of course), here's a compact solution.

    The relevant function for this problem is cSplit_e.

    First, the default settings, which retain the original column and use NA as the fill:

    library(splitstackshape)
    cSplit_e(data, "V1")
    #           V1 V1_1 V1_2 V1_3 V1_4 V1_5
    # 1    1, 2, 3    1    1    1   NA   NA
    # 2    1, 2, 4    1    1   NA    1   NA
    # 3 2, 3, 4, 5   NA    1    1    1    1
    # 4    1, 3, 4    1   NA    1    1   NA
    # 5    1, 3, 5    1   NA    1   NA    1
    # 6 2, 3, 4, 5   NA    1    1    1    1
    

    Second, dropping the original column and using 0 as the fill:

    cSplit_e(data, "V1", drop = TRUE, fill = 0)
    #   V1_1 V1_2 V1_3 V1_4 V1_5
    # 1    1    1    1    0    0
    # 2    1    1    0    1    0
    # 3    0    1    1    1    1
    # 4    1    0    1    1    0
    # 5    1    0    1    0    1
    # 6    0    1    1    1    1
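
    If splitstackshape is not available, the two layouts shown above can be approximated in base R. The helper below, `split_to_columns()`, is hypothetical and written only for illustration: it is not how `cSplit_e` is implemented, and it mirrors just the `fill` and `drop` behavior seen in these two outputs. It assumes every response is an integer between 1 and `n`.

    ```r
    dd <- data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                            "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
                     stringsAsFactors = FALSE)

    # Hypothetical base-R stand-in for the layout cSplit_e() produces above.
    # Assumes every response is an integer from 1 to n.
    split_to_columns <- function(df, col, n = 5, fill = NA, drop = FALSE) {
      parts <- strsplit(df[[col]], ",\\s*")
      out <- t(vapply(parts, function(p) {
        row <- rep(fill, n)       # unselected options get the fill value
        row[as.integer(p)] <- 1   # selected options get 1
        as.numeric(row)           # keep a numeric type even if nothing was set
      }, numeric(n)))
      colnames(out) <- paste0(col, "_", seq_len(n))  # V1_1 ... V1_5 naming
      res <- cbind(df, as.data.frame(out))
      if (drop) res[[col]] <- NULL  # optionally remove the original column
      res
    }

    split_to_columns(dd, "V1")                         # keeps V1, NA fill
    split_to_columns(dd, "V1", drop = TRUE, fill = 0)  # drops V1, 0 fill
    ```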
    