Create new column in dataframe based on partial string matching other column

后端 未结 1 1160
误落风尘
误落风尘 2020-11-27 07:15

I have a dataframe with 2 columns GL and GLDESC and want to add a 3rd column called KIND based on some data that is inside of column <

相关标签:
1条回答
  • 2020-11-27 07:41

    Since you have only two conditions, you can use a nested ifelse:

    #random data; it wasn't easy to copy-paste yours  
    DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10), 
      c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", 
         "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))
    
    DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = T), "Materials", 
             ifelse(grepl("payroll", DF$GLDESC, ignore.case = T), "Payroll", "Other"))
    
    DF
    #   GL         GLDESC      KIND
    #1   8        e gas l Materials
    #2   1  c payroll12 y   Payroll
    #3  10      m GaSer v Materials
    #4   6       t asdf n     Other
    #5   2      w qweaa t     Other
    #6   4 r PayROll-12 q   Payroll
    #7   9      n asdfg a     Other
    #8   5     d GAS--2 w Materials
    #9   7     s fghfgh e     Other
    #10  3      g qweee k     Other
    

    EDIT 10/3/2016 (..after receiving more attention than expected)

    A possible solution to deal with more patterns could be to iterate over all patterns and, whenever there is match, progressively reduce the amount of comparisons:

    ff = function(x, patterns, replacements = patterns, fill = NA, ...)
    {
        stopifnot(length(patterns) == length(replacements))
    
        ans = rep_len(as.character(fill), length(x))    
        empty = seq_along(x)
    
        for(i in seq_along(patterns)) {
            greps = grepl(patterns[[i]], x[empty], ...)
            ans[empty[greps]] = replacements[[i]]  
            empty = empty[!greps]
        }
    
        return(ans)
    }
    
    ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
    # [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"
    
    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
       c("pat1a|pat1b", "pat2", "pat3"), 
       c("1", "2", "3"), fill = "empty")
    #[1] "1"     "1"     "3"     "empty"
    
    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
       c("pat2", "pat1a|pat1b", "pat3"), 
       c("2", "1", "3"), fill = "empty")
    #[1] "2"     "1"     "3"     "empty"
    
    0 讨论(0)
提交回复
热议问题