R substr function on multiple columns

广开言路 2021-01-27 18:24

I have 3 columns. First column has unique ID, second and third columns have string data and some NA data. I need to extract info from column 2 and put it in separate columns an

1 Answer
  • 2021-01-27 19:08

    You can use sapply to apply a function to each element of a vector. That is useful here: calling sapply on the columns of your original data frame (test) produces the columns for your new data frame.

    Here's a solution that does this:

    test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'), 
      V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
      V2 = c('IADSFO', NA, 'NLSFO0'))
    
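    substring_it = function(x){
      # x is a data frame: column 1 is the ID, columns 2 and 3 hold the strings
      # USE.NAMES = FALSE stops sapply from naming each result after its input
      # string; with character columns those names would become row names
      c1 = sapply(x[,2], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
      c2 = sapply(x[,2], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
      c3 = sapply(x[,3], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
      c4 = sapply(x[,3], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
      return(data.frame(UID=x[,1], c1, c2, c3, c4))
    }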
    
    substring_it(test)
    # returns:
    #     UID  c1  c2   c3   c4
    #1 Z001NL AAA bbb  IAD  SFO
    #2 Z001NP IAD SFO <NA> <NA>
    #3 Z0024G SFO HNL  NLS  FO0
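
    As a side note, substr is itself vectorized, so the same columns can be built without sapply. The following is a minimal sketch under that assumption, reusing the sample data from above; the name "result" and the explicit stringsAsFactors = FALSE are illustrative choices, not part of the original answer.

    ```r
    # Same sample data as above; strings are kept as character, not factor.
    test <- data.frame(UID = c("Z001NL", "Z001NP", "Z0024G"),
                       V1 = c("AAAbbb", "IADSFO", "SFOHNL"),
                       V2 = c("IADSFO", NA, "NLSFO0"),
                       stringsAsFactors = FALSE)

    # substr() works on a whole column at once; NA inputs stay NA.
    result <- data.frame(UID = test$UID,
                         c1 = substr(test$V1, 1, 3),
                         c2 = substr(test$V1, 4, 6),
                         c3 = substr(test$V2, 1, 3),
                         c4 = substr(test$V2, 4, 6),
                         stringsAsFactors = FALSE)
    print(result)
    ```

    Because no per-element function call is involved, this form is usually shorter and avoids the element names that sapply attaches to character input.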
    

    EDIT: here's a way to loop over the columns if you have many of them to split. I don't know the column order of your original data frame or the order you want in the new one, so you may need to adjust the "pos" counter. I also assumed the columns to be split are columns 2 through 201 ("colindex"), so you'll probably have to change that range.

    newcolumns = list()
    pos = 1 #counter for column index of new data frame
    for(colindex in 2:201){
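        # USE.NAMES = FALSE keeps the input strings from becoming row names
        newcolumns[[pos]] = sapply(test[,colindex], function(x) substr(x, 1, 3), USE.NAMES = FALSE)
        newcolumns[[pos+1]] = sapply(test[,colindex], function(x) substr(x, 4, 6), USE.NAMES = FALSE)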
        pos = pos+2
    }
    newdataframe = data.frame(UID = test[,1], newcolumns)
    # update "names(newdataframe)" as needed
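
    If the positional counter gets awkward, one possible variation is to name each new column as it is created. This is only a sketch: the helper "split_columns", its parameters "cols", "width" and "n_parts", and the "V1_1"-style column names are hypothetical and not from the original answer. It assumes the first column is the ID and every other column should be split into fixed-width pieces.

    ```r
    # Sample data from above, with strings kept as character.
    test <- data.frame(UID = c("Z001NL", "Z001NP", "Z0024G"),
                       V1 = c("AAAbbb", "IADSFO", "SFOHNL"),
                       V2 = c("IADSFO", NA, "NLSFO0"),
                       stringsAsFactors = FALSE)

    # Hypothetical helper: split each column in `cols` into `n_parts`
    # pieces of `width` characters, naming the pieces <column>_<piece>.
    split_columns <- function(df, cols = names(df)[-1], width = 3, n_parts = 2) {
      pieces <- list()
      for (col in cols) {
        for (k in seq_len(n_parts)) {
          start <- (k - 1) * width + 1
          # substr() is vectorized, so no sapply is needed here
          pieces[[paste0(col, "_", k)]] <- substr(df[[col]], start, start + width - 1)
        }
      }
      # Keep the first (ID) column and append the named pieces.
      data.frame(df[1], pieces, stringsAsFactors = FALSE)
    }

    newdataframe <- split_columns(test)
    print(names(newdataframe))
    ```

    Selecting columns by name through "cols" means the range does not have to be hard-coded, so the same call should work whether there are 2 string columns or 200.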
    