I have 3 columns. First column has unique ID, second and third columns have string data and some NA data. I need to extract info from column 2 and put it in separate columns an
You can use sapply to apply a function to each element of a vector. That could be useful here, since you could use sapply on the columns of your original data frame (test) to create the columns for your new data frame.
Here's a solution that does this:
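test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'),
                  V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
                  V2 = c('IADSFO', NA, 'NLSFO0'),
                  stringsAsFactors = FALSE)
substring_it = function(x){
  # x is a data frame: column 1 is the ID, columns 2 and 3 hold the strings to split
  # USE.NAMES = FALSE stops sapply from naming each result after its input string;
  # data.frame would otherwise use those names as row names
  c1 = sapply(x[,2], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c2 = sapply(x[,2], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  c3 = sapply(x[,3], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c4 = sapply(x[,3], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  return(data.frame(UID=x[,1], c1, c2, c3, c4))
}
substring_it(test)
# returns:
#     UID  c1  c2   c3   c4
#1 Z001NL AAA bbb  IAD  SFO
#2 Z001NP IAD SFO <NA> <NA>
#3 Z0024G SFO HNL  NLS  FO0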
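As a side note, substr is itself vectorized over its first argument, so the same columns can be built without sapply. A minimal sketch, assuming the same example data as above; the helper name split_halves is made up for illustration and is not part of base R:

```r
test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'),
                  V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
                  V2 = c('IADSFO', NA, 'NLSFO0'),
                  stringsAsFactors = FALSE)

# split_halves is a hypothetical helper name chosen for this sketch
split_halves = function(df){
  # substr works on a whole character vector at once; NA stays NA
  data.frame(UID = df[, 1],
             c1 = substr(df[, 2], 1, 3),
             c2 = substr(df[, 2], 4, 6),
             c3 = substr(df[, 3], 1, 3),
             c4 = substr(df[, 3], 4, 6),
             stringsAsFactors = FALSE)
}

result = split_halves(test)
```

Whether this is preferable is mostly a matter of taste; it avoids the per-element function call and the name handling that sapply adds for character input.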
EDIT: here's a way to loop over the columns if you have to do this many times. I'm not sure what order your original data frame's columns are in or what order you want the new data frame's columns to end up in, so you may need to adjust the "pos" counter. The loop below splits every column after the first ("colindex" runs over 2:ncol(test)); if only some columns need splitting, say columns 2 through 201, change that range.
newcolumns = list()
pos = 1 # counter for column index of new data frame
for(colindex in 2:ncol(test)){
  # each source column yields two new columns: characters 1-3 and characters 4-6
  newcolumns[[pos]] = sapply(test[,colindex], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  newcolumns[[pos+1]] = sapply(test[,colindex], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  pos = pos+2
}
newdataframe = data.frame(UID = test[,1], newcolumns)
# update "names(newdataframe)" as needed
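One way to fill in those names is to build them from the original column names while looping. A minimal sketch, assuming a small stand-in for test and a made-up suffix scheme ("_a" for characters 1-3, "_b" for characters 4-6); neither the suffixes nor the cols_to_split variable come from the original question:

```r
test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'),
                  V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
                  V2 = c('IADSFO', NA, 'NLSFO0'),
                  stringsAsFactors = FALSE)

# assumption: every column after the first should be split
cols_to_split = 2:ncol(test)

newcolumns = list()
for (colindex in cols_to_split) {
  nm = names(test)[colindex]
  # assigning by name extends the list and labels the new column in one step
  newcolumns[[paste0(nm, "_a")]] = substr(test[, colindex], 1, 3)
  newcolumns[[paste0(nm, "_b")]] = substr(test[, colindex], 4, 6)
}

# a named list passed to data.frame becomes one column per list element
newdataframe = data.frame(UID = test[, 1], newcolumns, stringsAsFactors = FALSE)
```

Because the list elements are named as they are created, there is no position counter to keep in sync with the column order.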