问题
I have a dataframe with 95 cols and want to batch-rename a lot of them with simple regexes, like the snippet at bottom, there are ~30 such lines. Any other columns which don't match the search regex must be left untouched.
**** Example: names(tr) = c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22', ...) ****
I started out with a wall of 25 gsub()s - crude but effective:
names(tr) <- gsub('_1$', '_R', names(tr))
names(tr) <- gsub('_14$', '_I', names(tr))
names(tr) <- gsub('_22$', '_P', names(tr))
names(tr) <- gsub('_50$', '_O', names(tr))
... yada yada
@Joshua: mapply doesn't work, turns out it's more complicated and impossible to vectorize. names(tr) contains other columns, and when these patterns do occur, you cannot assume all of them occur, let alone in the exact order we defined them. Hence, try 2 is:
pattern <- paste('_', c('1','14','22','50','52','57','76','1018','2001','3301','6005'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O', 'C', 'D', 'M', 'L', 'S', 'K', 'G'), sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, :
argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, :
argument 'replacement' has length > 1 and only the first element will be used
Can anyone fix this for me?
EDIT: I read all around SO and R doc on this subject for over a day and couldn't find anything... then when I post it I think of searching for '[r] translation table' and I find xlate. Which is not mentioned anywhere in the grep/sub/gsub documentation.
Is there anything in
base/gsubfn/data.table
etc. to allow me to write one search-and-replacement instruction? (like a dictionary or translation table)Can you improve my clunky syntax to be call-by-reference to tr? (mustn't create temp copy of entire df)
EDIT2: my best effort after reading around was:
The dictionary approach (xlate) might be a partial answer to, but this is more than a simple translation table since the regex must be terminal (e.g. '_14$').
I could use gsub() or strsplit() to split on '_' then do my xlate translation on the last component, then paste() them back together. Looking for a cleaner 1/2-line idiom.
Or else I just use walls of gsub()s.
回答1:
Wall of gsub
could be always replace by for-loop
. And you can write it as a function:
renamer <- function(x, pattern, replace) {
for (i in seq_along(pattern))
x <- gsub(pattern[i], replace[i], x)
x
}
names(tr) <- renamer(
names(tr),
sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005')),
sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L', 'S', 'K', 'G'))
)
And I found sprintf
more useful than paste
for creation this kind of strings.
回答2:
The question predates the boom of the tidyverse
but this is easily solved with the c(pattern1 = replacement1)
option in stringr::str_replace_all
.
tr <- data.frame("whatevs_1" = NA, "something_52" = NA)
tr
#> whatevs_1 something_52
#> 1 NA NA
patterns <- sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005'))
replacements <- sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L', 'S', 'K', 'G'))
names(replacements) <- patterns
names(tr) <- stringr::str_replace_all(names(tr), replacements)
tr
#> whatevs_R something_C
#> 1 NA NA
And of course, this particular case can benefit from dplyr
dplyr::rename_all(tr, stringr::str_replace_all, replacements)
#> whatevs_R something_C
#> 1 NA NA
回答3:
Using do.call()
nearly does it, it objects to differing arg lengths. I think I need to nest do.call()
inside apply()
, like in apply function to elements over a list.
But I need a partial do.call()
over pattern and replace.
This is all starting to make a wall of gsub(..., fixed=TRUE) look like a more efficient idiom, if flabby code.
pattern <- paste('_', c('1','14','22','50'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O'), sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, :
argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, :
argument 'replacement' has length > 1 and only the first element will be used
来源:https://stackoverflow.com/questions/10455318/perform-multiple-search-and-replaces-on-the-colnames-of-a-dataframe