I am trying to rename columns of multiple data.frames.
To give an example, let's say I've a list of data.frames dfA, d
There are two things here:
1) You should return the value you want from your function. Otherwise, the value of the last evaluated expression is returned. In your case, that is the assignment names(x) <- ..., whose value is the vector of new names rather than the data.frame. So, add return(x), or simply x, as the final line. Your function would then look like:
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  return(x)
}
2) lapply does not modify your input objects by reference. It works on a copy. So, you'll have to assign the results back. An alternative is to use a for-loop instead of lapply:
# option 1
dfs <- lapply(dfs, ChangeNames)
# option 2
for (i in seq_along(dfs)) {
names(dfs[[i]]) <- c("A", "B", "C")
}
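To make the difference concrete, here is a minimal base R sketch. The data frames dfA and dfB and their contents are made up for illustration; ChangeNames is the function defined above.

```r
# Hypothetical example data; any list of three-column data.frames would do.
dfs <- list(
  dfA = data.frame(x = 1:3, y = 4:6, z = 7:9),
  dfB = data.frame(x = 3:1, y = 6:4, z = 9:7)
)

ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  x  # return the modified data.frame, not the value of the assignment
}

# Calling lapply without assigning the result leaves dfs untouched.
invisible(lapply(dfs, ChangeNames))
names_before <- names(dfs$dfA)

# Assigning the result back replaces each element with its renamed copy.
dfs <- lapply(dfs, ChangeNames)
names_after <- names(dfs$dfA)
```

Here names_before still holds the original column names, while names_after holds the new ones; the list's own element names (dfA, dfB) are preserved by lapply.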
Even using the for-loop, you'll still make a copy (because names(.) <- . does). You can verify this by using tracemem.
df <- data.frame(x=1:5, y=6:10, z=11:15)
tracemem(df)
# [1] "<0x7f98ec24a480>"
names(df) <- c("A", "B", "C")
tracemem(df)
# [1] "<0x7f98e7f9e318>"
If you want to modify by reference, you can use the data.table package's setnames function:
df <- data.frame(x=1:5, y=6:10, z=11:15)
require(data.table)
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
setnames(df, c("A", "B", "C"))
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
You see that the memory location df is mapped to hasn't changed. The names have been modified by reference.
If the dataframes were not in a list but just in the global environment, you could refer to them using a vector of string names.
dfs <- c("dfA", "dfB", "dfC")
for (df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- c("A", "B", "C")
  assign(df, df.tmp)
}
EDIT: To simplify the above code, you could use
for(df in dfs)
assign(df, setNames(get(df), c("A", "B", "C")))
or use data.table's setnames, which modifies by reference and so doesn't require reassigning:
for(df in c("dfA", "dfB"))
data.table::setnames(get(df), c("G", "H"))
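A loop-free variant of the same idea can be sketched in base R with mget and list2env. This is only an illustration: dfA and dfB and their contents are made up, and env is a helper variable introduced here to make the target environment explicit.

```r
# Hypothetical example data, created in the current environment.
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(x = 2:1, y = 4:3, z = 6:5)

env <- environment()   # the global environment when run at top level
dfs <- c("dfA", "dfB")

# mget() fetches the objects as a named list; setNames() returns renamed copies.
renamed <- lapply(mget(dfs, envir = env), setNames, c("A", "B", "C"))

# list2env() assigns each list element back under its original name.
invisible(list2env(renamed, envir = env))
```

Like the get/assign loop, this still copies each data.frame; it only replaces the explicit loop with a single fetch and a single write-back.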
I had the problem of importing a public data set and having to rename every column in each dataframe: trim surrounding whitespace, convert to lowercase, and replace internal spaces with periods.
Combining the above methods (with the stringr package) got me:
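library(stringr)
for (eachdf in dfs) {
  df.tmp <- get(eachdf)
  # Trim first, so that only internal spaces are turned into periods.
  # The stringr functions are vectorised, so no loop over columns is needed.
  colnames(df.tmp) <-
    str_replace_all(str_to_lower(str_trim(colnames(df.tmp))), " ", ".")
  assign(eachdf, df.tmp)
}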
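If you would rather avoid the stringr dependency, the same cleaning steps can be sketched in base R. Note that tidy_colnames is a hypothetical helper named here for illustration, and the example column names are made up.

```r
# tidy_colnames is a hypothetical helper, not part of any package used above.
tidy_colnames <- function(x) {
  x <- trimws(x)                    # drop leading and trailing whitespace
  x <- tolower(x)                   # convert to lowercase
  gsub(" ", ".", x, fixed = TRUE)   # remaining (internal) spaces become periods
}

# Hypothetical example data with messy column names.
df <- data.frame(1:2, 3:4, 5:6)
names(df) <- c(" First Name ", "AGE", "Home City")
names(df) <- tidy_colnames(names(df))
cleaned <- names(df)
```

Because the helper works on a character vector, it can be dropped into any of the approaches above, for example as the right-hand side of names(df.tmp) <- inside the get/assign loop.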