Extract names of dataframe in list of dataframes and add it to columnnames

问题

I have a list of dataframes:

set.seed(23) 
date_list = seq(1:30)
testframe = data.frame(Date = date_list)
testframe$ABC = rnorm(30)
testframe$DEF = rnorm(30)
testframe$GHI = seq(from = 10, to = 25, length.out = 30)
testframe$JKL = seq(from = 5, to = 45, length.out = 30)

testlist = list(testframe, testframe, testframe)
names(testlist) = c("df1464", "df6355", "df94566")

I want now to extract the name of each dataframe and add it to its columns. So the columnnames of the first dataframe in the list should be: Date_df1464, ABC_df1464, DEF_df1464, GHI_df1464 and JKL_df1464

I created this loop, but its not working:

for (a  in names(testlist)) {
  for(i in 1: length(testlist)){
    allcolnames = colnames(testlist[[i]])
    allcolnames = paste(allcolnames, a , sep = "_")
    testlist[[i]] = colnames(allcolnames)
  }
}

I get this error:

Error in testlist[[i]] : subscript out of bounds

I am pretty clueless why it doesnt work. Any ideas?

回答1:

Your solution was nearly right, you just do not need to loop two times. And your colnames call was the wrong way around. This should work:

for(i in 1: length(testlist)){
    allcolnames = colnames(testlist[[i]])
    allcolnames = paste(allcolnames, names(testlist)[i] , sep = "_")
    colnames(testlist[[i]]) = allcolnames
}

This also works, without any fors ;):

set.seed(23) 
date_list = seq(1:30)
testframe = data.frame(Date = date_list)
testframe$ABC = rnorm(30)
testframe$DEF = rnorm(30)
testframe$GHI = seq(from = 10, to = 25, length.out = 30)
testframe$JKL = seq(from = 5, to = 45, length.out = 30)

testlist = list(testframe, testframe, testframe)
names(testlist) = c("df1464", "df6355", "df94566")

out <- lapply(names(testlist),function(name){
  dummy <- testlist[[name]]
  names(dummy) <- paste0(names(testlist[[name]]) ,'_',name)
  dummy
})
str(out)
#> List of 3
#>  $ :'data.frame':    30 obs. of  5 variables:
#>   ..$ Date_df1464: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ ABC_df1464 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
#>   ..$ DEF_df1464 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
#>   ..$ GHI_df1464 : num [1:30] 10 10.5 11 11.6 12.1 ...
#>   ..$ JKL_df1464 : num [1:30] 5 6.38 7.76 9.14 10.52 ...
#>  $ :'data.frame':    30 obs. of  5 variables:
#>   ..$ Date_df6355: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ ABC_df6355 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
#>   ..$ DEF_df6355 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
#>   ..$ GHI_df6355 : num [1:30] 10 10.5 11 11.6 12.1 ...
#>   ..$ JKL_df6355 : num [1:30] 5 6.38 7.76 9.14 10.52 ...
#>  $ :'data.frame':    30 obs. of  5 variables:
#>   ..$ Date_df94566: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ ABC_df94566 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
#>   ..$ DEF_df94566 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
#>   ..$ GHI_df94566 : num [1:30] 10 10.5 11 11.6 12.1 ...
#>   ..$ JKL_df94566 : num [1:30] 5 6.38 7.76 9.14 10.52 ...

回答2:

You could switch two Map in series; the inner Map prepares the new names, the outer Map applies it onto the sublists' names.

testlist <- Map(`names<-`, testlist,
                Map(paste, lapply(testlist, names), names(testlist), sep="_"))

Result

lapply(testlist, names)
# $df1464
# [1] "Date_df1464" "ABC_df1464"  "DEF_df1464"  "GHI_df1464"  "JKL_df1464" 
# 
# $df6355
# [1] "Date_df6355" "ABC_df6355"  "DEF_df6355"  "GHI_df6355"  "JKL_df6355" 
# 
# $df94566
# [1] "Date_df94566" "ABC_df94566"  "DEF_df94566"  "GHI_df94566"  "JKL_df94566"

回答3:

Two ways to accomplish this. The better, more encapsulated way would be to use Map, looping over the individual data frames and their corresponding names:

new.testlist <- Map(function(df, name) {
  names(df) <- paste(names(df), name, sep = '_')
  return(df)
}, testlist, names(testlist))

> str(new.testlist)
List of 3
 $ df1464 :'data.frame':    30 obs. of  5 variables:
  ..$ Date_df1464: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ ABC_df1464 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
  ..$ DEF_df1464 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
  ..$ GHI_df1464 : num [1:30] 10 10.5 11 11.6 12.1 ...
  ..$ JKL_df1464 : num [1:30] 5 6.38 7.76 9.14 10.52 ...
 $ df6355 :'data.frame':    30 obs. of  5 variables:
  ..$ Date_df6355: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ ABC_df6355 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
  ..$ DEF_df6355 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
  ..$ GHI_df6355 : num [1:30] 10 10.5 11 11.6 12.1 ...
  ..$ JKL_df6355 : num [1:30] 5 6.38 7.76 9.14 10.52 ...
 $ df94566:'data.frame':    30 obs. of  5 variables:
  ..$ Date_df94566: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ ABC_df94566 : num [1:30] 0.193 -0.435 0.913 1.793 0.997 ...
  ..$ DEF_df94566 : num [1:30] -0.5532 0.0982 -1.1467 -1.2499 -0.2021 ...
  ..$ GHI_df94566 : num [1:30] 10 10.5 11 11.6 12.1 ...
  ..$ JKL_df94566 : num [1:30] 5 6.38 7.76 9.14 10.52 ...

The riskier way would be to use the super assignment operator to loop over the names, trusting that testlist remains reliable in your global environment. Note that this second method changes the column names in testlist as a side effect, and is generally NOT considered good practice. Max Teflon's answer is somewhat similar, in that it relies on testlist existing in the global environment, without passing it explicitly to the modifying function.

sapply(names(testlist), function(x) {
  names(testlist[[x]]) <<- paste(names(testlist[[x]]), x, sep = '_')
})

来源：https://stackoverflow.com/questions/56630991/extract-names-of-dataframe-in-list-of-dataframes-and-add-it-to-columnnames

标签

list

multiple-columns

rename

paste