问题
I have a series of data frames structured similarly to this:
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))
In order to clean them I wrote a user defined function with a set of cleaning steps:
clean <- function(df){
colnames(df) <- df[2,]
df <- df[grep('^[0-9]{4}', df$year),]
return(df)
}
I'd now like to put my data frames in a list:
df_list <- list(df,df2)
and clean them all at once. I tried
lapply(df_list, clean)
and
for(df in df_list){
clean(df)
}
But with both methods I get the error:
Error in df[2, ] : incorrect number of dimensions
What's causing this error and how can I fix it? Is my approach to this problem wrong?
回答1:
You are close, but there is one problem in code. Since you have text in your dataframe's columns, the columns are created as factors and not characters. Thus your column naming does not provide the expected result.
#need to specify strings to factors as false
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)
clean <- function(df){
colnames(df) <- df[2,]
#need to specify the column to select the rows
df <- df[grep('^[0-9]{4}', df$year),]
#convert the columns to numeric values
df[, 1:ncol(df)] <- apply(df[, 1:ncol(df)], 2, as.numeric)
return(df)
}
df_list <- list(df,df2)
lapply(df_list, clean)
来源:https://stackoverflow.com/questions/50630740/apply-a-user-defined-function-to-a-list-of-data-frames