Apply a user defined function to a list of data frames

Submitted by 淺唱寂寞╮ on 2020-12-15 06:37:20

Question


I have a series of data frames structured similarly to this:

df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))  
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))

In order to clean them I wrote a user defined function with a set of cleaning steps:

clean <- function(df){
  colnames(df) <- df[2,]
  df <- df[grep('^[0-9]{4}', df$year),]
  return(df)
}

I'd now like to put my data frames in a list:

df_list <- list(df,df2)

and clean them all at once. I tried

lapply(df_list, clean)

and

for(df in df_list){
  clean(df)
}

But with both methods I get the error:

Error in df[2, ] : incorrect number of dimensions

What's causing this error and how can I fix it? Is my approach to this problem wrong?


Answer 1:


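You are close, but there is one problem in the code. Because the columns contain text, data.frame() creates them as factors rather than character vectors (this was the default in R versions before 4.0.0). As a result, using row 2 as the column names does not give the expected names. Passing stringsAsFactors = FALSE keeps the columns as characters: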
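To see the difference between the two storage modes, here is a minimal sketch. The names raw_factor and raw_char are illustrative and not from the original post, and stringsAsFactors is set explicitly so the result does not depend on the R version:

```r
# Same first column as in the question, built both ways
raw_factor <- data.frame(x = c('notes', 'year', 1995:2005), stringsAsFactors = TRUE)
raw_char   <- data.frame(x = c('notes', 'year', 1995:2005), stringsAsFactors = FALSE)

class(raw_factor$x)  # "factor"
class(raw_char$x)    # "character"

# The header cell in row 2: a character column holds the label itself,
# while a factor column stores an integer code pointing at its level
raw_char$x[2]                # "year"
as.integer(raw_factor$x[2])  # the underlying integer code, not the label
```

Code that treats a factor cell as plain text can therefore end up with the integer code instead of the label.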

# Set stringsAsFactors = FALSE so the text columns stay character vectors
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)  
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)

clean <- function(df){
  colnames(df) <- df[2,]
  # The comma after the row index is needed to select rows rather than columns
  df <- df[grep('^[0-9]{4}', df$year),]

  # Convert the columns to numeric values
  df[, 1:ncol(df)] <- apply(df[, 1:ncol(df)], 2, as.numeric)

  return(df)
}

df_list <- list(df,df2)
lapply(df_list, clean)
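
Note that lapply() returns a new list and leaves df_list unchanged, and that the for loop in the question discards the value of clean(df) on every iteration. The following is a sketch of one way to keep the results, not part of the original answer. The list names first and second, and the objects cleaned and cleaned_loop, are made up for illustration, and clean() is a slightly rewritten variant of the function above (unlist() for the header row, df[] <- lapply(...) for the numeric conversion):

```r
df  <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 11:21),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 50:60),
                  stringsAsFactors = FALSE)

clean <- function(df) {
  # Use the second row as the column names
  colnames(df) <- unlist(df[2, ])
  # Keep only the rows whose year starts with four digits
  df <- df[grep('^[0-9]{4}', df$year), ]
  # Convert every column to numeric while keeping the data frame structure
  df[] <- lapply(df, as.numeric)
  df
}

# Naming the list elements makes the results easier to look up later
df_list <- list(first = df, second = df2)

# Option 1: assign the result of lapply()
cleaned <- lapply(df_list, clean)

# Option 2: a for loop that writes each result back into a list
cleaned_loop <- vector("list", length(df_list))
names(cleaned_loop) <- names(df_list)
for (nm in names(df_list)) {
  cleaned_loop[[nm]] <- clean(df_list[[nm]])
}

str(cleaned$first)
```

Both options should yield the same list of cleaned data frames; lapply() is usually the shorter choice when every element gets the same treatment.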


Source: https://stackoverflow.com/questions/50630740/apply-a-user-defined-function-to-a-list-of-data-frames
