Rbinding large list of dataframes after I did some data cleaning on the list

﹥>﹥吖頭↗ submitted on 2019-12-12 01:12:30

Question


My problem is that I can't merge a large list of dataframes until I've done some data cleaning on them, but the cleaning seems to be missing from the list afterwards.

I have 43 xlsx-files, which I've put in a list.

Here's my code for that part:

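library(openxlsx)  # assuming read.xlsx() comes from openxlsx

# pattern is a regular expression, not a glob
file.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$")

dat <- lapply(file.list, function(i) {
    x <- read.xlsx(i, sheet = 1, startRow = 2, colNames = TRUE,
                   skipEmptyCols = TRUE, skipEmptyRows = TRUE)

    # Create column with file name
    x$file <- i

    # Return data
    x
})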

I then did some data cleaning. Some of the dataframes had empty columns that weren't skipped during loading, and there were some columns I just didn't need.

Example of how I removed one column (X1) from all dataframes in the list:

dat <- lapply(dat, function(x) { x["X1"] <- NULL; x })
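As a rough sketch of the same idea applied to several columns at once: the helper below (`drop_cols` is a hypothetical name, not from the original code) removes the named columns and also any column that is entirely `NA` or empty strings. The toy list only stands in for the real data, and treating an all-blank column as "empty" is an assumption about what the xlsx files contain.

```r
# Hypothetical helper: drop named columns plus any column that is all NA or "".
drop_cols <- function(x, unwanted = character()) {
  is_empty <- vapply(
    x,
    function(col) all(is.na(col) | as.character(col) == ""),
    logical(1)
  )
  keep <- !is_empty & !(names(x) %in% unwanted)
  x[, keep, drop = FALSE]
}

# Toy data standing in for the list read from the xlsx files
dat <- list(
  data.frame(a = c("a", "b"), b = 1:2, X1 = c("", ""), stringsAsFactors = FALSE),
  data.frame(a = c("c", "d"), b = 3:4, X1 = c(NA, NA), X2 = 5:6,
             stringsAsFactors = FALSE)
)

# The result has to be assigned back, otherwise dat stays unchanged
dat <- lapply(dat, drop_cols, unwanted = "X2")
sapply(dat, names)
```

Note that `lapply()` returns a new list, so the cleaning only sticks if the result is assigned back to `dat`.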

I also applied column names:

colnames <- c("ID", "UDLIGNNR","BILAGNR", "AKT", "BA",
          "IART", "HTRANS", "DTRANS", "BELOB", "REGD",
          "BOGFD", "AFVBOGFD", "VALORD", "UDLIGND", 
          "UÅ", "AFSTEMNGL", "NRBASIS", "SPECIFIK1",
          "SPECIFIK2", "SPECIFIK3", "PERIODE","FILE")
dat <- lapply(dat, setNames, colnames)
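Renaming by position like this only works if every dataframe has exactly as many columns as there are names. A minimal sketch of a check that could be run first, using toy data and a hypothetical `new_names` vector in place of the real 22 names:

```r
new_names <- c("Alpha", "Beta", "Gamma")  # stand-in for the real name vector

dat <- list(
  data.frame(a = 1:2, b = 3:4, c = 5:6),
  data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)  # one column too many
)

# Index of every data frame whose width differs from the name vector
bad <- which(vapply(dat, ncol, integer(1)) != length(new_names))
bad

# Rename only the data frames whose width matches
ok <- setdiff(seq_along(dat), bad)
dat[ok] <- lapply(dat[ok], setNames, new_names)
```

Any index reported in `bad` points at a dataframe that still needs cleaning before the names can be applied.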

My problem is that when I open the list or look at its elements, my data cleaning is missing.

And I can't bind the dataframes before the data cleaning, since they don't all have the same structure.

What am I doing wrong here?

EDIT: Sample data

# Sample data
a <- c("a","b","c")
b <- c(1,2,3)
X1 <- c("", "","")
c <- c("a","b","c")
X2 <- c(1,2,3)
df1 <- data.frame(a,b,c,X1)
df2 <- data.frame(a,b,c,X1,X2)

# Putting in list
dat <- list(df1,df2)

# Removing unwanted columns
dat <- lapply(dat, function(x) { x["X1"] <- NULL; x })
dat <- lapply(dat, function(x) { x["X2"] <- NULL; x })

# Setting column names
colnames <- c("Alpha", "Beta", "Gamma")
dat <- lapply(dat, setNames, colnames)

# Merging dataframes 
df <- do.call(rbind,dat)

So I've just found that with my sample data this goes smoothly. I had to reopen the list in View mode to see the changes I made. That doesn't change the fact that when I write to csv and reopen the file, all the data cleaning is missing (I haven't tried this with my sample data).
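One way to test the csv step in isolation is a round trip through a temporary file. This is only a sketch: `df` here is a toy stand-in for the bound result, and `out_file` is a hypothetical path. One possible cause, which I can't confirm for my real data, is that the file was written from an object created before the cleaning.

```r
# Toy stand-in for the cleaned, bound data frame
df <- data.frame(Alpha = c("a", "b"), Beta = 1:2, stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(df, out_file, row.names = FALSE)

# Read the file back and compare the column names with the object in memory
check <- read.csv(out_file, stringsAsFactors = FALSE)
identical(names(check), names(df))
```

If the comparison is `FALSE` for the real data, the object passed to `write.csv()` is worth checking first.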

I am wondering if it's because I've changed the merge?

# My merge when I wrote this question: 
df <- do.call("rbindlist", dat)

# My merge now: 
df <- do.call(rbind,dat)
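For comparison, `rbindlist()` from the data.table package takes the list itself as its first argument, so it is normally called directly rather than through `do.call()`. A minimal sketch with toy data, guarded so that it only runs if data.table is installed:

```r
dat <- list(
  data.frame(Alpha = "a", Beta = 1, stringsAsFactors = FALSE),
  data.frame(Alpha = "b", Beta = 2, Gamma = "x", stringsAsFactors = FALSE)
)

if (requireNamespace("data.table", quietly = TRUE)) {
  # fill = TRUE pads columns that are missing from a data frame with NA
  df <- data.table::rbindlist(dat, fill = TRUE)
  print(dim(df))
}
```

With `fill = TRUE` the bind succeeds even when the widths differ, which may hide a cleaning problem rather than fix it, so it is worth checking the result for unexpected `NA` columns.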

When I use my real data it doesn't go as smoothly, so I guess the sample data is bad. I don't know what I'm doing wrong, so I can't give better sample data.

The message I get when merging with rbind:

Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
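This error means that at least one dataframe in the list has a different number of columns from the others. A small sketch for finding the odd ones out, using toy data; taking the most common width as the intended one is an assumption:

```r
dat <- list(
  data.frame(a = 1:2, b = 3:4, c = 5:6),
  data.frame(a = 1:2, b = 3:4),            # one column short
  data.frame(a = 1:2, b = 3:4, c = 5:6)
)

# Number of columns in each data frame, and how often each width occurs
n_cols <- vapply(dat, ncol, integer(1))
table(n_cols)

# Assumption: the most common width is the intended one
expected <- as.integer(names(which.max(table(n_cols))))
odd <- which(n_cols != expected)

# Column names of the data frames that do not fit
lapply(dat[odd], names)
```

With the real data, `file.list[odd]` would give the matching file names, since `dat` was built from `file.list` in the same order.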

Source: https://stackoverflow.com/questions/54569448/rbinding-large-list-of-dataframes-after-i-did-some-data-cleaning-on-the-list
