Question
I am collating multiple Excel files into one using data frames. Some columns are duplicated across the files. Is it possible to merge only the unique columns?
Here is my code:
library(rJava)
library(XLConnect)
data.files = list.files(pattern = "*.xls")
# Read the first file
df = readWorksheetFromFile(file = data.files[1], sheet = 1, check.names = FALSE)
# Loop through the remaining files and merge them to the existing data frame
for (file in data.files[-1]) {
  newFile = readWorksheetFromFile(file = file, sheet = 1, check.names = FALSE)
  df = merge(df, newFile, all = TRUE, check.names = FALSE)
}
Answer 1:
First of all, if you apply merge correctly, there shouldn't be any duplicated columns, provided that the duplicated columns also have exactly the same name in the Excel files. Because you use merge, there must be at least one column in the Excel files that has exactly the same name and contains the values used to merge them.
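To illustrate the point about shared column names, here is a minimal sketch using two made-up data frames (a and b are hypothetical stand-ins for two worksheets, not data from the question). It shows that merge without a by argument joins on every column name the inputs have in common, whereas restricting the key leaves the other shared column in the result twice, with suffixes.

```r
# Two toy data frames standing in for two worksheets (hypothetical data).
a <- data.frame(id = 1:3, height = c(1.5, 1.6, 1.7))
b <- data.frame(id = 2:4, height = c(1.6, 1.7, 1.8), weight = c(60, 70, 80))

# With no 'by', merge() joins on every column name the inputs share
# (here: id and height), so each shared name appears only once.
merged <- merge(a, b, all = TRUE)
print(names(merged))

# Restricting the key to 'id' leaves 'height' as a non-key column in both
# inputs, so merge() keeps both copies and distinguishes them by suffix.
merged_by_id <- merge(a, b, by = "id", all = TRUE)
print(names(merged_by_id))
```

In the second call the two height columns come back as height.x and height.y, which is the kind of duplication that checking columns by value is meant to catch.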
So I reckon you want to check in the resulting data frame whether there are duplicate columns based on the values in each column. For this, you could use the following:
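keepUnique <- function(x){
  # All pairs of column names, one pair per column of combs
  combs <- combn(names(x), 2)
  # TRUE where the two columns of a pair hold identical values
  dups <- mapply(identical,
                 x[combs[1,]],
                 x[combs[2,]])
  # Drop the second column of every identical pair
  drop <- combs[2,][dups]
  x[ !names(x) %in% drop ]
}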
Which gives:
> mydf <- cbind(iris,iris[,3])[1:5,]
> mydf
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species iris[, 3]
1          5.1         3.5          1.4         0.2  setosa       1.4
2          4.9         3.0          1.4         0.2  setosa       1.4
3          4.7         3.2          1.3         0.2  setosa       1.3
4          4.6         3.1          1.5         0.2  setosa       1.5
5          5.0         3.6          1.4         0.2  setosa       1.4
> keepUnique(mydf)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
You can use this right after reading in a file, i.e. add the line
newFile <- keepUnique(newFile)
inside the loop in your own code. Note that keepUnique takes a single data frame as its only argument.
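As an alternative sketch (not part of the original answer), the same check by value can be written with duplicated applied to the list of columns. The name dropDuplicateColumns is made up for illustration. This variant selects columns by position rather than by name, so it should also cope with data frames that have a single column or repeated column names, cases where the combn-based version is likely to fail or to drop too much.

```r
# Assumption: two columns count as duplicates when their values are identical.
dropDuplicateColumns <- function(x) {
  # as.list() turns the data frame into a list of column vectors;
  # duplicated() flags every column that repeats an earlier one.
  x[!duplicated(as.list(x))]
}

# Same toy input as above: iris plus a copy of its third column.
mydf <- cbind(iris, iris[, 3])[1:5, ]
result <- dropDuplicateColumns(mydf)
print(names(result))
```

It can be applied either to each newly read file or once to the merged data frame after the loop, for example df <- dropDuplicateColumns(df).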
Source: https://stackoverflow.com/questions/22568946/delete-duplicate-columns