I'm trying to write R code to read data from a mess of old spreadsheets. The exact location of the data varies from sheet to sheet: the only constant is that the first co
In cases like this it's important to know the possible conditions of your data. I'm going to assume that you only want to remove the columns and rows that don't conform to your table.
I have this Excel book:
I added 3 blank columns on the left because, when I loaded the book in R with a single blank column, the program omitted it. The extra columns are there to confirm that R omits empty columns on the left.
First: load data
library(xlsx)
dat <- read.xlsx('book.xlsx', sheetIndex = 1)
head(dat)
MY.COMPANY.PTY.LTD NA.
1 MC Pension Fund
2 GROSS PERFORMANCE DETAILS
3 updated by IG on 20/04/2017
4 Monthly return
5 Mar-14 0.0097
6 Apr-14 6e-04
Second: I added some columns with NA and '' values, in case your data contains some
dat$x2 <- NA
dat$x4 <- NA
head(dat)
MY.COMPANY.PTY.LTD NA. x2 x4
1 MC Pension Fund NA NA
2 GROSS PERFORMANCE DETAILS NA NA
3 updated by IG on 20/04/2017 NA NA
4 Monthly return NA NA
5 Mar-14 0.0097 NA NA
6 Apr-14 6e-04 NA NA
Third: Remove columns where all values are NA or ''. I have had to deal with this kind of problem in the past
# Keep a column unless every one of its values is NA or ''
colSelect <- apply(dat, 2, function(x) !all(is.na(x) | x == ''))
dat2 <- dat[, colSelect, drop = FALSE]
head(dat2)
MY.COMPANY.PTY.LTD NA.
1 MC Pension Fund
2 GROSS PERFORMANCE DETAILS
3 updated by IG on 20/04/2017
4 Monthly return
5 Mar-14 0.0097
6 Apr-14 6e-04
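The same column filter can also be written without apply(), which first coerces the whole data frame to a character matrix. This is only a minimal sketch on an invented data frame (toy) standing in for the sheet; is_blank_col is a hypothetical helper name, not a function from any package.

```r
# Toy stand-in for the sheet; the values are invented for illustration.
toy <- data.frame(
  a  = c("MC Pension Fund", NA, "Mar-14", "Apr-14"),
  b  = c(NA, "Monthly return", "0.0097", "6e-04"),
  x2 = NA,
  x4 = c("", NA, "", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every value in the column is NA or ''.
is_blank_col <- function(x) all(is.na(x) | as.character(x) == "")

# vapply() checks each column on its own, without matrix coercion.
blank <- vapply(toy, is_blank_col, logical(1))

# drop = FALSE keeps a data frame even if only one column survives.
toy2 <- toy[, !blank, drop = FALSE]
names(toy2)
```

Here x2 (all NA) and x4 (a mix of NA and '') are both dropped, while a and b are kept.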
Fourth: Keep only rows with complete observations (that's what I assume you want, from your example)
rowSelect <- apply(dat2, 1, function(x) !any(is.na(x)))
dat3 <- dat2[rowSelect, ]
head(dat3)
MY.COMPANY.PTY.LTD NA.
5 Mar-14 0.0097
6 Apr-14 6e-04
7 May-14 0.0189
8 Jun-14 0.008
9 Jul-14 -0.0199
10 Ago-14 0.00697
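Base R's complete.cases() should give the same row mask as the apply() call above. A small sketch on invented data (toy2 is a stand-in, not the real sheet):

```r
# Toy stand-in for dat2; the values are invented for illustration.
toy2 <- data.frame(
  a = c("MC Pension Fund", NA, "Mar-14", "Apr-14"),
  b = c(NA, "Monthly return", "0.0097", "6e-04"),
  stringsAsFactors = FALSE
)

# TRUE for rows that contain no NA in any column.
keep <- complete.cases(toy2)

# The apply() version from the answer, for comparison.
same <- apply(toy2, 1, function(x) !any(is.na(x)))

toy3 <- toy2[keep, , drop = FALSE]
toy3
```

Note that neither version treats '' as missing, so empty strings would need to be converted to NA first if they can occur in the data rows.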
Finally, if you want to keep the header, you can do something like this:
colnames(dat3) <- as.matrix(dat2[which(rowSelect)[1] - 1, ])
or
colnames(dat3) <- c('Month', as.character(dat2[which(rowSelect)[1] - 1, 2]))
dat3
Month Monthly return
5 Mar-14 0.0097
6 Apr-14 6e-04
7 May-14 0.0189
8 Jun-14 0.008
9 Jul-14 -0.0199
10 Ago-14 0.00697
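Since there are many sheets, the steps could be wrapped in one function and applied to each sheet in turn. The sketch below is only an illustration: extract_table is a hypothetical name, the toy data is invented, and it assumes the header row sits directly above the first complete row, as in the example book.

```r
# Hypothetical helper combining the steps above.
extract_table <- function(dat) {
  # Drop columns that are entirely NA or ''.
  blank_col <- vapply(
    dat,
    function(x) all(is.na(x) | as.character(x) == ""),
    logical(1)
  )
  dat <- dat[, !blank_col, drop = FALSE]

  # Keep only complete rows.
  keep <- complete.cases(dat)
  first <- which(keep)[1]
  out <- dat[keep, , drop = FALSE]

  # Assumption: the row just above the first complete row is the header.
  if (!is.na(first) && first > 1) {
    header <- as.character(unlist(dat[first - 1, ]))
    # Where the header cell is empty, keep the existing column name.
    names(out) <- ifelse(is.na(header) | header == "", names(out), header)
  }
  rownames(out) <- NULL
  out
}

# Invented data standing in for one sheet.
toy <- data.frame(
  V1 = c("MC Pension Fund", NA, "Mar-14", "Apr-14"),
  V2 = c(NA, "Monthly return", "0.0097", "6e-04"),
  x2 = NA,
  stringsAsFactors = FALSE
)
res <- extract_table(toy)
res
```

With real books, something like lapply() over the sheet indexes could call read.xlsx() and then this helper for each sheet; sheets whose header is not directly above the data would need a different rule.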