I am cleaning several Excel files in R. Unfortunately, they have unequal dimensions (different numbers of rows and columns). Currently I am storing each Excel sheet as a data frame in a list.
My suggestion is to write a function that does what you want on a single data frame:
myfun <- function(dat) {
  return(dat[4, , drop=FALSE])
}
If the data frame has a single column and you want a vector instead of a data.frame, use return(dat[4, ]) instead (with more than one column, dat[4, ] still returns a one-row data.frame). Then use lapply to apply that function to each element of your list:
lapply(df.list1, myfun)
With that technique, you can easily come up with ways to extend myfun to more complex functions.
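As one possible extension, here is a minimal sketch of a variant that tolerates data frames with fewer rows than requested, which can happen when the sheets have unequal dimensions. Indexing past the last row of a data frame does not fail; it returns a row of NA values, so the sketch returns a zero-row data frame instead. The name get_row and its n argument are hypothetical, and the sample list is made up:

```r
# Hypothetical helper (not from the original answer): return row n of a
# data frame, or a zero-row data frame with the same columns if the
# data frame has fewer than n rows.
get_row <- function(dat, n = 4) {
  if (nrow(dat) < n) {
    return(dat[0, , drop = FALSE])
  }
  dat[n, , drop = FALSE]
}

# Made-up list of data frames with unequal dimensions
df.list1 <- list(
  data.frame(a = 1:10, b = letters[1:10]),
  data.frame(a = 1:2)
)

res <- lapply(df.list1, get_row, n = 4)
```

Extra arguments such as n are passed through lapply to the function, so the same helper can pull any row.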
You could also just lapply the extraction function @Justin suggests directly, e.g.:
# example data of a list containing 10 data frames:
test <- replicate(10,data.frame(a=1:10),simplify=FALSE)
# extract the fourth row of each one - setting drop=FALSE means you get a
# data frame returned even if only one vector/column needs to be returned.
lapply(test,"[",4,,drop=FALSE)
The format is:
lapply(listname,"[",rows.to.return,cols.to.return,drop=FALSE)
# the example returns the fourth row only from each data frame
#[[1]]
# a
#4 4
#
#[[2]]
# a
#4 4
# etc...
To generalise this to an extraction based on a condition, you have to change it a little. The example below extracts all rows where a in each data.frame is >4. In this case, an anonymous function is probably the clearest method, e.g.:
lapply(test, function(x) with(x,x[a>4,,drop=FALSE]) )
#[[1]]
# a
#5 5
#6 6
#7 7
#8 8
#9 9
#10 10
# etc...
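One caveat with condition-based extraction: if the column contains NA, the comparison is NA for those rows, and indexing a data frame by NA returns rows filled with NA. A minimal sketch on made-up data, assuming you want such rows skipped, wraps the condition in which():

```r
# Made-up data: column a contains missing values
test_na <- list(data.frame(a = c(1, NA, 5, 7)),
                data.frame(a = c(6, 2, NA)))

# which() returns only the positions where the condition is TRUE,
# so rows where a is NA are skipped instead of coming back as NA rows
res <- lapply(test_na, function(x) x[which(x$a > 4), , drop = FALSE])
```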
For example, suppose you have a .csv file called hw1_data.csv and you want to retrieve the 47th row. Here is how to do that:
x<-read.csv("hw1_data.csv")
x[47,]
If it is a text file, you can use read.table instead.
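A minimal sketch of the read.table route, assuming a tab-delimited file with a header row. The temporary file and its contents are made up so the example runs on its own:

```r
# Write a small tab-delimited file to a temporary location
# (made-up data, purely for illustration)
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:50, value = (1:50)^2), tmp,
            sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE because the first line holds the column names;
# read.table() defaults to header = FALSE, unlike read.csv()
x <- read.table(tmp, header = TRUE, sep = "\t")
x[47, ]
```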
There is no need for a wrapper function; just use lapply and pass it a blank argument at the end (to represent the columns):
lapply(df.list, `[`, 4, )
This also works with any type of row argument that you would normally use in myDF[ . , ], e.g. lapply(df.list, `[`, c(2, 4:6), ).
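To illustrate the different kinds of row argument, here is a short sketch on a made-up df.list (the data and result names are assumptions for the example):

```r
# Made-up list of two data frames with different numbers of rows
df.list <- list(data.frame(a = 1:8, b = 8:1),
                data.frame(a = 1:6, b = letters[1:6]))

# Integer vector of rows: rows 2, 4, 5 and 6 of every data frame
some_rows <- lapply(df.list, `[`, c(2, 4:6), )

# Negative index: everything except the first row
no_first <- lapply(df.list, `[`, -1, )
```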
I would suggest that if you are going to use a wrapper function, have it work more like [ does. For example, Grab(df.list, 2:3, 1:5) would select the second and third rows and the first through fifth columns of every data.frame, and Grab(df.list, 2:3) would select the second and third rows of all columns:
Grab <- function(ll, rows, cols) {
  if (missing(cols))
    lapply(ll, `[`, rows, )
  else
    lapply(ll, `[`, rows, cols)
}
Grab (df.list, 2:3)
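To try the wrapper on data frames of unequal dimensions, here is a self-contained sketch. Grab is repeated so the block runs on its own, and the sample data is made up. Selecting columns by name is an assumption that every data frame shares those names; asking for a column that a data frame does not have raises an "undefined columns selected" error.

```r
# Grab() repeated from the answer so this block is self-contained
Grab <- function(ll, rows, cols) {
  if (missing(cols))
    lapply(ll, `[`, rows, )
  else
    lapply(ll, `[`, rows, cols)
}

# Made-up data frames with unequal dimensions
df.list <- list(data.frame(a = 1:5, b = 5:1, c = letters[1:5]),
                data.frame(a = 1:3, b = 3:1))

both_cols <- Grab(df.list, 2:3, c("a", "b"))  # rows 2-3, columns a and b
all_cols  <- Grab(df.list, 2:3)               # rows 2-3, every column
```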