data.frame colmn mean at n-th interval of rows

问题

I tried to find similar questions here but thought that it might be helpful to just create new question.

I have relatively large dataset (daily weather data for several hundreds stations) I did some sliding window average calculation and I got the intermediate output like this.

data frame of 10481441 row by 11 column but my row names are consist of series of numbers with identifiable prefix.

For example,

X1.1
X1.2
X1.3
...
X1.11659
X2.1
X2.2
X2.3
...
X2.11659
X3.1
...
X899.11659

Then what I want to do is get mean values of 11 columns by same prefix of row numbers

For example:
for 11 columns mean(X1.1,X2.1,X3.1,...X899.1) (X1.2,X2.2,X3.2,...X899.2) and thus expect to have a output table as 899 set of 11659 row by 11 column

I got some help here to have jumping average at n-th row when I have as.Date row names but wasn't able to modify the desired format to run.

I appreciate your time.

回答1:

Add a new column with the desired category, and use that in aggregate. Here is an example:

d <- data.frame(x=1:12)
rownames(d) <- paste0('X', rep(1:4, 3), '.', rep(1:3, 4))
d
##       x
## X1.1  1
## X2.2  2
## X3.3  3
## X4.1  4
## X1.2  5
## X2.3  6
## X3.1  7
## X4.2  8
## X1.3  9
## X2.1 10
## X3.2 11
## X4.3 12

This looks something like your data. Now add the categories as a new column. If you split each name on ., you want the second entry returned:

d$category <- sapply(strsplit(rownames(d), '[.]'), '[', i=2)
d
##       x category
## X1.1  1        1
## X2.2  2        2
## X3.3  3        3
## X4.1  4        1
## X1.2  5        2
## X2.3  6        3
## X3.1  7        1
## X4.2  8        2
## X1.3  9        3
## X2.1 10        1
## X3.2 11        2
## X4.3 12        3

And aggregate based on this new column. . here means "all other columns" so you will get means of everything other than the new column:

aggregate(.~category, data=d, FUN=mean)
##   category   x
## 1        1 5.5
## 2        2 6.5
## 3        3 7.5

来源：https://stackoverflow.com/questions/21352553/data-frame-colmn-mean-at-n-th-interval-of-rows

标签

lapply

sapply