问题
I tried to find similar questions here but thought that it might be helpful to just create new question.
I have relatively large dataset (daily weather data for several hundreds stations) I did some sliding window average calculation and I got the intermediate output like this.
data frame of 10481441 row
by 11 column
but my row names are consist of series of numbers with identifiable prefix.
For example,
X1.1
X1.2
X1.3
...
X1.11659
X2.1
X2.2
X2.3
...
X2.11659
X3.1
...
X899.11659
Then what I want to do is get mean values of 11 columns
by same prefix of row numbers
For example:
for 11 columns
mean(X1.1,X2.1,X3.1,...X899.1) (X1.2,X2.2,X3.2,...X899.2)
and thus expect to have a output table as 899
set of 11659 row
by 11 column
I got some help here to have jumping average at n-th row when I have as.Date
row names but wasn't able to modify the desired format to run.
I appreciate your time.
回答1:
Add a new column with the desired category, and use that in aggregate
. Here is an example:
d <- data.frame(x=1:12)
rownames(d) <- paste0('X', rep(1:4, 3), '.', rep(1:3, 4))
d
## x
## X1.1 1
## X2.2 2
## X3.3 3
## X4.1 4
## X1.2 5
## X2.3 6
## X3.1 7
## X4.2 8
## X1.3 9
## X2.1 10
## X3.2 11
## X4.3 12
This looks something like your data. Now add the categories as a new column. If you split each name on .
, you want the second entry returned:
d$category <- sapply(strsplit(rownames(d), '[.]'), '[', i=2)
d
## x category
## X1.1 1 1
## X2.2 2 2
## X3.3 3 3
## X4.1 4 1
## X1.2 5 2
## X2.3 6 3
## X3.1 7 1
## X4.2 8 2
## X1.3 9 3
## X2.1 10 1
## X3.2 11 2
## X4.3 12 3
And aggregate based on this new column. .
here means "all other columns" so you will get means of everything other than the new column:
aggregate(.~category, data=d, FUN=mean)
## category x
## 1 1 5.5
## 2 2 6.5
## 3 3 7.5
来源:https://stackoverflow.com/questions/21352553/data-frame-colmn-mean-at-n-th-interval-of-rows