Question
I am doing a data analysis on wall thickness measurements of circular tubes. I have the following data frame:
> head(datIn, 12)
Component Tube.number Measurement.location Sub.location Interval Unit Start
1 In 1 1 A 121 U6100 7/25/2000
2 In 1 1 A 122 U6100 5/24/2001
3 In 1 1 A 222 U6200 1/19/2001
4 In 1 1 A 321 U6300 6/1/2000
5 In 1 1 A 223 U6200 5/22/2002
6 In 1 1 A 323 U6300 6/18/2002
7 In 1 1 A 21 U6200 10/1/1997
8 In 1 1 A 221 U6200 6/3/2000
9 In 1 1 A 322 U6300 12/11/2000
10 In 1 1 B 122 U6100 5/24/2001
11 In 1 1 B 322 U6300 12/11/2000
12 In 1 1 B 21 U6200 10/1/1997
End Measurement Material.loss Material.loss.interval Run.hours.interval
1 5/11/2001 7.6 0.4 NA 6653.10
2 2/7/2004 6.1 1.9 1.5 15484.82
3 3/7/2002 8.5 -0.5 -0.5 8826.50
4 12/1/2000 7.8 0.2 0.2 4170.15
5 4/30/2003 7.4 0.6 1.1 6879.73
6 9/30/2003 7.9 0.1 -0.1 9711.56
7 4/20/2000 7.6 0.4 NA 15159.94
8 1/5/2001 8.0 0.0 -0.4 4728.88
9 5/30/2002 7.8 0.2 0.0 9829.75
10 2/7/2004 5.9 2.1 0.9 15484.82
11 5/30/2002 7.0 1.0 0.7 9829.75
12 4/20/2000 8.2 -0.2 NA 15159.94
Run.hours.prior.to.interval Total.run.hours.end.interval
1 0.00 6653.10
2 6653.10 22137.92
3 19888.82 28715.32
4 0.00 4170.15
5 28715.32 35595.05
6 30039.58 39751.14
7 0.00 15159.94
8 15159.94 19888.82
9 20209.83 30039.58
10 6653.10 22137.92
11 20209.83 30039.58
12 0.00 15159.94
Straight.or.In.Out.Middle.bend.1 Straight.or.In.Out.Middle.bend.2
1 Out Out
2 Out Out
3 Out Out
4 Out Out
5 Out Out
6 Out Out
7 Out Out
8 Out Out
9 Out Out
10 Middle Out
11 Middle Out
12 Middle Out
The Sub.location column has values A, B, C, D. These are measurements at the same measurement location but at different positions in the cross section, i.e. at 0, 90, 180 and 270 degrees around the tube.
I would like to make a plot in which it becomes clear which measurement location has the biggest wall thickness decrease in time.
To do this I first want to calculate the mean value of the wall thickness of a tube at each measurement location at each unique interval (the running hours are coupled to the interval).
I tried doing this with the following code:
par(mfrow=c(1,2))
myfunction <- function(mydata1) { return(mean(mydata1,na.rm=TRUE))}
AVmeasloc <- tapply(datIn$Measurement,list(as.factor(datIn$Sub.location),as.factor(datIn$Measurement.location), myfunction))
AVmeasloc
This doesn't seem to work. I would like to keep the tapply function, as I have also used it to calculate the standard deviation for some values and it lets me make plots easily.
Does anyone have any advice on how to tackle this problem?
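Answer 1:
In the code you've posted there is a misplaced parenthesis around list(): myfunction ends up inside list() instead of being passed to tapply as the function to apply. It should read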
AVmeasloc <- tapply(datIn$Measurement,list(as.factor(datIn$Sub.location),as.factor(datIn$Measurement.location)), myfunction)
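This can then be simplified by passing the grouping columns directly (columns 3 and 4 of datIn are Measurement.location and Sub.location) and handing na.rm straight to mean: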
AVmeasloc <- tapply(datIn$Measurement,datIn[,c(3,4)],mean,na.rm=TRUE)
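One detail worth checking is that the order of the grouping columns decides which factor becomes the rows and which the columns of the result. The sketch below uses a small made-up data frame (toy, loc, sub and val are hypothetical names, not the real datIn) to illustrate that swapping the order only transposes the table, and that selecting columns by name avoids relying on their positions.

```r
# Hypothetical toy data; names and values are invented for illustration
toy <- data.frame(
  loc = c(1, 1, 2, 2),
  sub = c("A", "B", "A", "B"),
  val = c(7.6, 7.8, 8.0, 8.2)
)

# Rows = sub, columns = loc
by_sub_loc <- tapply(toy$val, toy[, c("sub", "loc")], mean, na.rm = TRUE)

# Rows = loc, columns = sub (same numbers, transposed layout)
by_loc_sub <- tapply(toy$val, toy[, c("loc", "sub")], mean, na.rm = TRUE)

print(by_sub_loc)
print(by_loc_sub)
```

Using names rather than c(3,4) also keeps the call working if columns are later added to or reordered in the data frame.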
Here's a working example:
test.data <- data.frame(cat1 = c("A","A","A","B","B","B","C","C","D"),
cat2 = c(1,1,2,2,1,NA,2,1,1),
val = c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9))
tapply(test.data$val, test.data[,c(1,2)],mean,na.rm=TRUE)
cat2
cat1 1 2
A 0.15 0.3
B 0.50 0.4
C 0.80 0.7
D 0.90 NA
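The question asks for the mean over the four sub-locations at each measurement location and each interval, so the same pattern may be applied with Measurement.location and Interval as the grouping columns. The following is only a sketch under that assumption: the toy data frame below reuses the question's column names, but its values are made up and it is not the real datIn.

```r
# Hypothetical toy data with the question's column names; values are invented
toy <- data.frame(
  Measurement.location = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  Sub.location         = rep(c("A", "B", "C", "D"), times = 3),
  Interval             = c(rep(121, 4), rep(122, 4), rep(121, 4)),
  Measurement          = c(7.6, 7.8, 8.0, 7.4, 6.1, 5.9, 6.3, NA,
                           8.2, 8.0, 7.8, 8.4)
)

groups <- toy[, c("Measurement.location", "Interval")]

# Mean wall thickness over the sub-locations for each (location, interval) pair
avg <- tapply(toy$Measurement, groups, mean, na.rm = TRUE)

# Same grouping, standard deviation instead of the mean
sdev <- tapply(toy$Measurement, groups, sd, na.rm = TRUE)

print(avg)
print(sdev)
```

Combinations that never occur in the data (here location 2 at interval 122) come out as NA, just like the D/2 cell in the example above. If several tubes are involved, Tube.number could presumably be added as a third grouping column, which would turn the result into a three-dimensional array.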
Source: https://stackoverflow.com/questions/19680532/calculate-mean-value-of-sets-of-4-sub-locations-from-multiple-location-from-a-la