Question
I have the following data:
yvar <- c(1:150)
replication <- c( rep(c(rep(1, 10), rep(2,10), rep(3,10)),5))
genotypes <- c(rep(paste("G", 1:10, sep= ""), 15))
environments <- c(rep(paste("E",5:1, sep = ""), each = 30))
mydf1 <- data.frame (yvar, replication, genotypes, environments)
mydf1$replication <- as.factor(mydf1$replication)
I want to summarize data:
mydf = data.frame(aggregate (yvar ~ genotypes + environments, data = mydf1, mean))
Now I create a matrix, hopefully numeric, but matm is not:
matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
colnames(matm) <- c("genotypes", levels(mydf$environments))
genotypes E1 E2 E3 E4 E5
[1,] "G1" "131" "101" " 71" " 41" " 11"
[2,] "G10" "140" "110" " 80" " 50" " 20"
[3,] "G2" "132" "102" " 72" " 42" " 12"
[4,] "G3" "133" "103" " 73" " 43" " 13"
[5,] "G4" "134" "104" " 74" " 44" " 14"
[6,] "G5" "135" "105" " 75" " 45" " 15"
[7,] "G6" "136" "106" " 76" " 46" " 16"
[8,] "G7" "137" "107" " 77" " 47" " 17"
[9,] "G8" "138" "108" " 78" " 48" " 18"
[10,] "G9" "139" "109" " 79" " 49" " 19"
I then converted it to a data.frame:
matd <- data.frame(matm)
genotypes E1 E2 E3 E4 E5
1 G1 31.70000 26.76667 23.60000 30.73333 43.13333
2 G10 32.40000 17.86667 28.83333 32.43333 30.23333
3 G2 29.50000 24.60000 24.16667 33.43333 38.66667
4 G3 27.00000 28.83333 33.63333 43.83333 29.60000
5 G4 29.53333 29.90000 26.60000 26.13333 40.33333
6 G5 27.40000 32.43333 27.96667 40.43333 41.46667
7 G6 36.76667 32.26667 28.26667 38.73333 33.43333
8 G7 29.63333 27.00000 26.96667 34.90000 40.70000
9 G8 24.50000 23.26667 22.50000 27.60000 32.26667
10 G9 31.60000 24.96667 24.46667 27.56667 36.26667
I want to get rid of the genotypes column and then convert the result to a matrix:
matx = data.frame(matd[,-1])
matdm <- as.matrix(matx)
matdm
E1 E2 E3 E4 E5
[1,] "31.70000" "26.76667" "23.60000" "30.73333" "43.13333"
[2,] "32.40000" "17.86667" "28.83333" "32.43333" "30.23333"
[3,] "29.50000" "24.60000" "24.16667" "33.43333" "38.66667"
[4,] "27.00000" "28.83333" "33.63333" "43.83333" "29.60000"
[5,] "29.53333" "29.90000" "26.60000" "26.13333" "40.33333"
[6,] "27.40000" "32.43333" "27.96667" "40.43333" "41.46667"
[7,] "36.76667" "32.26667" "28.26667" "38.73333" "33.43333"
[8,] "29.63333" "27.00000" "26.96667" "34.90000" "40.70000"
[9,] "24.50000" "23.26667" "22.50000" "27.60000" "32.26667"
[10,] "31.60000" "24.96667" "24.46667" "27.56667" "36.26667"
I have two questions:
(1) Is there a consistent way to make a matrix numeric?
(2) I can see that the genotypes are sorted alphabetically, whereas my file has them in a different order. I am fine with this order as long as it is consistent, but I am worried about the following line:
colnames(matm) <- c("genotypes", levels(mydf$environments))
Could the order used by the aggregate function differ from that of levels(mydf$environments)?
Do they both sort alphabetically, or do they follow the order in the file?
I would appreciate your suggestions.
Answer 1:
I think I see where the confusion is coming from. Backing up slightly: capture the result of the aggregation you want to turn into a matrix and look at its structure:
myAgg <- aggregate(yvar ~ genotypes, mydf, 'c')
str(myAgg)
yields:
> str(myAgg)
'data.frame': 10 obs. of 2 variables:
$ genotypes: Factor w/ 10 levels "G1","G10","G2",..: 1 2 3 4 5 6 7 8 9 10
$ yvar : num [1:10, 1:5] 131 140 132 133 134 135 136 137 138 139 ...
So aggregate produces a somewhat atypical data.frame. The column yvar is actually the matrix you are interested in:
> myAgg$yvar
[,1] [,2] [,3] [,4] [,5]
[1,] 131 101 71 41 11
[2,] 140 110 80 50 20
[3,] 132 102 72 42 12
[4,] 133 103 73 43 13
[5,] 134 104 74 44 14
[6,] 135 105 75 45 15
[7,] 136 106 76 46 16
[8,] 137 107 77 47 17
[9,] 138 108 78 48 18
[10,] 139 109 79 49 19
so you can grab that directly:
matdm <- myAgg$yvar
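The matrix taken this way has no row or column labels. A minimal sketch of attaching them is given here, assuming the question's data and assuming that mydf is still in the order aggregate returned it (sorted by the levels of environments), so that the columns line up with levels(mydf$environments). The stringsAsFactors = TRUE argument is added so that levels() behaves the same in R 4.0.0 and later.

```r
# Rebuild the example data from the question
yvar <- 1:150
genotypes <- rep(paste0("G", 1:10), 15)
environments <- rep(paste0("E", 5:1), each = 30)
mydf1 <- data.frame(yvar, genotypes, environments, stringsAsFactors = TRUE)

# Mean per genotype x environment, then one row per genotype
mydf <- aggregate(yvar ~ genotypes + environments, data = mydf1, FUN = mean)
myAgg <- aggregate(yvar ~ genotypes, data = mydf, FUN = c)

# Take the numeric matrix column and label it
matdm <- myAgg$yvar
rownames(matdm) <- as.character(myAgg$genotypes)
colnames(matdm) <- levels(mydf$environments)  # assumes mydf rows are sorted by environment level

is.numeric(matdm)
matdm["G1", "E1"]
```

Keeping the genotypes as row names rather than as a column is what lets the matrix stay numeric.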
Now to answer your specific questions...
1) The consistent way to make a matrix numeric is to ensure that the data going into the matrix() or as.matrix() functions are all numeric. When you called
matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
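you created a character matrix because the data.frame had a non-numeric column (genotypes), and a matrix can hold only one type. Then you converted that matrix into a data.frame, which turned the character columns into factors (the default in R before 4.0.0, where stringsAsFactors = TRUE). Then you selected a few columns which were, not surprisingly, still factors. So when you called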
matdm <- as.matrix(matx)
the factors got converted to characters.
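The coercion chain described above can be reproduced on a small example. This is only a sketch with a made-up two-row data.frame (df, m_chr, m_num, m_fix and f are illustrative names, not objects from the question):

```r
# A data.frame with one non-numeric column, as in the question
df <- data.frame(genotypes = c("G1", "G2"),
                 E1 = c(31.7, 29.5),
                 E2 = c(26.8, 24.6))

# One non-numeric column forces the whole matrix to character
m_chr <- as.matrix(df)

# Passing only the numeric columns gives a numeric matrix
m_num <- as.matrix(df[, -1])
rownames(m_num) <- as.character(df$genotypes)

# A character matrix of number-like strings can be converted in place
m_fix <- m_chr[, -1]
storage.mode(m_fix) <- "double"

# If values are already stuck in a factor, go through character first;
# as.numeric() on the factor itself returns the integer level codes
f <- factor(c("31.7", "29.5"))
as.numeric(f)                 # 2 1 (level codes, not the values)
as.numeric(as.character(f))   # 31.7 29.5
```

Dropping the label column before calling as.matrix() is the simpler route; the storage.mode and as.character steps are only needed to repair data that has already been coerced.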
2) The order of the variables created by
aggregate(yvar ~ genotypes, mydf, 'c')
is determined by the order of the levels of the factor genotypes. Levels are created in alphabetical order by default, but you can always check with levels() to be totally sure. If the levels were set manually, they would not necessarily be in alphabetical order.
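To avoid depending on the default ordering at all, the levels can be set explicitly and the row and column names taken from the result instead of being assigned by hand. The sketch below swaps in tapply, which is not what the question or this answer uses, and assumes the question's data; mat is an illustrative name.

```r
# Example data from the question
yvar <- 1:150
genotypes <- rep(paste0("G", 1:10), 15)
environments <- rep(paste0("E", 5:1), each = 30)

# Fix the level order explicitly instead of relying on the alphabetical default
genotypes <- factor(genotypes, levels = paste0("G", 1:10))
environments <- factor(environments, levels = paste0("E", 1:5))

# tapply returns a numeric matrix whose dimnames come from the factor levels,
# so labels and values cannot get out of step
mat <- tapply(yvar, list(genotypes, environments), mean)

rownames(mat)   # G1 ... G10 in the order set above, not G1, G10, G2, ...
colnames(mat)
```

Because the labels come from the same factors that define the grouping, there is no separate colnames assignment that could disagree with the data.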
Answer 2:
This is a job for the reshape2 package. Here is the code:
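library(reshape2); library(plyr)
# value.var (not value_var) names the column to aggregate in current reshape2
ans <- dcast(mydf1, genotypes ~ environments, mean, value.var = 'yvar')
# strip the "G" prefix and convert to numeric so that 10 sorts after 9
ans <- mutate(ans, genotypes = as.numeric(sub("G", "", genotypes)))
arrange(ans, genotypes)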
Source: https://stackoverflow.com/questions/8052828/making-matrix-numeric-and-name-orders