Making matrix numeric and name orders

Question

I have the following data:

yvar <- c(1:150)
replication <- c( rep(c(rep(1, 10), rep(2,10), rep(3,10)),5))
genotypes <- c(rep(paste("G", 1:10, sep= ""), 15))
environments <- c(rep(paste("E",5:1, sep = ""), each = 30))
mydf1 <- data.frame (yvar, replication, genotypes, environments)
mydf1$replication <- as.factor(mydf1$replication)

I want to summarize data:

mydf = data.frame(aggregate (yvar ~ genotypes + environments, data = mydf1, mean))

Now I create a matrix that I hoped would be numeric, but matm is not:

matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
colnames(matm) <- c("genotypes", levels(mydf$environments))
      genotypes E1    E2    E3    E4    E5   
 [1,] "G1"      "131" "101" " 71" " 41" " 11"
 [2,] "G10"     "140" "110" " 80" " 50" " 20"
 [3,] "G2"      "132" "102" " 72" " 42" " 12"
 [4,] "G3"      "133" "103" " 73" " 43" " 13"
 [5,] "G4"      "134" "104" " 74" " 44" " 14"
 [6,] "G5"      "135" "105" " 75" " 45" " 15"
 [7,] "G6"      "136" "106" " 76" " 46" " 16"
 [8,] "G7"      "137" "107" " 77" " 47" " 17"
 [9,] "G8"      "138" "108" " 78" " 48" " 18"
[10,] "G9"      "139" "109" " 79" " 49" " 19"

I then converted it to a data.frame:

    matd <- data.frame(matm)

  genotypes       E1       E2       E3       E4       E5
1         G1 31.70000 26.76667 23.60000 30.73333 43.13333
2        G10 32.40000 17.86667 28.83333 32.43333 30.23333
3         G2 29.50000 24.60000 24.16667 33.43333 38.66667
4         G3 27.00000 28.83333 33.63333 43.83333 29.60000
5         G4 29.53333 29.90000 26.60000 26.13333 40.33333
6         G5 27.40000 32.43333 27.96667 40.43333 41.46667
7         G6 36.76667 32.26667 28.26667 38.73333 33.43333
8         G7 29.63333 27.00000 26.96667 34.90000 40.70000
9         G8 24.50000 23.26667 22.50000 27.60000 32.26667
10        G9 31.60000 24.96667 24.46667 27.56667 36.26667

I want to get rid of the genotypes column and then convert the result to a matrix:

matx = data.frame(matd[,-1])
matdm <- as.matrix(matx) 
matdm 
      E1         E2         E3         E4         E5        
 [1,] "31.70000" "26.76667" "23.60000" "30.73333" "43.13333"
 [2,] "32.40000" "17.86667" "28.83333" "32.43333" "30.23333"
 [3,] "29.50000" "24.60000" "24.16667" "33.43333" "38.66667"
 [4,] "27.00000" "28.83333" "33.63333" "43.83333" "29.60000"
 [5,] "29.53333" "29.90000" "26.60000" "26.13333" "40.33333"
 [6,] "27.40000" "32.43333" "27.96667" "40.43333" "41.46667"
 [7,] "36.76667" "32.26667" "28.26667" "38.73333" "33.43333"
 [8,] "29.63333" "27.00000" "26.96667" "34.90000" "40.70000"
 [9,] "24.50000" "23.26667" "22.50000" "27.60000" "32.26667"
[10,] "31.60000" "24.96667" "24.46667" "27.56667" "36.26667"

I have two questions:

(1) Is there a consistent way to make a matrix numeric?

(2) I can see that the names in the genotypes column are sorted alphabetically. My file has them in a different order. I am fine with this order as long as it is consistent; however, I am worried about the following line:

colnames(matm) <- c("genotypes", levels(mydf$environments))

If the order produced by the aggregate function could differ from the order of levels(mydf$environments), are both sorted alphabetically, or do they follow the order in the file?

I would appreciate your suggestions.

Answer 1:

I think I see where the confusion is coming from. Backing up slightly: take the aggregation that you want to turn into a matrix, capture its result, and look at it:

myAgg <- aggregate(yvar ~ genotypes, mydf, 'c')
str(myAgg)

yields:

> str(myAgg)
'data.frame':   10 obs. of  2 variables:
 $ genotypes: Factor w/ 10 levels "G1","G10","G2",..: 1 2 3 4 5 6 7 8 9 10
 $ yvar     : num [1:10, 1:5] 131 140 132 133 134 135 136 137 138 139 ...

So the aggregate produces a somewhat atypical data.frame. The column yvar is actually the matrix you are interested in:

> myAgg$yvar
      [,1] [,2] [,3] [,4] [,5]
 [1,]  131  101   71   41   11
 [2,]  140  110   80   50   20
 [3,]  132  102   72   42   12
 [4,]  133  103   73   43   13
 [5,]  134  104   74   44   14
 [6,]  135  105   75   45   15
 [7,]  136  106   76   46   16
 [8,]  137  107   77   47   17
 [9,]  138  108   78   48   18
[10,]  139  109   79   49   19

so you can grab that directly:

matdm <- myAgg$yvar
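
The matrix taken from myAgg$yvar has no row or column names. As a minimal sketch, assuming the data from the question and assuming the grouping columns are factors (stringsAsFactors = TRUE is passed explicitly here to mimic the default in R before 4.0), the names can be attached afterwards:

```r
# Rebuild the question's data; stringsAsFactors = TRUE is an assumption
# that mimics the default behaviour of R versions before 4.0
yvar <- 1:150
genotypes <- rep(paste0("G", 1:10), 15)
environments <- rep(paste0("E", 5:1), each = 30)
mydf1 <- data.frame(yvar, genotypes, environments, stringsAsFactors = TRUE)

# Mean per genotype x environment, then one row of five means per genotype
mydf <- aggregate(yvar ~ genotypes + environments, data = mydf1, FUN = mean)
myAgg <- aggregate(yvar ~ genotypes, data = mydf, FUN = c)

# The numeric matrix sits inside the yvar column; label it explicitly
matdm <- myAgg$yvar
rownames(matdm) <- as.character(myAgg$genotypes)
colnames(matdm) <- levels(mydf$environments)

is.numeric(matdm)
matdm["G1", "E1"]
```

Indexing by name, as in the last line, avoids relying on the position of a row or column.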

Now to answer your specific questions...

1) The consistent way to make a matrix numeric is to ensure that all the data going into matrix() or as.matrix() is numeric. When you called

matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))

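you created a character matrix because the data.frame contained a non-numeric column (genotypes); a matrix can hold only one type, so every value was coerced to character. Then you converted that matrix into a data.frame. This converted the columns into factors (in R 4.0 and later, where stringsAsFactors defaults to FALSE, they stay character instead). Then you selected a few columns which were, not surprisingly, still factors. So when you called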

matdm <- as.matrix(matx)

the factors got converted to characters.
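
If a character matrix or a data.frame of factors already exists, it can still be converted back. The following is a small sketch, not the only way to do it; chr and df are made-up stand-ins for the objects in the question:

```r
# A small character matrix standing in for the question's matdm
chr <- matrix(c("31.7", "32.4", "26.8", "17.9"), nrow = 2,
              dimnames = list(NULL, c("E1", "E2")))

# Changing the storage mode converts the values but keeps dim and dimnames
num <- chr
storage.mode(num) <- "double"

# For factor columns, go through as.character() first; calling as.numeric()
# directly on a factor returns the internal level codes, not the values
df <- data.frame(E1 = factor(c("31.7", "32.4")),
                 E2 = factor(c("26.8", "17.9")))
num2 <- sapply(df, function(col) as.numeric(as.character(col)))

is.numeric(num)
is.numeric(num2)
```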

2) The order of the variables created by

aggregate(yvar ~ genotypes, mydf, 'c')

is determined by the order of the levels of the factor genotypes. By default the levels are sorted alphabetically, but you can always look at them with levels() to be totally sure. If the levels were specified manually when the factor was created, they would not necessarily be in alphabetical order.
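
As an illustration of that point, here is a short sketch that compares the default level order with an explicitly specified one. The name g_file is hypothetical. Note also that in R 4.0 and later data.frame() no longer turns strings into factors by default, so levels(mydf$environments) may return NULL unless the column is made a factor explicitly.

```r
genotypes <- rep(paste0("G", 1:10), 3)

# Default: levels are the sorted unique strings, so "G10" comes before "G2"
levels(factor(genotypes))

# Explicit: keep the order of first appearance (the order in the file)
g_file <- factor(genotypes, levels = unique(genotypes))
levels(g_file)
```

Setting the levels explicitly before aggregating is one way to make the resulting order predictable rather than dependent on string sorting.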

Answer 2:

This is a job for the reshape2 package. Here is the code:

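library(reshape2); library(plyr)
# Cast to wide format: one row per genotype, one column per environment
# (current reshape2 spells the argument value.var, not value_var)
ans <- dcast(mydf1, genotypes ~ environments, mean, value.var = 'yvar')
# Strip the "G" prefix and convert to numeric so rows sort as 1, 2, ..., 10
ans <- mutate(ans, genotypes = as.numeric(sub("G", "", genotypes)))
arrange(ans, genotypes)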
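
For comparison, a similar result can be sketched in base R with tapply(), which is a different technique from the reshape2 code above and returns a numeric matrix directly. It assumes the same columns as mydf1 in the question; mat is a name introduced only for this example:

```r
# Rebuild the relevant columns of the question's data
yvar <- 1:150
genotypes <- rep(paste0("G", 1:10), 15)
environments <- rep(paste0("E", 5:1), each = 30)
mydf1 <- data.frame(yvar, genotypes, environments)

# Mean of yvar for every genotype x environment cell, as a named numeric matrix
mat <- with(mydf1, tapply(yvar, list(genotypes, environments), mean))

# Reorder the rows numerically by the digits after the "G" prefix
mat <- mat[order(as.numeric(sub("G", "", rownames(mat)))), ]

is.numeric(mat)
rownames(mat)
```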

Source: https://stackoverflow.com/questions/8052828/making-matrix-numeric-and-name-orders

Tags

matrix

dataframe

numeric