Question
This may be a slightly obtuse question title, since I'm still getting up to speed with R. I'm manipulating a data frame to extract percentages for classification groups: one column (a factor) defines the groups, and I want percentages computed from another column within each group. I'll use the built-in mtcars data set to demonstrate, with gear playing the role of the classification variable and cyl being the data I want percentages from.
Some background details: the gear column takes 3 distinct values (3, 4, 5), and the cyl column also takes 3 distinct values (4, 6, 8).
The first element of my list gives, for each gear type, the proportion of cars with at most 4 cylinders. Among 3-gear models there is only one, the Toyota Corona, out of 15 3-gear models in total, so the proportion should be 1/15 = 0.0667. Among 4-gear models there are eight out of 12, giving 8/12 = 0.667, and among 5-gear models two out of five, giving 2/5 = 0.4.
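To check the arithmetic above in R, here is a quick base-R sketch on mtcars (not part of the original question). The mean of a logical vector is the share of TRUE values, so tapply() with mean gives the per-gear proportions directly; the variable name props is illustrative.

```r
# Share of cars with at most 4 cylinders within each gear group.
# mean() of a logical vector equals sum(x) / length(x).
props <- with(mtcars, tapply(cyl <= 4, gear, mean))
print(props)   # one proportion each for gear 3, 4, 5

# Underlying counts: rows are gear groups, columns are FALSE/TRUE for cyl <= 4
print(with(mtcars, table(gear, cyl <= 4)))
```

The printed proportions should correspond to 1/15, 8/12 and 2/5.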
Below is the code I wrote for this computation. However, it returns a list, which is not the structure I want. Instead, I'd like a single data frame whose first column holds the distinct cyl values and whose remaining columns correspond to gear types 3, 4 and 5, with one row of percentages per cyl value. I'm very close, but I need help reshaping the list I currently get, or perhaps an alternative apply function that produces the table of percentages directly, or any other magic someone can cook up.
> y <- mtcars
> lapply( unique( sort( y$cyl ) ) , function(c) { tapply( y$cyl , y$gear , function(x) sum( x <= c ) / length(x) ) } )
[[1]]
3 4 5
0.06666667 0.66666667 0.40000000
[[2]]
3 4 5
0.2 1.0 0.6
[[3]]
3 4 5
1 1 1
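As an alternative apply-based route (my own suggestion, not part of the original question), the same cumulative percentages can be computed from a gear-by-cyl contingency table: prop.table() turns counts into within-gear shares, and a running cumsum() over the sorted cyl levels gives the share of cars with at most that many cylinders. This assumes the thresholds of interest are exactly the observed cyl values, as in the question's code; the names tab, shares and cum are illustrative.

```r
# Count cars per gear/cyl combination: rows gear 3,4,5; columns cyl 4,6,8
tab <- table(mtcars$gear, mtcars$cyl)

# Convert counts to within-gear shares, so each gear row sums to 1
shares <- prop.table(tab, margin = 1)

# Running totals over cyl give "at most k cylinders"; apply() over rows
# returns one column per gear, so rows are cyl values and columns are gears
cum <- apply(shares, 1, cumsum)
print(cum)
```

Because apply() puts each row's result in a column, the matrix already has cyl values as row names and gear types as column names; as.data.frame(cum) would turn it into a data frame.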
This is what I would like the resulting data frame to look like:
cyl X3 X4 X5
1 4 0.06666667 0.6666667 0.4
2 6 0.20000000 1.0000000 0.6
3 8 1.00000000 1.0000000 1.0
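A minimal sketch of reshaping the list directly, assuming it is built from mtcars exactly as in the question: do.call(rbind, ...) stacks the per-threshold arrays as rows and keeps the gear labels as column names, so only the cyl column needs to be prepended. The names thresholds, m and result are my own, and the loop variable is renamed from c to k only to avoid shadowing base R's c().

```r
# Rebuild the question's list of per-threshold proportions
thresholds <- sort(unique(mtcars$cyl))
p <- lapply(thresholds, function(k) {
  tapply(mtcars$cyl, mtcars$gear, function(x) sum(x <= k) / length(x))
})

# rbind() each named 1-d array as a row; the gear labels become column names
m <- do.call(rbind, p)

# Prepend the thresholds; data.frame() turns "3","4","5" into X3, X4, X5
result <- data.frame(cyl = thresholds, m)
print(result)
```

Passing check.names = FALSE to data.frame() would keep the bare column names 3, 4, 5 instead of X3, X4, X5.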
Answer 1:
I came up with a solution after googling "convert list of arrays into data.frame", which immediately led me to a related SO post.
> p <- lapply( unique( sort( mtcars$cyl ) ) , function(c) { tapply( mtcars$cyl , mtcars$gear , function(x) sum( x <= c ) / length(x) ) } )
> df <- data.frame( matrix( unlist(p) , nrow = length(p) , byrow = TRUE ) )
> df
X1 X2 X3
1 0.06666667 0.6666667 0.4
2 0.20000000 1.0000000 0.6
3 1.00000000 1.0000000 1.0
The solution works apart from dropping the classification names as column headers, but it looks like a follow-up assignment can recover those as well:
> colnames(df) <- names(p[[1]])
> rownames(df) <- unique( sort( mtcars$cyl ) )
> df
3 4 5
4 0.06666667 0.6666667 0.4
6 0.20000000 1.0000000 0.6
8 1.00000000 1.0000000 1.0
Other answers to the linked question address the column-header issue nicely; the row-header problem remains, though, since the cyl values get lost inside my anonymous function calls.
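One way to keep the row labels without a separate rownames() assignment, sketched under the assumption that the thresholds are the sorted distinct cyl values: name the threshold vector before looping. sapply() carries those names through as column names of its result, so transposing gives cyl values as row names and gear types as column names. setNames(), sapply() and t() are base R; the variable names are illustrative.

```r
# Thresholds named by their own values: names "4", "6", "8"
thresholds <- sort(unique(mtcars$cyl))
named <- setNames(thresholds, thresholds)

# sapply() returns a gear-by-threshold matrix: rows "3","4","5", cols "4","6","8"
m <- sapply(named, function(k) {
  tapply(mtcars$cyl, mtcars$gear, function(x) mean(x <= k))
})

# Transpose so each row is a cyl threshold and each column a gear group
df2 <- as.data.frame(t(m))
print(df2)
```

This should give the same labelled table as the manual colnames()/rownames() assignment, with the labels coming from the data rather than being attached afterwards.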
Source: https://stackoverflow.com/questions/26534438/intradataframe-analysis-creating-a-derivative-data-frame-from-another-data-fram