R - describe() output to a data frame

Submitted by 杀马特。学长 韩版系。学妹 on 2020-06-12 06:33:08

Question


I want to create a data frame from the output of the describe() function (from the Hmisc package). The dataset under consideration is iris. The data frame should look like this:

    Variable      n    missing  unique  Info  Mean    0.05   0.1   0.25  0.5   0.75  0.9   0.95
    Sepal.Length  150  0        35      1     5.843   4.6    4.8   5.1   5.8   6.4   6.9   7.255
    Sepal.Width   150  0        23      0.99  3.057   2.345  2.5   2.8   3     3.3   3.61  3.8
    Petal.Length  150  0        43      1     3.758   1.3    1.4   1.6   4.35  5.1   5.8   6.1
    Petal.Width   150  0        22      0.99  1.199   0.2    0.2   0.3   1.3   1.8   2.2   2.3
    Species       150  0        3

Is there a way to coerce the output of describe() to a data.frame? When I try, I get the following error:

library(Hmisc)
statistics <- describe(iris)
statistics[1]
first_vec <- statistics[1]$Sepal.Length
as.data.frame(first_vec)
#Error in as.data.frame.default(first_vec) : cannot coerce class ‘"describe"’ to a data.frame

Thanks


Answer 1:


The way to figure this out is to examine the objects with str():

data(iris)
library(Hmisc)
di <- describe(iris)
di
# iris 
# 
# 5  Variables      150  Observations
# -------------------------------------------------------------
# Sepal.Length 
#       n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95 
#     150       0      35       1   5.843   4.600   4.800   5.100   5.800   6.400   6.900   7.255
# 
# lowest : 4.3 4.4 4.5 4.6 4.7, highest: 7.3 7.4 7.6 7.7 7.9 
# -------------------------------------------------------------
# ...
# -------------------------------------------------------------
# Species 
#       n missing  unique 
#     150       0       3 
# 
# setosa (50, 33%), versicolor (50, 33%) 
# virginica (50, 33%) 
# -------------------------------------------------------------
str(di)
# List of 5
# $ Sepal.Length:List of 6
# ..$ descript    : chr "Sepal.Length"
# ..$ units       : NULL
# ..$ format      : NULL
# ..$ counts      : Named chr [1:12] "150" "0" "35" "1" ...
# .. ..- attr(*, "names")= chr [1:12] "n" "missing" "unique" "Info" ...
# ..$ intervalFreq:List of 2
# .. ..$ range: atomic [1:2] 4.3 7.9
# .. .. ..- attr(*, "Csingle")= logi TRUE
# .. ..$ count: int [1:100] 1 0 3 0 0 1 0 0 4 0 ...
# ..$ values      : Named chr [1:10] "4.3" "4.4" "4.5" "4.6" ...
# .. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
# ..- attr(*, "class")= chr "describe"
# $ Sepal.Width :List of 6
# ...
# $ Species     :List of 5
# ..$ descript: chr "Species"
# ..$ units   : NULL
# ..$ format  : NULL
# ..$ counts  : Named num [1:3] 150 0 3
# .. ..- attr(*, "names")= chr [1:3] "n" "missing" "unique"
# ..$ values  : num [1:2, 1:3] 50 33 50 33 50 33
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:2] "Frequency" "%"
# .. .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
# ..- attr(*, "class")= chr "describe"
# - attr(*, "descript")= chr "iris"
# - attr(*, "dimensions")= int [1:2] 150 5
# - attr(*, "class")= chr "describe"

We see that di is a list of lists, with one sublist per variable. We can take it apart by looking at just the first sublist, which unlist() flattens into a single named vector:

unlist(di[[1]])
#             descript              counts.n 
#       "Sepal.Length"                 "150" 
#       counts.missing         counts.unique 
#                  "0"                  "35" 
#          counts.Info           counts.Mean 
#                  "1"               "5.843" 
#           counts..05            counts..10 
#              "4.600"               "4.800" 
#           counts..25            counts..50 
#              "5.100"               "5.800" 
#           counts..75            counts..90 
#              "6.400"               "6.900" 
#           counts..95   intervalFreq.range1 
#              "7.255"                 "4.3" 
#  intervalFreq.range2   intervalFreq.count1 
#                "7.9"                   "1" 
#  ...
#            values.H3             values.H2 
#                "7.6"                 "7.7" 
#            values.H1 
#                 "7.9" 
str(unlist(di[[1]]))
# Named chr [1:125] "Sepal.Length" "150" "0" "35" ...
# - attr(*, "names")= chr [1:125] "descript" "counts.n" "counts.missing" "counts.unique" ...

It is very long (125 elements), and every element has been coerced to the same, most inclusive type: character. The statistics you want are the 2nd through 13th elements (n through the .95 quantile):

unlist(di[[1]])[2:13]
#     counts.n counts.missing  counts.unique    counts.Info 
#        "150"            "0"           "35"            "1" 
#  counts.Mean     counts..05     counts..10     counts..25 
#      "5.843"        "4.600"        "4.800"        "5.100" 
#   counts..50     counts..75     counts..90     counts..95 
#      "5.800"        "6.400"        "6.900"        "7.255" 
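At this point every value is still a character string, because unlist() coerced the whole list to the most inclusive type. A minimal sketch of turning such a named character vector into a one-row data frame with numeric columns follows; `counts_chr` is a hypothetical vector typed in by hand from the printed Sepal.Length output (including the .95 quantile), not extracted from `di`.

```r
# Hypothetical stand-in for the Sepal.Length counts, typed in by hand from the
# printed describe() output; in practice these values would come from di
counts_chr <- c(n = "150", missing = "0", unique = "35", Info = "1",
                Mean = "5.843", ".05" = "4.600", ".10" = "4.800",
                ".25" = "5.100", ".50" = "5.800", ".75" = "6.400",
                ".90" = "6.900", ".95" = "7.255")

# as.numeric() converts the strings but drops the names, so put them back
counts_num <- setNames(as.numeric(counts_chr), names(counts_chr))

# t() turns the named vector into a 1-row matrix whose column names become
# data frame columns; check.names = FALSE keeps ".05" rather than "X.05"
row_df <- data.frame(Variable = "Sepal.Length", t(counts_num),
                     check.names = FALSE, stringsAsFactors = FALSE)
str(row_df)
```

Repeating this per variable and stacking the rows with rbind() only works while every variable has the same set of statistics, which is exactly what breaks for the factor variable.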

Now you have something you can start to work with. Note, however, that this layout holds only for numeric variables; the factor variable Species is different:

unlist(di[[5]])
#     descript       counts.n counts.missing  counts.unique 
#    "Species"          "150"            "0"            "3" 
#      values1        values2        values3        values4 
#         "50"           "33"           "50"           "33" 
#      values5        values6 
#         "50"           "33" 

In that case, it seems you only want elements two through four.
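Because numeric and factor variables produce different sets of statistics, their rows cannot simply be stacked with rbind(). One way to handle this, sketched here under the assumption that each variable's statistics have already been converted to a named numeric vector, is to align the vectors by name and fill the gaps with NA. `bind_by_name()` is a hypothetical helper, and the two input vectors are abbreviated by hand from the output above.

```r
# Hypothetical helper: stack named numeric vectors as rows of a data frame,
# aligning on names and filling statistics a variable lacks with NA
bind_by_name <- function(rows) {
  # Union of all statistic names, in first-seen order
  cols <- unique(unlist(lapply(rows, names)))
  # Indexing by a name a vector lacks yields NA, which pads the short rows
  mat <- t(vapply(rows, function(v) unname(v[cols]), numeric(length(cols))))
  dimnames(mat) <- list(NULL, cols)
  data.frame(Variable = names(rows), mat,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Abbreviated inputs typed in from the printed output above
rows <- list(
  Sepal.Length = c(n = 150, missing = 0, unique = 35, Info = 1, Mean = 5.843),
  Species      = c(n = 150, missing = 0, unique = 3)
)
stats_df <- bind_by_name(rows)
stats_df

# Assumption (not verified against every Hmisc version): with di in hand,
# something like the following may build the input list directly:
# rows <- lapply(di, function(x) setNames(as.numeric(x$counts), names(x$counts)))
```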

Using this process of discovery and problem solving, you can see how you'd take the output of describe() apart and put the information you want into a data frame. However, this will take a lot of work: you'll presumably need loops and lots of if(){ ... } else{ ... } blocks. You might just want to code your own dataset description function from scratch.
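Following that suggestion, here is a minimal base R sketch that produces the layout asked for in the question. `describe_df()` is a hypothetical name; it uses quantile()'s default (type 7) interpolation, which appears to match the quantiles printed above for iris, and it omits Hmisc's Info measure rather than guessing at its exact definition.

```r
# Hypothetical from-scratch replacement: one row per column of df, with
# n, missing, unique, mean and selected quantiles (Hmisc's Info is omitted)
describe_df <- function(df, probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)) {
  stat_names <- c("n", "missing", "unique", "Mean", as.character(probs))
  per_col <- lapply(df, function(x) {
    present <- x[!is.na(x)]
    counts <- c(length(present), sum(is.na(x)), length(unique(present)))
    if (is.numeric(x) && length(present) > 0) {
      # Default quantile() interpolation (type 7)
      extra <- c(mean(present), unname(quantile(present, probs)))
    } else {
      # Non-numeric columns (e.g. factors) get NA for mean and quantiles
      extra <- rep(NA_real_, length(probs) + 1)
    }
    c(counts, extra)
  })
  mat <- do.call(rbind, per_col)
  dimnames(mat) <- list(NULL, stat_names)
  data.frame(Variable = names(df), mat,
             check.names = FALSE, stringsAsFactors = FALSE)
}

describe_df(iris)
```

Keeping everything numeric (with NA for statistics that don't apply) is what lets the result be an ordinary data frame, instead of the mixed character output that unlist() produces.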




Answer 2:


You can do this by using the stat.desc function from the pastecs package:

library(pastecs)
summary_df <- stat.desc(iris[, 1:4])  # the numeric columns of the question's data

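summary_df is a data frame (one column per variable, one row per statistic), which is the kind of object you wanted. See the pastecs documentation for more details.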
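stat.desc() reports one column per variable and one row per statistic, which is transposed relative to the layout in the question, and its default output covers basic statistics (counts, min, max, median, mean, variance and so on) rather than the .05 through .95 quantiles. A minimal sketch of flipping it to one row per variable, assuming pastecs is installed and passing only the numeric columns of iris:

```r
# Assumes the pastecs package is installed; skip quietly if it is not
if (requireNamespace("pastecs", quietly = TRUE)) {
  # Rows are statistics (nbr.val, nbr.na, median, mean, ...), columns are variables
  stats_wide <- pastecs::stat.desc(iris[, 1:4])
  # Transpose so each variable becomes a row, as in the question's layout
  summary_by_var <- as.data.frame(t(stats_wide))
  print(summary_by_var[, c("nbr.val", "nbr.na", "median", "mean")])
}
```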




Answer 3:


In R, you can simply use summary(iris) instead; it is R's closest counterpart to the describe() method in Python (pandas).
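Note that summary(iris) itself returns a table of formatted strings rather than a data frame, so it has a similar coercion problem. A minimal sketch of getting a data frame out of summary() instead, applied to the numeric columns only (an assumption: summary() of a factor returns level counts, which do not fit the same columns):

```r
# Keep only numeric columns; the factor Species would summarise differently
num_cols <- iris[vapply(iris, is.numeric, logical(1))]

# sapply() stacks each column's summary (Min., 1st Qu., Median, Mean, 3rd Qu.,
# Max.) into a 6 x 4 matrix; a column containing NAs would add an "NA's" entry
# and prevent this simplification
summ <- sapply(num_cols, summary)

# Transpose to one row per variable; check.names = FALSE keeps "1st Qu."
summary_df <- data.frame(Variable = colnames(summ), t(summ),
                         check.names = FALSE, stringsAsFactors = FALSE)
rownames(summary_df) <- NULL
summary_df
```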



Source: https://stackoverflow.com/questions/37908545/r-describe-output-to-a-data-frame
