How to count the number of numeric values in a column

轻奢々 2021-02-06 05:24

I have a data frame, and I want to produce a table of summary statistics including the number of valid numeric values, mean, and sd by group for each of three columns. I can't seem

5 answers
  • 2021-02-06 05:36

    colSums(!is.na(x)) should work: it counts the non-missing values in each column of x.
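
    As a minimal sketch of that one-liner, assuming a small made-up data frame (the names x, a, and b are hypothetical and not from the question):

    ```r
    # Hypothetical example data: two numeric columns with some missing values
    x <- data.frame(a = c(1, NA, 3, 4),
                    b = c(NA, NA, 2.5, 1))

    # is.na() on a data frame returns a logical matrix; negating it marks the
    # non-missing cells, and colSums() counts the TRUE values in each column
    n_valid <- colSums(!is.na(x))
    print(n_valid)
    ```

    Note that is.na() is also TRUE for NaN, while Inf and -Inf are still counted as present.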

  • 2021-02-06 05:40

    These are a few add-on packages that might help (see Quick-R)

    Using the Hmisc package

    library(Hmisc)
    
    describe(mydata) 
    # n, nmiss, unique, mean, 5,10,25,50,75,90,95th percentiles 
    # 5 lowest and 5 highest scores
    

    Using the pastecs package

    library(pastecs)
    
    stat.desc(mydata) 
    # nbr.val, nbr.null, nbr.na, min, max, range, sum,
    # median, mean, SE.mean, CI.mean, var, std.dev, coef.var 
    

    Using the psych package

    library(psych)
    describe(mydata)
    # item name, item number, nvalid, mean, sd,
    # median, mad, min, max, skew, kurtosis, se
    

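    I'd use describe.by from the psych package (named describeBy in more recent versions of psych). The n column gives the number of valid values per variable within each group: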

    > describe.by(biastable, as.factor(Nominal))
    group: 1
             var n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
    Nominal    1 9 1.00 0.00   1.00    1.00 0.00 1.00 1.00  0.00   NaN      NaN 0.00
    Actual     2 8 0.12 0.01   0.12    0.12 0.01 0.11 0.13  0.03  0.09    -1.47 0.00
    LinPred    3 8 0.99 0.08   0.98    0.99 0.10 0.89 1.09  0.20  0.04    -1.70 0.03
    QuadPred   4 8 0.99 0.08   0.99    0.99 0.10 0.88 1.09  0.20 -0.04    -1.64 0.03
    ------------------------------------------------------------------------ 
    group: 3
             var n mean   sd median trimmed  mad  min  max range skew kurtosis   se
    Nominal    1 9 3.00 0.00   3.00    3.00 0.00 3.00 3.00  0.00  NaN      NaN 0.00
    Actual     2 9 0.37 0.03   0.36    0.37 0.03 0.32 0.42  0.10 0.15    -1.50 0.01
    LinPred    3 9 3.12 0.24   3.05    3.12 0.30 2.79 3.50  0.71 0.15    -1.52 0.08
    QuadPred   4 9 3.10 0.23   3.06    3.10 0.34 2.79 3.46  0.67 0.12    -1.51 0.08
    ------------------------------------------------------------------------ 
    group: 6
             var n mean   sd median trimmed  mad  min  max range skew kurtosis   se
    Nominal    1 9 6.00 0.00   6.00    6.00 0.00 6.00 6.00  0.00  NaN      NaN 0.00
    Actual     2 9 0.71 0.04   0.70    0.71 0.04 0.66 0.78  0.12 0.46    -1.30 0.01
    LinPred    3 9 6.02 0.30   5.91    6.02 0.28 5.61 6.47  0.86 0.28    -1.43 0.10
    QuadPred   4 9 5.99 0.31   5.93    5.99 0.25 5.55 6.49  0.94 0.26    -1.26 0.10
    ------------------------------------------------------------------------ 
    group: 10
             var n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
    Nominal    1 9 10.00 0.00  10.00   10.00 0.00 10.00 10.00  0.00  NaN      NaN 0.00
    Actual     2 9  1.16 0.07   1.14    1.16 0.09  1.06  1.25  0.19 0.09    -1.71 0.02
    LinPred    3 9  9.85 0.60   9.76    9.85 0.74  9.16 10.72  1.56 0.24    -1.76 0.20
    QuadPred   4 9  9.79 0.62   9.63    9.79 0.72  9.05 10.78  1.72 0.27    -1.65 0.21
    ------------------------------------------------------------------------ 
    group: 30
             var n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
    Nominal    1 9 30.00 0.00  30.00   30.00 0.00 30.00 30.00  0.00  NaN      NaN 0.00
    Actual     2 9  3.53 0.22   3.51    3.53 0.21  3.25  3.85  0.60 0.23    -1.58 0.07
    LinPred    3 9 30.08 1.55  29.88   30.08 1.44 27.70 32.66  4.96 0.21    -1.27 0.52
    QuadPred   4 9 29.92 1.51  30.00   29.92 1.44 27.44 32.38  4.94 0.04    -1.22 0.50
    ------------------------------------------------------------------------ 
    group: 50
             var n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
    Nominal    1 9 50.00 0.00  50.00   50.00 0.00 50.00 50.00  0.00  NaN      NaN 0.00
    Actual     2 9  5.91 0.51   5.82    5.91 0.43  5.43  6.94  1.51 0.90    -0.73 0.17
    LinPred    3 9 50.40 3.98  48.77   50.40 3.21 44.89 57.37 12.48 0.49    -1.16 1.33
    QuadPred   4 9 50.24 3.97  48.91   50.24 2.65 44.49 57.01 12.52 0.39    -1.21 1.32
    ------------------------------------------------------------------------ 
    group: 150
             var n   mean   sd median trimmed   mad    min    max range  skew kurtosis   se
    Nominal    1 9 150.00 0.00 150.00  150.00  0.00 150.00 150.00  0.00   NaN      NaN 0.00
    Actual     2 6  17.23 0.97  17.20   17.23  0.67  15.90  18.80  2.90  0.25    -1.23 0.39
    LinPred    3 6 147.19 8.11 147.01  147.19 11.13 138.04 155.39 17.36 -0.01    -2.22 3.31
    QuadPred   4 6 147.77 7.95 147.48  147.77 10.95 139.60 157.78 18.17  0.07    -2.10 3.25
    ------------------------------------------------------------------------ 
    group: 250
             var n   mean    sd median trimmed  mad    min    max range skew kurtosis   se
    Nominal    1 9 250.00  0.00 250.00  250.00 0.00 250.00 250.00  0.00  NaN      NaN 0.00
    Actual     2 9  28.83  1.18  28.70   28.83 0.89  27.10  31.20  4.10 0.59    -0.57 0.39
    LinPred    3 9 246.29 10.57 245.98  246.29 9.31 231.46 264.81 33.35 0.33    -1.26 3.52
    QuadPred   4 9 251.51  8.84 248.45  251.51 5.08 240.41 268.30 27.89 0.62    -1.04 2.95
    > 
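
    If installing a package is not an option, a similar per-group table of n, mean, and sd can be sketched in base R. This is only an illustration under assumptions: the data frame dat, its columns, and the helper summarise_col are all hypothetical, and "valid" is taken to mean finite.

    ```r
    # Hypothetical data: a grouping column and two measured columns with NAs
    dat <- data.frame(
      group  = c(1, 1, 1, 3, 3, 3),
      actual = c(0.11, NA, 0.13, 0.32, 0.36, 0.42),
      pred   = c(0.9, 1.0, 1.1, NA, NA, 3.0)
    )

    # Summary for one numeric vector: count of finite values, their mean and sd
    summarise_col <- function(v) {
      ok <- is.finite(v)
      c(n = sum(ok), mean = mean(v[ok]), sd = sd(v[ok]))
    }

    # Split the measured columns by group, then summarise each column;
    # the result is a list with one (variable x statistic) matrix per group
    by_group <- lapply(split(dat[c("actual", "pred")], dat$group), function(d) {
      t(sapply(d, summarise_col))
    })
    print(by_group)
    ```

    A group with a single valid value gets an sd of NA, because sd() is undefined for one observation.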
    
  • 2021-02-06 05:47

    What are "blank values" and "text values"? If you have a numeric vector, it can contain NAs (is.na()), Infs (is.infinite()), NaNs (is.nan()), and "valid" numeric values.

    For "valid" numeric values (in the sense above) you could use is.finite():

    is.finite(c(1,NA,Inf,NaN))
    # [1]  TRUE FALSE FALSE FALSE
    sum( is.finite(c(1,NA,Inf,NaN)) )
    # [1] 1
    

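    So instead of colSums(is.numeric(x)), which fails because is.numeric() returns a single TRUE or FALSE for the whole object, you could use colSums(is.finite(x)) when x is a numeric matrix. is.finite() does not accept a data frame directly, so an all-numeric data frame needs converting first, e.g. colSums(is.finite(as.matrix(x))).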
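
    For a data frame that also holds non-numeric columns, one possible approach is to select the numeric columns and count finite entries column by column. This is a sketch on made-up data; the names x, a, b, and label are hypothetical:

    ```r
    # Hypothetical data frame mixing numeric and text columns
    x <- data.frame(
      a = c(1, NA, Inf, 4),
      b = c(2, NaN, 3, 5),
      label = c("u", "v", "w", "x"),
      stringsAsFactors = FALSE
    )

    # Keep only the numeric columns, then count the finite entries in each one
    num_cols <- x[sapply(x, is.numeric)]
    n_finite <- sapply(num_cols, function(col) sum(is.finite(col)))
    print(n_finite)
    ```

    Here NA, NaN, and Inf are all excluded from the count, which is stricter than counting with !is.na().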

  • 2021-02-06 05:50

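    Can you use something like this? Note that it counts the distinct values in x (with NA counted as one of them), not the number of valid numeric values.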

    length(unique(x))
    
  • 2021-02-06 05:53

    Does complete.cases() (or sum(complete.cases(x))) do what you want? It flags the rows that have no missing values in any column.
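
    A small sketch, again on hypothetical data, of how the row-wise count from complete.cases() can differ from per-column counts:

    ```r
    # Hypothetical example data with missing values in different rows
    x <- data.frame(a = c(1, NA, 3, 4),
                    b = c(NA, NA, 2.5, 1))

    # TRUE for rows where every column is non-missing
    complete_rows <- complete.cases(x)

    # Number of fully observed rows
    n_complete <- sum(complete_rows)

    # Per-column counts of non-missing values, for comparison
    n_per_column <- colSums(!is.na(x))

    print(n_complete)
    print(n_per_column)
    ```

    If each column should get its own n, the per-column count is probably the better fit; complete.cases() suits analyses that need every variable present in a row.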
