pandas describe by - additional parameters

既然无缘 2021-02-06 11:20

I see that the pandas library has a describe function that returns some useful statistics. However, is there a way to add additional rows to the output, such as

3 Answers
  • 2021-02-06 11:38

    The answer from piRSquared makes the most sense to me, but I get a deprecation warning about reindex_axis in Python 3.5. This works for me:

        stats = data.describe()
        stats.loc['IQR'] = stats.loc['75%'] - stats.loc['25%'] # appending interquartile range instead of recalculating it
        stats = stats.append(data.reindex(stats.columns, axis=1).agg(['skew', 'mad', 'kurt']))
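
On pandas 2.x the snippet above no longer runs as written, because DataFrame.append and the 'mad' aggregation were both removed in pandas 2.0. A minimal sketch of the same idea for newer versions follows, assuming pd.concat as the replacement for append and leaving 'mad' out; the sample frame `data` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; any numeric DataFrame would do.
data = pd.DataFrame({"num1": [1.0, 2.0, 3.0, 4.0],
                     "num2": [4.0, 5.0, 6.0, 9.0]})

stats = data.describe()

# Interquartile range, derived from the quartile rows describe() already computed.
stats.loc["IQR"] = stats.loc["75%"] - stats.loc["25%"]

# pd.concat stands in for the removed DataFrame.append.
# 'mad' is omitted because that aggregation does not exist in pandas 2.x.
extra = data.reindex(stats.columns, axis=1).agg(["skew", "kurt"])
stats = pd.concat([stats, extra])

print(stats)
```

The reindex call keeps only the columns that describe() reported, so non-numeric columns in the input do not reach agg.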
    
  • 2021-02-06 11:39

    Try this:

     df.describe()
    
          num1  num2
    count   3.0   3.0
    mean    2.0   5.0
    std     1.0   1.0
    min     1.0   4.0
    25%     1.5   4.5
    50%     2.0   5.0
    75%     2.5   5.5
    max     3.0   6.0
    

    Build a second DataFrame.

     pd.DataFrame(df.mad(), columns=["Mad"]).T
    
             num1      num2
    Mad  0.666667  0.666667
    

    Join the two DataFrames.

     pd.concat([df.describe(), pd.DataFrame(df.mad(), columns=["Mad"]).T])
    
              num1      num2
    count  3.000000  3.000000
    mean   2.000000  5.000000
    std    1.000000  1.000000
    min    1.000000  4.000000
    25%    1.500000  4.500000
    50%    2.000000  5.000000
    75%    2.500000  5.500000
    max    3.000000  6.000000
    Mad    0.666667  0.666667
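
DataFrame.mad() was removed in pandas 2.0, so on newer versions the Mad row has to be computed by hand. The sketch below does that as the mean absolute deviation around the mean. The input frame is an assumption, chosen only because it is consistent with the describe() output shown in this answer.

```python
import pandas as pd

# Assumed input, consistent with the describe() output shown above.
df = pd.DataFrame({"num1": [1, 2, 3], "num2": [4, 5, 6]})

# Mean absolute deviation around the mean, one value per column.
mad = (df - df.mean()).abs().mean()

# Turn the Series into a one-row DataFrame labelled "Mad" and stack it
# under the describe() table.
result = pd.concat([df.describe(), mad.to_frame("Mad").T])

print(result)
```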
    

  • 2021-02-06 11:50

    The default describe output looks like this:

    import numpy as np
    import pandas as pd

    np.random.seed([3,1415])
    df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))
    
    df.describe()
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
    

    Updated for pandas 0.20
    I'd make my own describe like below. It should be obvious how to add more.

    def describe(df, stats):
        d = df.describe()
        return d.append(df.reindex_axis(d.columns, 1).agg(stats))
    
    describe(df, ['skew', 'mad', 'kurt'])
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
    skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
    mad      0.267730    0.249968    0.254351    0.228558    0.242874
    kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
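
The helper above depends on reindex_axis and DataFrame.append, both of which are gone from current pandas (reindex_axis since 1.0, append since 2.0). One possible way to keep the "make your own describe" idea on newer versions is sketched below. The name describe_plus and the dict-of-functions interface are hypothetical, not part of pandas, and the random data is different from the seeded frame used in this answer.

```python
import numpy as np
import pandas as pd


def describe_plus(df, extra):
    """Return df.describe() with one extra row per entry in `extra`.

    `extra` maps a row label to a function that receives the numeric
    columns of `df` and returns one value per column (a Series).
    """
    d = df.describe()
    numeric = df[d.columns]  # only the columns describe() reported
    rows = pd.DataFrame({name: fn(numeric) for name, fn in extra.items()}).T
    # Align the extra rows to the describe() column order before stacking.
    return pd.concat([d, rows.reindex(columns=d.columns)])


# Hypothetical data for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=list("ABCDE"))

out = describe_plus(df, {
    "skew": lambda x: x.skew(),
    # Mean absolute deviation, written out because DataFrame.mad() was removed.
    "mad": lambda x: (x - x.mean()).abs().mean(),
    "kurt": lambda x: x.kurt(),
})
print(out)
```

Passing functions rather than strings means any statistic that can be computed column by column can be added as a row without touching the helper.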
    

    Old Answer

    def describe(df):
        return pd.concat([df.describe().T,
                          df.mad().rename('mad'),
                          df.skew().rename('skew'),
                          df.kurt().rename('kurt'),
                         ], axis=1).T
    
    describe(df)
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
    mad      0.267730    0.249968    0.254351    0.228558    0.242874
    skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
    kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
    