Command-line utility to print statistics of numbers in Linux

無奈伤痛 2020-11-30 18:46

I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation, and so forth.

Is there a

16 answers
  • 2020-11-30 19:20

    There is also simple-r, which can do almost everything that R can, but with fewer keystrokes:

    https://code.google.com/p/simple-r/

    To calculate basic descriptive statistics, one would have to type one of:

    r summary file.txt
    r summary - < file.txt
    cat file.txt | r summary -
    

    To compute the mean, median, minimum, maximum, and standard deviation individually, the code would be:

    seq 1 100 | r mean - 
    seq 1 100 | r median -
    seq 1 100 | r min -
    seq 1 100 | r max -
    seq 1 100 | r sd -
    

    Doesn't get any simple-R!
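
    If neither R nor simple-r is installed, the same five statistics can be sketched with Python's standard library. This is only an illustration, not how simple-r works internally: the function name `describe` is hypothetical, and the input is assumed to hold exactly one number per non-blank line. `statistics.stdev` uses the sample (n - 1) formula, which matches R's `sd`.

```python
import math
import statistics


def describe(lines):
    """Return basic descriptive statistics for an iterable of text lines.

    Each non-blank line is assumed to hold exactly one number.
    """
    values = [float(line) for line in lines if line.strip()]
    if not values:
        raise ValueError("no numbers in input")
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); undefined for one value.
        "sd": statistics.stdev(values) if len(values) > 1 else math.nan,
    }


# Same input as `seq 1 100`: the integers 1 through 100, one per line.
for name, value in describe(str(i) for i in range(1, 101)).items():
    print(f"{name}\t{value:g}")
```

    Because `describe` accepts any iterable of lines, passing `sys.stdin` or an open file object should work the same way in a small script.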

  • 2020-11-30 19:21

    data_hacks is a Python command-line utility for basic statistics.

    The first example from that page produces the desired results:

    $ cat /tmp/data | histogram.py
    # NumSamples = 29; Max = 10.00; Min = 1.00
    # Mean = 4.379310; Variance = 5.131986; SD = 2.265389
    # each * represents a count of 1
        1.0000 -     1.9000 [     1]: *
        1.9000 -     2.8000 [     5]: *****
        2.8000 -     3.7000 [     8]: ********
        3.7000 -     4.6000 [     3]: ***
        4.6000 -     5.5000 [     4]: ****
        5.5000 -     6.4000 [     2]: **
        6.4000 -     7.3000 [     3]: ***
        7.3000 -     8.2000 [     1]: *
        8.2000 -     9.1000 [     1]: *
        9.1000 -    10.0000 [     1]: *
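
    The output above splits the range from Min to Max into ten equal-width buckets (each 0.9 wide here). A minimal sketch of that bucketing idea follows. It is not the data_hacks source: the function name `histogram` is hypothetical, and the rule that a value on a boundary falls into the upper bucket is an assumption.

```python
def histogram(values, buckets=10):
    """Split [min, max] into equal-width buckets and count values in each.

    Returns a list of (low, high, count) tuples.
    """
    if not values:
        raise ValueError("no values")
    low, high = min(values), max(values)
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        # A value on a boundary goes to the upper bucket (an assumption);
        # the maximum is clamped into the last bucket.
        index = int((v - low) / width) if width else 0
        counts[min(index, buckets - 1)] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(buckets)]


# Print rows in a layout similar to the sample output.
for lo, hi, count in histogram([1, 2, 2, 3, 3, 3, 4, 10], buckets=3):
    print(f"{lo:10.4f} - {hi:10.4f} [{count:6d}]: {'*' * count}")
```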
    
  • 2020-11-30 19:21

    There is also the self-written stats (bundled with 'scut'), a Perl utility that does just that. Fed a stream of numbers on STDIN, it tries to reject non-numbers and emits the following:

    $ ls -lR | scut -f=4 | stats
    Sum       3.10271e+07
    Number    452
    Mean      68643.9
    Median    4469.5
    Mode      4096
    NModes    6
    Min       2
    Max       1.01171e+07
    Range     1.01171e+07
    Variance  3.03828e+11
    Std_Dev   551206
    SEM       25926.6
    95% Conf  17827.9 to 119460
              (for a normal distribution - see skew)
    Skew      15.4631
              (skew = 0 for a symmetric dist)
    Std_Skew  134.212
    Kurtosis  258.477
              (K=3 for a normal dist)
    

    It can also apply a number of transforms to the input stream and, if you ask it, emit only the unadorned value; i.e. 'stats --mean' will return the mean as an unlabelled float.
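
    For readers wondering how the SEM and 95% Conf rows relate to the rest, the printed figures are consistent with SEM = Std_Dev / sqrt(Number) and an interval of Mean ± 1.96 × SEM. The sketch below reproduces them under that assumption. The function names `sem` and `conf95` are hypothetical, and the stats script itself may compute these values differently.

```python
import math


def sem(std_dev, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return std_dev / math.sqrt(n)


def conf95(mean, std_dev, n, z=1.96):
    """Approximate 95% interval for the mean, assuming normality (z = 1.96)."""
    half_width = z * sem(std_dev, n)
    return mean - half_width, mean + half_width


# Mean, Std_Dev, and Number copied from the sample output above.
print(sem(551206, 452))
print(conf95(68643.9, 551206, 452))
```

    As the output itself warns, the interval only makes sense for roughly normal data; with a skew of 15.4631 it should be read with caution.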

  • 2020-11-30 19:25

    Yet another tool: https://www.gnu.org/software/datamash/

    # Example: calculate the sum and mean of values 1 to 10:
    $ seq 10 | datamash sum 1 mean 1
    55 5.5
    

    It might be more commonly packaged (it was the first tool I found prepackaged for nix, at least).
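
    In the datamash call, the `1` after each operation names the input column to operate on. A rough Python sketch of that column-wise idea follows, assuming tab-separated fields and a 1-based column index. The function name `sum_and_mean` is hypothetical, and this is not datamash's implementation.

```python
def sum_and_mean(lines, column=1, sep="\t"):
    """Sum and mean of one 1-based column of delimiter-separated lines."""
    values = [float(line.rstrip("\n").split(sep)[column - 1])
              for line in lines if line.strip()]
    if not values:
        raise ValueError("no numbers in input")
    total = sum(values)
    return total, total / len(values)


# Same input as `seq 10`: the integers 1 through 10, one per line.
print(sum_and_mean(str(i) for i in range(1, 11)))  # (55.0, 5.5)
```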
