Question
I have a shell script:
dir=$1
cd $dir
grep -P -o '(?<=<rating>).*' * |
awk -F: '{A[$1]+=$2;L[$1]++;next}
END{for(i in A){print i, A[i]/L[i]}}' | sort -nr -k2 |
awk '{ sub(/\.dat/, " "); print }'
which averages the numbers that follow the <rating> field in each file of my folder. Now I need to calculate the standard deviation of those numbers instead of the average: sum the squared difference between each rating in the file and the mean, divide this by the sample size minus 1, and take the square root. I do not need to do this for every file in the folder, only for 2 specific files, hotel_188937.dat and hotel_203921.dat. Here is an example of the contents of one of these files:
<Overall Rating>
<Avg. Price>$155
<URL>
<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5
<Author>...
repeat fields again...
The sample size of the first file is 127 with a mean of 4.78, compared with a sample size of 324 and a mean of 4.78 for the second file. Is there any way that I can alter my script to calculate the standard deviation for these two specific files rather than calculating the average for every file in my directory? Thanks for your time.
Answer 1:
You can do it all in one awk script:
$ awk -F'>' '
$1=="<rating" {k=FILENAME;sub(/\.dat$/,"",k);
s[k]+=$2;ss[k]+=$2^2;c[k]++}
END{for(i in s){m=s[i]/c[i];
print i,m,sqrt(ss[i]/c[i]-m^2)}}' r1.dat r2.dat
r1 2.5 1.11803
r2 3 1.41421
Here s is the sum, ss the sum of squares, c the count, and m the mean. Note that this computes the population standard deviation, not the sample standard deviation. For the latter, rescale with (count-1): sqrt((ss[i]-c[i]*m^2)/(c[i]-1)).
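The (count-1) adjustment can be sketched as below. This is a minimal sketch, not the answerer's own code: the function name rating_stddev is hypothetical, and it assumes every rating sits on its own line in the form <rating>N, as in the sample file from the question.

```shell
# Sample standard deviation of the <rating> values, one output line per file.
# rating_stddev is a hypothetical helper name used only for illustration.
rating_stddev() {
    awk -F'>' '
    $1 == "<rating" {
        k = FILENAME; sub(/\.dat$/, "", k)   # label rows by file name without .dat
        s[k] += $2; ss[k] += $2^2; c[k]++    # running sum, sum of squares, count
    }
    END {
        for (i in s) {
            m = s[i] / c[i]
            # sample variance: (sum of squares - n * mean^2) / (n - 1)
            v = (c[i] > 1) ? (ss[i] - c[i] * m * m) / (c[i] - 1) : 0
            if (v < 0) v = 0                 # guard against tiny negative rounding error
            print i, m, sqrt(v)
        }
    }' "$@"
}
```

It would be called as rating_stddev hotel_188937.dat hotel_203921.dat. A file with a single rating prints 0 for the deviation instead of dividing by zero.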
Answer 2:
Yes. The * in the grep line tells it to search in all the files.
Change the line
grep -P -o '(?<=<rating>).*' * |
to
grep -P -o '(?<=<rating>).*' hotel_188937.dat hotel_203921.dat |
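Building on that grep line, one way to get the standard deviation exactly as the question describes it (squared differences from the mean, divided by the sample size minus 1) is to keep every rating and make a second pass in the END block. This is only a sketch under some assumptions: the function name rating_sd_pipeline is hypothetical, grep must support -P (GNU grep built with PCRE), -H is added so the file name is printed even for a single file, and file names must not contain a colon.

```shell
# Two-pass sample standard deviation on top of the question's grep pipeline.
# rating_sd_pipeline is a hypothetical helper name used only for illustration.
rating_sd_pipeline() {
    grep -H -o -P '(?<=<rating>).*' "$@" |
    awk -F: '
    { n[$1]++; sum[$1] += $2; val[$1, n[$1]] = $2 }   # remember every rating per file
    END {
        for (f in n) {
            mean = sum[f] / n[f]
            sq = 0
            for (j = 1; j <= n[f]; j++) {             # second pass over stored ratings
                d = val[f, j] - mean
                sq += d * d
            }
            sd = (n[f] > 1) ? sqrt(sq / (n[f] - 1)) : 0
            name = f; sub(/\.dat$/, "", name)         # drop the .dat suffix
            print name, mean, sd
        }
    }' | sort -k3,3nr                                 # largest deviation first
}
```

It would be called as rating_sd_pipeline hotel_188937.dat hotel_203921.dat and prints the name, the mean, and the sample standard deviation for each file.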
Source: https://stackoverflow.com/questions/35628103/how-do-i-calculate-the-standard-deviation-in-my-shell-script