I am trying to read a file line by line and find the average of the numbers in each line. I am getting the error: expr: non-numeric argument
I have narrowe
Others have already pointed out that expr is integer-only, and recommended writing your script in awk instead of shell.
Your system may have a number of tools on it that support arbitrary-precision math, or floats. Two common calculators in shell are bc, which follows standard "order of operations", and dc, which uses "reverse polish notation".
Either one of these can easily be fed your data such that per-line averages can be produced. For example, using bc:
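#!/bin/bash
while read -r line; do
  # Split the line into positional parameters $1, $2, ...
  set -- ${line}
  c=$#
  string=""
  for n in "$@"; do
    # Join the numbers with "+": prefix a "+" to every value but the first
    string+="${string:++}$n"
  done
  average=$(printf 'scale=4\n(%s) / %d\n' "$string" "$c" | bc)
  printf "%s // avg=%s\n" "$line" "$average"
done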
Of course, the only bc-specific part of this is the format for the notation and the bc itself in the third last line. The same basic thing using dc might look like this:
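#!/bin/bash
while read -r line; do
  # Split the line into positional parameters $1, $2, ...
  set -- ${line}
  c=$#
  string="0"
  for n in "$@"; do
    # RPN: push each number, then add it to the running total
    string+=" $n + "
  done
  average=$(dc -e "4k $string $c / p")
  printf "%s // %s\n" "$line" "$average"
done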
Note that my shell supports appending to strings with +=. If yours does not, you can adjust this as you see fit.
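If your shell lacks +=, one portable adjustment is plain reassignment with a parameter expansion. This is only a minimal sketch, assuming a POSIX sh; the function name build_sum is made up for illustration and is not part of the scripts above.

```shell
#!/bin/sh
# build_sum is a hypothetical helper: it joins its arguments with '+'
# using only POSIX parameter expansion (no += operator).
build_sum() {
    string=""
    for n in "$@"; do
        # ${string:+$string+} expands to "<string>+" only when string is non-empty,
        # so the first value gets no leading '+'
        string="${string:+$string+}$n"
    done
    printf '%s\n' "$string"
}

build_sum 1 2 3
```

The resulting string can be handed to bc exactly as in the script above.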
In both of these examples, we're printing our output to four decimal places -- with scale=4 in bc, or 4k in dc. We are processing standard input, so if you named these scripts "calc", you might run them with command lines like:
$ ./calc < inputfile.txt
The set command at the beginning of the loop turns the $line variable into positional parameters, like $1, $2, etc. We then process each positional parameter in the for loop, appending everything to a string which will later get fed to the calculator.
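To see what set does in isolation, here is a small sketch. The helper name count_and_first is hypothetical, and the expansion is left unquoted on purpose so that the shell splits the line on whitespace (which also means a value like * would be glob-expanded).

```shell
#!/bin/sh
# count_and_first is a hypothetical helper that shows what `set --` does.
count_and_first() {
    line=$1
    # Unquoted on purpose: word splitting turns the line into $1, $2, ...
    # The "--" keeps a leading value such as -5 from being parsed as an option.
    set -- $line
    printf '%s fields, first=%s\n' "$#" "$1"
}

count_and_first "10 20 30"
```

After the set, $# holds the number of fields on the line, which is what the scripts above use as the divisor.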
Also, you can fake it.
That is, while bash doesn't support floating point numbers, it DOES support multiplication and string manipulation. The following uses NO external tools, yet appears to present decimal averages of your input.
#!/bin/bash
declare -i total
while read -r line; do
  # Split the line into positional parameters $1, $2, ...
  set -- ${line}
  c=$#
  total=0
  for n in "$@"; do
    # total is declared as an integer, so += adds instead of concatenating
    total+="$n"
  done
  # Move the decimal point over prior to our division...
  average=$(($total * 1000 / $c))
  # Re-insert the decimal point via string manipulation
  average="${average:0:$((${#average} - 3))}.${average:$((${#average} - 3))}"
  printf "%s // %0.3f\n" "$line" "$average"
done
The important bits here are:
* declare -i, which tells bash to add to $total with += rather than appending it as if it were a string,
* the two average= assignments, the first of which multiplies $total by 1000, and the second of which splits the result at the thousands column, and
* printf, whose format enforces three decimal places of precision in its output.
Of course, input still needs to be integers.
YMMV. I'm not saying this is how you should solve this, just that it's an option. :)
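A variation on the same trick, offered only as a sketch: instead of slicing the string, split the scaled value with / and % and let printf zero-pad the fractional part. That also copes with averages below 0.1, where the scaled value has fewer than three digits. The function name fixed_avg is hypothetical, and it assumes non-negative integer input without leading zeros.

```shell
#!/bin/bash
# fixed_avg is a hypothetical helper: average of non-negative integers,
# printed with three decimals using only shell arithmetic.
fixed_avg() {
    # Nothing to average: avoid dividing by zero
    [ "$#" -gt 0 ] || return 1
    local total=0 n
    for n in "$@"; do
        total=$(( total + n ))
    done
    # Scale by 1000 before dividing so three decimals survive the integer division
    local scaled=$(( total * 1000 / $# ))
    # Integer part, then the remainder zero-padded to three digits
    printf '%d.%03d\n' $(( scaled / 1000 )) $(( scaled % 1000 ))
}

fixed_avg 1 2      # prints 1.500
fixed_avg 0 0 0 1  # prints 0.250
```

Note that the last digit is truncated rather than rounded, since the division is still integer division.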
This is a pretty old post, but it came up at the top of my Google search, so I thought I'd share what I came up with:
# $filename is assumed to hold the path of the input file
while read -r line; do
  # Convert each line to an array
  ARR=( $line )
  # Append each value in the array with a '+' and calculate the sum
  # (this causes the last value to have a trailing '+', so it is added to '0')
  ARR_SUM=$( echo "${ARR[@]/%/+} 0" | bc -l)
  # Divide the sum by the total number of elements in the array
  # (shell integer division, so this only works when the sum is a whole number)
  echo "$(( ${ARR_SUM} / ${#ARR[@]} ))"
done < "$filename"
With some minor corrections, your code runs well:
while read -a rows
do
    total=0
    sum=0
    for i in "${rows[@]}"
    do
        sum=`expr $sum + $i`
        total=`expr $total + 1`
    done
    average=`expr $sum / $total`
    echo $average
done <filename
With the sample input file, the output produced is:
1
5
7
5
2
5
Note that the answers are what they are because expr only does integer arithmetic.
The above code could be rewritten as:
$ while read row; do expr '(' $(sed 's/[ ][ ]*/ + /g' <<<"$row") ')' / $(wc -w <<<"$row"); done < filename
1
5
7
5
2
5
expr is archaic. In modern bash:
while read -a rows
do
    total=0
    sum=0
    for i in "${rows[@]}"
    do
        ((sum += $i))
        ((total++))
    done
    echo $((sum/total))
done <filename
Because awk does floating point math, it can provide more accurate results:
$ awk '{s=0; for (i=1;i<=NF;i++)s+=$i; print s/NF;}' filename
1
5.2
7.4
5.4
2.8
5.6
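If you want a fixed number of decimals, awk's printf can format the result. The sketch below is one way to do it; the wrapper name avg_lines is hypothetical, and skipping blank lines (where NF is 0 and the division would fail) is my own addition rather than part of the one-liner above.

```shell
#!/bin/sh
# avg_lines is a hypothetical wrapper around awk: it prints each line's average
# to two decimal places and skips blank lines (NF == 0 would divide by zero).
avg_lines() {
    awk 'NF { s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.2f\n", s / NF }' "$@"
}

# Reads standard input when no file name is given
printf '1 2\n\n3 4 6\n' | avg_lines
```

With a file, it could be called as avg_lines filename.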
Some variations on the same trick of using the IFS variable.
#!/bin/bash
while read line; do
    set -- $line
    echo $(( ( $(IFS=+; echo "$*") ) / $# ))
done < rows
echo
while read -a line; do
    echo $(( ( $(IFS=+; echo "${line[*]}") ) / ${#line[*]} ))
done < rows
echo
saved_ifs="$IFS"
while read -a line; do
    IFS=+
    echo $(( ( ${line[*]} ) / ${#line[*]} ))
    IFS="$saved_ifs"
done < rows
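The trick in all three variations is that "$*" (or "${line[*]}") joins its elements with the first character of IFS. A minimal sketch of just that step, assuming bash; the function name join_plus is made up for illustration:

```shell
#!/bin/bash
# join_plus is a hypothetical helper: it joins its arguments with '+'.
join_plus() {
    # local keeps the IFS change from leaking into the caller
    local IFS=+
    # "$*" joins the positional parameters with the first character of IFS
    printf '%s\n' "$*"
}

expr_str=$(join_plus 4 5 6)
echo "$expr_str"              # prints 4+5+6
echo $(( ($expr_str) / 3 ))   # prints 5
```

Scoping the IFS change inside a function (or a subshell, as in the first two variations) avoids having to save and restore it by hand.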