Question
I am using Linux bash version 4.1.2.
I have a tab-delimited input_file with 5 fields per line. I want to calculate the MD5 of each line and append the md5sum to the end of that line.
The expected output_file should therefore have 6 fields on each line.
Here is my code:
cat input_file | while read ONELINE
do
THEMD5=`echo "$ONELINE" | md5sum | awk '{print $1}'`
echo -e "${ONELINE}\t${THEMD5}"
done > output_file
The code works well most of the time.
However, if ONELINE ends with one or two tabs, the trailing tab(s) disappear!
As a result, the output_file sometimes contains lines with only 4 or 5 fields, because of the missing tab(s).
I have tried adding IFS=, IFS='', IFS=$'\n', or IFS=$'\012' to the while statement, but none of these solves the problem.
Please help.
Alvin SIU
Answer 1:
The following is quite certainly correct, if you want trailing newlines included in your md5sums (as your original code has):
while IFS= read -r line; do
  read sum _ < <(printf '%s\n' "$line" | md5sum -)
  printf '%s\t%s\n' "$line" "$sum"
done <input_file
Notes:
- Leading and trailing whitespace characters listed in IFS are stripped by read; setting IFS= is sufficient to prevent this effect.
- Without the -r argument, read also interprets backslash literals, stripping them.
- Using echo -e is dangerous: it interprets escape sequences inside your line, rather than emitting them as literals.
- Using all-uppercase variable names is bad form. See the relevant spec (particularly the fourth paragraph), keeping in mind that shell variables and environment variables share a namespace.
- Using echo in general is bad form when dealing with uncontrolled data (particularly including data which can contain backslash literals). See the relevant POSIX spec, particularly the APPLICATION USAGE and RATIONALE sections.
- If you want to print the lines in a way that makes hidden characters visible, consider using '%q\t%s\n' instead of '%s\t%s\n' as a format string.
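To see the first two notes in action, here is a minimal sketch for bash. The sample line and the variable names (sample, default_line, exact_line) are made up for illustration and are not part of the original question. It reads the same line twice, once with a plain read and once with IFS= read -r, then prints both with %q so that tabs and backslashes are visible.

```shell
#!/usr/bin/env bash
# Hypothetical sample line: field 1 contains a literal backslash followed by "n",
# and the line ends with two tabs (i.e. two empty trailing fields).
sample=$'a\\nb\tc\t\t'

# Plain read: the default IFS strips the trailing tabs,
# and without -r the backslash is consumed as an escape character.
read default_line <<<"$sample"

# IFS= disables whitespace stripping; -r keeps backslashes literal.
IFS= read -r exact_line <<<"$sample"

# %q prints each value in a shell-quoted form that shows hidden characters.
printf 'plain read:     %q\n' "$default_line"
printf 'IFS= read -r:   %q\n' "$exact_line"
```

In the first case both the backslash and the trailing tabs are lost, so the md5sum would be computed over different bytes than those in the input file; in the second case the line is preserved unchanged.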
Source: https://stackoverflow.com/questions/31955313/how-to-read-one-line-to-calculate-the-md5