Question
I'm pulling data from a file (in this case an exim mail log), and it often stores characters as escaped octal sequences such as \NNN, where each 'N' is an octal digit 0-7. This mainly happens when the subject is written in non-Latin characters (Arabic, for example).
My goal is to find the cleanest way to convert these octal characters to display correctly in my utf-8 enabled terminal, specifically in 'less' as there is the potential for lots of output.
The best approach I have found so far is as follows:
arbitrary_stream | { while read -r temp; do printf %b "$temp\n"; done } | less
This seems to work pretty well, but I would assume there is some translator tool, or maybe even a flag built into 'less', to handle this. I also found that if you use something like sed to inject a 0 after each \, you can store the result in a variable and then use 'echo -e $data'; however, this was messier than the previous solution.
Test case:
octalvar="\342\202\254"
expected output in less:
€
I'm looking for something cleaner, more complete or just better than my above solution in the form of either:
echo $octalvar | do_something | less
or
echo $octalvar | less --some_magic_flag
Any suggestions? Or is my solution about as clean as I can expect?
Answer 1:
Conversion in GNU awk (required for strtonum). It turned out to be a hassle, so the code is a mess and could probably be streamlined; feel free to advise:
awk '{
    while(match($0,/\\[0-7]{3}/)) {    # search for \NNNs (octal digits are 0-7)
        o=substr($0,RSTART,RLENGTH)    # extract it
        sub(/\\/,"0",o)                # replace \ with 0 for strtonum
        c=sprintf("%c",strtonum(o))    # convert to a character
        sub(/\\[0-7]{3}/,c)            # replace the \NNN with the char
    }
}1' foo > bar
Alternatively, paste the code between the single quotes into a file named above_program.awk and run it with awk -f above_program.awk foo > bar. Test file foo:
test 123 \342\202\254
Run it in a non-UTF-8 locale; I used the C locale:
$ locale
...
LC_ALL=C
$ awk -f above_program.awk foo
test 123 €
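If you run it in a UTF-8 locale, an extra conversion happens: each value is treated as a character code and re-encoded as UTF-8, which garbles the output: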
$ locale
...
LC_ALL=en_US.utf8
$ awk -f above_program.awk foo
test 123 â¬
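A sketch of the same idea that avoids strtonum and interval expressions, so it should also run under awks other than gawk (an assumption, not verified on every implementation). The function name decode_octal is made up for illustration. It forces LC_ALL=C for the awk call so that %c emits raw bytes regardless of the caller's locale, and it splices the line with substr instead of sub, so a decoded & or \ is not reinterpreted inside a replacement string:

```shell
# decode_octal: hypothetical helper; reads stdin, writes stdout.
# Converts only \NNN (three octal digits); other backslash sequences are left alone.
decode_octal() {
  LC_ALL=C awk '{
    out = ""
    while (match($0, /\\[0-7][0-7][0-7]/)) {
      # Compute the byte value from the three octal digits after the backslash.
      code = substr($0, RSTART + 1, 1) * 64 \
           + substr($0, RSTART + 2, 1) * 8 \
           + substr($0, RSTART + 3, 1)
      # Keep the text before the match, then append the decoded byte.
      out = out substr($0, 1, RSTART - 1) sprintf("%c", code)
      # Continue scanning after the match, so decoded bytes are never rescanned.
      $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
  }'
}
```

It would be used as arbitrary_stream | decode_octal | less.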
Answer 2:
This is my current version:
echo $arbitrary | { IFS=$'\n'; while read -r temp; do printf %b "$temp\n"; done; unset IFS; } | iconv -f utf-8 -t utf-8 -c | less
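The read loop in that pipeline could be factored into a small function, roughly as sketched here. The name expand_escapes is hypothetical. Note that printf %b expands every backslash escape it knows, not just \NNN: a literal \t or \n in the log turns into a tab or newline, and \c cuts off the rest of that line.

```shell
# expand_escapes: hypothetical helper; reads stdin, writes stdout.
expand_escapes() {
  # IFS= preserves leading and trailing whitespace on each line;
  # -r stops read from consuming the backslashes that printf %b must see.
  # The "|| [ -n ... ]" part handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%b\n' "$line"
  done
}

# Usage, with the same iconv stage as above to drop bytes that are not valid UTF-8:
# arbitrary_stream | expand_escapes | iconv -f utf-8 -t utf-8 -c | less
```

Setting IFS only for the read command means there is nothing to unset afterwards.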
Source: https://stackoverflow.com/questions/43461003/unix-how-to-convert-octal-escape-sequences-via-pipe