I want printf to recognize multi-byte characters when calculating the field width so that columns line up properly... I can't find an answer to this problem and was wonderi
Are these the only ways? Is there no way to do it with printf alone?
Well, with the example from ninjalj (thx btw), I wrote a script to deal with this problem and saved it as fprintf in /usr/local/bin:
#! /bin/bash
IFS=' '
declare -a Text=("${@}")
## Skip the whole thing if there are no multi-byte characters ##
if (( $(echo "${Text[*]}" | wc -c) > $(echo "${Text[*]}" | wc -m) )); then
    if echo "${Text[*]}" | grep -Eq '%[#0 +-]?[0-9]+(\.[0-9]+)?[sb]'; then
        IFS=$'\n'
        declare -a FormatStrings=($(echo -n "${Text[0]}" | grep -Eo '%[^%]*?[bs]'))
        IFS=$' \t\n'
        declare -i format=0
        ## Check every format string ##
        for fw in "${FormatStrings[@]}"; do
            (( format++ ))
            if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]*(\.[1-9][0-9]*)?[sb]$ ]]; then
                (( Difference = $(echo "${Text[format]}" | wc -c) - $(echo "${Text[format]}" | wc -m) ))
                ## If multi-byte characters ##
                if (( Difference > 0 )); then
                    ## If a field width is entered then replace field width value ##
                    if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]* ]]; then
                        (( Width = $(echo -n "$fw" | gsed -re 's|^%[#0 +-]?([1-9][0-9]*).*[bs]|\1|') + Difference ))
                        declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?)[1-9][0-9]*|\1'${Width}'|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                    fi
                    ## If a precision is entered then replace precision value ##
                    if [[ "$fw" =~ \.[1-9][0-9]*[sb]$ ]]; then
                        (( Precision = $(echo -n "$fw" | gsed -re 's|^%.*\.([1-9][0-9]*)[sb]$|\1|') + Difference ))
                        declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?([1-9][0-9]*)?)\.[1-9][0-9]*([bs])|\1.'${Precision}'\3|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                    fi
                fi
            fi
        done
    fi
fi
printf "${Text[@]}"
exit 0
Usage: fprintf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' '•' ''
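For anyone who just wants the core idea without the sed juggling, here is a minimal sketch of the same trick: widen the field by the number of "extra" bytes in the argument. The helper name pad_left is made up for illustration and is not part of the script above. It assumes UTF-8 text, one display column per character, and a printf that counts bytes when padding (the behaviour described in the question).

```shell
#!/bin/bash
# Hypothetical helper (not part of the fprintf script above):
# print STR right-aligned in a field WIDTH display columns wide.
# Assumptions: UTF-8 input, one column per character, byte-counting printf.
pad_left() {
    local width=$1 str=$2 bytes chars
    # Total number of bytes in the string.
    bytes=$(printf '%s' "$str" | wc -c)
    # Number of UTF-8 characters: drop continuation bytes (0x80-0xBF)
    # and count what is left, independent of the current locale.
    chars=$(printf '%s' "$str" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # Widen the field by the extra bytes so the visible width stays WIDTH.
    printf "%$(( width + bytes - chars ))s" "$str"
}

# Same data as the usage line above; both rows should come out equally wide.
printf '## %s %s %s ##\n' "$(pad_left 5 '')" "$(pad_left 5 '*')" "$(pad_left 5 '')"
printf '## %s %s %s ##\n' "$(pad_left 5 '')" "$(pad_left 5 '•')" "$(pad_left 5 '')"
```

Counting non-continuation bytes instead of calling wc -m keeps the sketch from depending on the locale settings, but it will still be off for double-width or combining characters.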
A few things to note:
- There is no support for * (asterisk) values for formats because I never use them. I wrote this for me and didn't want to over-complicate things.
- It only adjusts %s and %b, as they seem to be the only ones that are affected by this problem. Thus, if somehow someone manages to get a multi-byte Unicode character out of a number, it may not work without minor modification.
- I'm no expert with printf (not some old-school UNIX hacker), so feel free to modify it or use it as is!