printf field width doesn't support multibyte characters?

孤独总比滥情好 2021-01-15 04:21

I want printf to recognize multi-byte characters when calculating the field width so that columns line up properly... I can't find an answer to this problem and was wonderi

6 answers
  •  小蘑菇 (original poster)
    2021-01-15 04:49

    Are these the only ways? Is there no way to do it with printf alone?

    Well, with the example from ninjalj (thanks, by the way), I wrote a script to deal with this problem and saved it as fprintf in /usr/local/bin:

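    #! /bin/bash
    # Requires GNU sed installed as gsed (its usual name on macOS and BSD);
    # where sed is already GNU sed, substitute sed for gsed.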
    
    IFS=' '
    declare -a Text=("${@}")
    
    ## Skip the whole thing if there are no multi-byte characters ##
    if (( $(echo "${Text[*]}" | wc -c) > $(echo "${Text[*]}" | wc -m) )); then
        if echo "${Text[*]}" | grep -Eq '%[#0 +-]?[0-9]+(\.[0-9]+)?[sb]'; then
            IFS=$'\n'
            declare -a FormatStrings=($(echo -n "${Text[0]}" | grep -Eo '%[^%]*?[bs]'))
            IFS=$' \t\n'
            declare -i format=0
    
        ## Check every format string ##
            for fw in "${FormatStrings[@]}"; do
                (( format++ ))
                if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]*(\.[1-9][0-9]*)?[sb]$ ]]; then
                    (( Difference = $(echo "${Text[format]}" | wc -c) - $(echo "${Text[format]}" | wc -m) ))
    
                ## If multi-byte characters ##
                    if (( Difference > 0 )); then
    
                    ## If a field width is entered then replace field width value ##
                        if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]* ]]; then
                            (( Width = $(echo -n "$fw" | gsed -re 's|^%[#0 +-]?([1-9][0-9]*).*[bs]|\1|') + Difference ))
                            declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?)[1-9][0-9]*|\1'${Width}'|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                        fi
    
                    ## If a precision is entered then replace precision value ##
                        if [[ "$fw" =~ \.[1-9][0-9]*[sb]$ ]]; then
                            (( Precision = $(echo -n "$fw" | gsed -re 's|^%.*\.([1-9][0-9]*)[sb]$|\1|') + Difference ))
                            declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?([1-9][0-9]*)?)\.[1-9][0-9]*([bs])|\1.'${Precision}'\3|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                        fi
                    fi
                fi
            done
        fi
    fi
    
    printf "${Text[@]}"
    exit 0
    

    Usage: fprintf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' '•' ''
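
    The key quantity in the script above is Difference: the number of bytes in an argument minus the number of characters. printf pads %s by bytes, so widening the field by that amount restores the alignment. Here is a minimal sketch of that calculation, assuming UTF-8 input and a shell whose printf counts bytes. mb_extra is a hypothetical helper name, and it counts characters by dropping UTF-8 continuation bytes with tr instead of relying on wc -m and the current locale as the script does:

```shell
#!/bin/sh
# mb_extra (hypothetical name): print how many bytes a string occupies
# beyond its character count. Assumes the argument is valid UTF-8.
mb_extra() (
    # Total number of bytes in the argument.
    bytes=$(printf '%s' "$1" | LC_ALL=C wc -c)
    # UTF-8 continuation bytes are 0x80-0xBF (octal 200-277); deleting
    # them leaves exactly one byte per character.
    chars=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c)
    echo $(( bytes - chars ))
)

mb_extra '*'    # prints 0: plain ASCII
mb_extra '•'    # prints 2: U+2022 is three bytes, one character

# Widen the field by the extra bytes so both rows take the same space.
printf '## %5s ##\n' '*'
printf "## %$(( 5 + $(mb_extra '•') ))s ##\n" '•'
```

    For ASCII-only arguments the extra is 0, so the format string can be left untouched, which is the same shortcut the script takes at the top.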

    A few things to note:

    • I didn't write this script to deal with * (asterisk) values for formats because I never use them. I wrote this for me and didn't want to over-complicate things.
    • I wrote this to check only the %s and %b format specifiers, as they seem to be the only ones affected by this problem. If someone somehow manages to get a multi-byte Unicode character out of a number, it may not work without minor modification.
    • The script works well for basic use of printf (I'm no old-school UNIX hacker). Feel free to modify it or use it as is!
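
    On the first note: when the width is given as * it arrives as an argument, so one possible approach is to adjust that argument instead of rewriting the format string. This is only a sketch under the same assumptions (UTF-8 input, a printf that counts bytes and accepts %*s); pad is a hypothetical helper, not part of the script above:

```shell
#!/bin/sh
# pad WIDTH TEXT (hypothetical helper): print TEXT in a field of WIDTH
# characters. A negative WIDTH left-justifies, as with printf's "-" flag.
pad() (
    width=$1
    text=$2
    # Bytes minus characters; continuation bytes are octal 200-277.
    bytes=$(printf '%s' "$text" | LC_ALL=C wc -c)
    chars=$(printf '%s' "$text" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c)
    extra=$(( bytes - chars ))
    # Grow the field away from zero so the sign (justification) is kept.
    if [ "$width" -lt 0 ]; then
        width=$(( width - extra ))
    else
        width=$(( width + extra ))
    fi
    printf '%*s' "$width" "$text"
)

# Same table as the usage example, built cell by cell.
printf '## %s %s %s ##\n' "$(pad 5 '')" "$(pad 5 '*')" "$(pad 5 '')"
printf '## %s %s %s ##\n' "$(pad 5 '')" "$(pad 5 '•')" "$(pad 5 '')"
```

    A precision (%.Ns) would need the same kind of adjustment, and truncating in the middle of a multi-byte character is a separate problem that this sketch does not address.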
