printf field width doesn't support multibyte characters?

孤独总比滥情好 2021-01-15 04:21

I want printf to recognize multibyte characters when calculating the field width so that columns line up properly... I can't find an answer to this problem and was wondering…

6 Answers
  • 2021-01-15 04:47

    The best I can think of is:

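    # Print the width to pass to %*s so that STR fills WIDTH characters:
    # the printf in question counts bytes, so add the extra bytes of multibyte characters.
    function formatwidth
    {
      local STR=$1; shift
      local WIDTH=$1; shift
      local BYTEWIDTH=$( echo -n "$STR" | wc -c )   # length in bytes
      local CHARWIDTH=$( echo -n "$STR" | wc -m )   # length in characters (locale-dependent)
      echo $(( $WIDTH + $BYTEWIDTH - $CHARWIDTH ))
    }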
    
    printf "## %5s %*s %5s ##\n## %5s %*s %5s ##\n" \
        '' $( formatwidth "*" 5 ) '*' '' \
        '' $( formatwidth "•" 5 ) "•" ''
    

    You use the * width specifier to take the width as an argument, and calculate the width you need by adding the number of additional bytes in multibyte characters.

    Note that in GNU wc, -c returns bytes, and -m returns (possibly multibyte) characters.
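    To see why adding (bytes - characters) fixes the alignment, here is a small Python sketch. Python is used only so the example does not depend on the locale that wc -m relies on. It assumes the shell printf pads by UTF-8 byte count, which bytes.rjust mimics; the names byte_padded and formatwidth are illustrative, the latter mirroring the shell function above.

```python
def byte_padded(s: str, width: int) -> str:
    """Mimic a byte-counting printf "%*s": left-pad s by its UTF-8 byte length."""
    raw = s.encode("utf-8")
    # bytes.rjust counts bytes, so a 3-byte character uses up 3 units of the width
    return raw.rjust(width).decode("utf-8")


def formatwidth(s: str, width: int) -> int:
    """Width to hand a byte-counting "%*s" so that s fills `width` characters."""
    byte_width = len(s.encode("utf-8"))  # like `wc -c`
    char_width = len(s)                  # like `wc -m` in a UTF-8 locale
    return width + byte_width - char_width


if __name__ == "__main__":
    for s in ["*", "•"]:
        naive = byte_padded(s, 5)
        fixed = byte_padded(s, formatwidth(s, 5))
        print(f"## [{naive}] [{fixed}] ##")
```

    "•" (U+2022) is 3 bytes in UTF-8, so a plain width of 5 leaves only 2 spaces; formatwidth raises the width to 7, which yields 4 spaces plus the bullet, i.e. 5 characters.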

  • 2021-01-15 04:47

    I would probably use GNU awk, whose printf counts characters rather than bytes in a multibyte locale:

    awk 'BEGIN{ printf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n", "", "*", "", "", "•", "" }'
    ##           *       ##
    ##           •       ##
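    For comparison, Python's % operator on str also pads by characters (code points), so the same format reproduces gawk's aligned output. Note that counting code points is not the same as counting terminal columns: double-width CJK characters would still misalign with either tool.

```python
# Python's % formatting on str pads by code points, like gawk in a UTF-8 locale.
fmt = "## %5s %5s %5s ##"
lines = [fmt % ("", "*", ""), fmt % ("", "•", "")]
for line in lines:
    print(line)

# Both lines contain the same number of characters, so the columns line up
assert len(lines[0]) == len(lines[1])
```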
    

    You can even write a shell wrapper function called printf on top of awk to keep the same interface:

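    # Print a gawk command that runs awk's printf on the given format and arguments.
    # Note: arguments are not escaped, so quotes in them break the command and
    # backslashes are interpreted as awk escape sequences.
    tr2awk() { 
        local FMT="$1"
        echo -n "gawk 'BEGIN{ printf \"$FMT\""
        shift
        for ARG in "$@"
            do echo -n ", \"$ARG\""
        done
        echo " }'"
    }
    

    Then override printf with a simple function:

    printf() { eval "$(tr2awk "$@")"; }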
    

    Test it:

    # buggy printf binary test:
    /usr/bin/printf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' "•" ''
    ##           *       ##
    ##         •       ##
    # buggy printf shell builtin test:
    builtin printf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' "•" ''
    ##           *       ##
    ##         •       ##
    
    # fixed printf function test:
    printf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' "•" ''
    ##           *       ##
    ##           •       ##
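    The eval-based wrapper breaks as soon as an argument contains a quote. Below is a hedged sketch of the same idea in Python: it escapes each argument as an awk string literal and returns an argument list that could be passed to subprocess.run without a shell, so no eval is involved. The helper names awk_literal and tr2awk_argv are hypothetical, and the sketch only builds the command; running it requires gawk on the PATH.

```python
from typing import List


def awk_literal(s: str, keep_escapes: bool = False) -> str:
    """Quote s as an awk string literal."""
    if not keep_escapes:
        # Make backslashes literal so "\t" in an argument is not turned into a tab
        s = s.replace("\\", "\\\\")
    # Escape double quotes and raw newlines, which would end or break the literal
    s = s.replace('"', '\\"').replace("\n", "\\n")
    return '"' + s + '"'


def tr2awk_argv(fmt: str, *args: str) -> List[str]:
    """Build argv that runs gawk's printf on fmt and args, with no shell involved."""
    # The format keeps its escapes so awk still turns "\n" into a newline, like printf
    parts = [awk_literal(fmt, keep_escapes=True)] + [awk_literal(a) for a in args]
    return ["gawk", "BEGIN { printf " + ", ".join(parts) + " }"]


if __name__ == "__main__":
    print(tr2awk_argv("%5s|\\n", 'say "hi"'))
    # To actually run it (requires gawk):
    # import subprocess; subprocess.run(tr2awk_argv("%5s|\\n", "•"), check=True)
```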
    
  • 2021-01-15 04:49

    Are these the only ways? Is there no way to do it with printf alone?

    Well, using the example from ninjalj (thanks, by the way), I wrote a script to deal with this problem and saved it as fprintf in /usr/local/bin:

    #! /bin/bash
    
    IFS=' '
    declare -a Text=("${@}")
    
    ## Skip the whole thing if there are no multi-byte characters ##
    if (( $(echo "${Text[*]}" | wc -c) > $(echo "${Text[*]}" | wc -m) )); then
        if echo "${Text[*]}" | grep -Eq '%[#0 +-]?[0-9]+(\.[0-9]+)?[sb]'; then
            IFS=$'\n'
            declare -a FormatStrings=($(echo -n "${Text[0]}" | grep -Eo '%[^%]*?[bs]'))
            IFS=$' \t\n'
            declare -i format=0
    
        ## Check every format string ##
            for fw in "${FormatStrings[@]}"; do
                (( format++ ))
                if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]*(\.[1-9][0-9]*)?[sb]$ ]]; then
                    (( Difference = $(echo "${Text[format]}" | wc -c) - $(echo "${Text[format]}" | wc -m) ))
    
                ## If multi-byte characters ##
                    if (( Difference > 0 )); then
    
                    ## If a field width is entered then replace field width value ##
                        if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]* ]]; then
                            (( Width = $(echo -n "$fw" | gsed -re 's|^%[#0 +-]?([1-9][0-9]*).*[bs]|\1|') + Difference ))
                            declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?)[1-9][0-9]*|\1'${Width}'|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                        fi
    
                    ## If a precision is entered then replace precision value ##
                        if [[ "$fw" =~ \.[1-9][0-9]*[sb]$ ]]; then
                            (( Precision = $(echo -n "$fw" | gsed -re 's|^%.*\.([1-9][0-9]*)[sb]$|\1|') + Difference ))
                            declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?([1-9][0-9]*)?)\.[1-9][0-9]*([bs])|\1.'${Precision}'\3|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                        fi
                    fi
                fi
            done
        fi
    fi
    
    printf "${Text[@]}"
    exit 0
    

    Usage: fprintf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' '•' ''

    A few things to note:

    • I didn't write this script to handle * (asterisk) width or precision values, because I never use them. I wrote this for myself and didn't want to over-complicate things.
    • I wrote this to check only the %s and %b conversions, as they seem to be the only ones affected by this problem. Thus, if someone somehow manages to get a multibyte Unicode character out of a number, it may not work without minor modification.
    • The script works well for basic use of printf (not for old-school UNIX hackers); feel free to modify it or use it as is!
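    The core of the script is rewriting each width (and precision) in the format string by the matching argument's extra byte count. Here is a minimal Python sketch of just that rewrite step, assuming UTF-8 and one argument per conversion. The regular expression and the name adjust_format are illustrative; unlike the script, it also steps over non-%s/%b conversions so arguments stay paired with their specifiers, and like the script it ignores * widths.

```python
import re
from typing import List

# "%%", or a conversion: flags, optional width, optional .precision, conversion letter
SPEC = re.compile(r"%%|%([#0 +-]*)([0-9]+)?(?:\.([0-9]+))?([a-zA-Z])")


def adjust_format(fmt: str, args: List[str]) -> str:
    """Widen %s/%b widths and precisions by each argument's extra UTF-8 bytes."""
    remaining = iter(args)

    def fix(m):
        if m.group(0) == "%%":
            return "%%"  # literal percent sign, consumes no argument
        flags, width, prec, conv = m.groups()
        arg = next(remaining, "")
        if conv not in "sb":
            return m.group(0)  # other conversions consume an argument but stay unchanged
        extra = len(arg.encode("utf-8")) - len(arg)
        spec = "%" + flags
        if width:
            spec += str(int(width) + extra)
        if prec:
            # Like the script, the whole argument's extra bytes are added here too
            spec += "." + str(int(prec) + extra)
        return spec + conv

    return SPEC.sub(fix, fmt)


if __name__ == "__main__":
    print(adjust_format("## %5s %5s %5s ##", ["", "•", ""]))
```

    The adjusted format would then be handed to the ordinary byte-counting printf together with the unchanged arguments, as the script does with `printf "${Text[@]}"`.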
  • 2021-01-15 04:53

    A language like Python will probably solve your problem in a simpler, more controllable way:

    #!/usr/bin/env python3
    
    import unicodedata
    
    def width(string):
        # East Asian Wide (W) and Fullwidth (F) characters take two terminal columns
        return sum(1 + (unicodedata.east_asian_width(c) in ("W", "F"))
                   for c in string)
    
    a1 = ['する', 'します', 'trazan', 'した', 'しました']
    a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']
    
    for i, j in zip(a1, a2):
        print('%s %s: %s' % (i, ' ' * (12 - width(i)), j))
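    A variation on the same idea moves the padding into a helper. As an additional assumption not in the answer above, it treats combining marks (for example U+0301, the combining acute accent) as zero-width; it is still only an approximation, since ambiguous-width characters count as 1 and emoji sequences are not handled. The names display_width and ljust_display are illustrative.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal columns: Wide/Fullwidth = 2, combining marks = 0, else 1."""
    width = 0
    for c in s:
        if unicodedata.combining(c):
            continue  # assumption: a combining mark shares its base character's column
        width += 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
    return width


def ljust_display(s: str, width: int) -> str:
    """Pad s with spaces on the right until it spans `width` columns."""
    return s + " " * max(0, width - display_width(s))


a1 = ["する", "します", "trazan", "した", "しました"]
a2 = ["dipsy", "laa-laa", "banarne", "po", "tinky winky"]
for left, right in zip(a1, a2):
    print(ljust_display(left, 12) + ": " + right)
```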
    
  • 2021-01-15 05:00

    A pure shell solution (in bash with a UTF-8 locale, ${#var} and the ? pattern count characters rather than bytes):

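    right_justify() {
            # parameters: field_width string
            # The padded string is returned in the global variable $result.
            local spaces questions
            spaces=''
            questions=''
            # Build field_width spaces and a pattern of field_width '?' wildcards
            while [ "${#questions}" -lt "$1" ]; do
                    spaces=$spaces" "
                    questions=$questions?
            done
            result=$spaces$2
            # Remove everything before the last field_width characters
            result=${result#"${result%$questions}"}
    }
    
    right_justify 5 '•'; echo "[$result]"   # prints "[    •]" in a UTF-8 locale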
    

    Note that this still does not work in dash because dash has no locale support.
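    The parameter-expansion trick is easier to follow when spelled out. This Python sketch reproduces the same logic (Python slices count code points, which is what bash's ? matches in a UTF-8 locale), including a side effect of the shell version: a string longer than the field is cut down to its last field_width characters. The function mirrors the shell one; it is not part of the original answer.

```python
def right_justify(field_width: int, s: str) -> str:
    """Prepend field_width spaces, then keep only the last field_width characters."""
    padded = " " * field_width + s
    # Slicing from len - width (not -width) keeps field_width == 0 returning "",
    # matching the shell version
    return padded[len(padded) - field_width:]


if __name__ == "__main__":
    print("[" + right_justify(5, "•") + "]")
```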

  • 2021-01-15 05:00

    This is kind of late, but I just came across this and thought I would post it for others who find the same question. A variation on @ninjalj's answer is to create a function that returns a string of a given width rather than calculating the required format width:

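    #!/bin/bash
    # Print STR right-justified in a field WIDTH characters wide, widening the
    # byte-based printf width by the extra bytes of any multibyte characters.
    function sized_string
    {
            local STR=$1 WIDTH=$2
            local BYTEWIDTH=$( echo -n "$STR" | wc -c )
            local CHARWIDTH=$( echo -n "$STR" | wc -m )
            local FMT_WIDTH=$(( $WIDTH + $BYTEWIDTH - $CHARWIDTH ))
            printf "%*s" "$FMT_WIDTH" "$STR"    # quoted so strings with spaces stay one argument
    }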
    printf "[%s]\n" "$(sized_string "abc" 20)"
    printf "[%s]\n" "$(sized_string "ab•cd" 20)"
    

    which outputs:

    [                 abc]
    [               ab•cd]
    