Simplest method to convert file-size with suffix to bytes

深忆病人 2021-01-03 10:45

Title says it all really, but I'm currently using a simple function with a case statement to convert human-readable file size strings into a size in bytes. It works well en

6 answers
  • 2021-01-03 11:01

    Okay, so it sounds like there's nothing built-in or widely available, which is a shame, so I've had a go at reducing the size of the function and come up with something that's only really 4 lines long, though it's a pretty complicated four lines!

    I'm not sure if it's suitable as an answer to my original question as it's not really what I'd call the simplest method, but I want to put it up in case anyone thinks it's a useful solution, and it does have the advantage of being really short.

    #!/bin/sh
    to_bytes() {
        units=$(echo "$1" | sed 's/^[0123456789]*//' | tr '[:upper:]' '[:lower:]')
        index=$(echo "$units" | awk '{print index ("bkmgt kbgb  mbtb", $0)}')
        mod=$(echo "1024^(($index-1)%5)" | bc)
        [ "$mod" -gt 0 ] && 
            echo $(echo "$1" | sed 's/[^0123456789].*$//g')"*$mod" | bc
    }
    

    To quickly summarise how it works: it first strips the number from the given string and forces what remains to lowercase. It then uses awk to find the index of the suffix within a structured string of valid suffixes. Note that the string is laid out in groups of five characters (so it would need to be widened if more suffixes were added); for example, k and kb are at indices 2 and 7 respectively. The index is then reduced by one and taken modulo five, so both k and kb become 1, m and mb become 2, and so on. That result is then used as the exponent of 1024 to get the multiplier. If the suffix is invalid the multiplier resolves to zero, while a suffix of b (or nothing) evaluates to 1. So long as mod is greater than zero, the input string is reduced to only its numeric part and multiplied by the modifier to get the end result.
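
    The index arithmetic can be checked in isolation. The following is only a sketch: suffix_exponent is a hypothetical helper that is not part of the function above, and it prints -1 for an unknown suffix instead of relying on bc to turn a negative exponent into zero.

```shell
#!/bin/sh
# suffix_exponent is a hypothetical helper, not part of the answer above.
# It prints the power of 1024 that the padded lookup string assigns to a
# lower-case suffix, or -1 when the suffix does not occur in the string.
suffix_exponent() {
    echo "$1" | awk '{
        i = index("bkmgt kbgb  mbtb", $0)   # 1-based position, 0 if absent
        e = (i > 0) ? (i - 1) % 5 : -1
        print e
    }'
}

# Show the mapping for every supported suffix plus one invalid one.
for s in b k kb m mb g gb t tb x; do
    printf '%-2s -> %s\n' "$s" "$(suffix_exponent "$s")"
done
```

    Each one-letter suffix and its two-letter form should land on the same exponent, which is the whole point of padding the lookup string to multiples of five.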

    This is actually how I would probably have solved this originally if I were using a language like PHP, Java etc., it's just a bit of a weird one to put together in a shell script.

    I'd still very much appreciate any simplifications though!

  • 2021-01-03 11:12

    Another variation, adding support for decimal values with a simpler T/G/M/K parser for outputs you might find from simpler Unix programs.

    to_bytes() {
        # Numeric part: drop one trailing unit letter, if present.
        value=$(echo "$1" | sed -e 's/[TGMKb]$//')
        # Unit: the last character, unless it is part of the number.
        units=$(echo "$1" | grep -o '[^0-9.]$')
        case "$units" in
            T)   value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024 * 1024)")    ;;
            G)   value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024)")   ;;
            M)   value=$(bc <<< "scale=2; ($value * 1024 * 1024)")  ;;
            K)   value=$(bc <<< "scale=2; ($value * 1024)") ;;
            b|'')   ;;  # already in bytes, nothing to do
            *)
                value=
                echo "Unsupported units '$units'" >&2
            ;;
        esac
        echo "$value"
    }
    
  • 2021-01-03 11:15

    See man numfmt.

    # numfmt --from=iec 42 512K 10M 7G 3.5T
    42
    524288
    10485760
    7516192768
    3848290697216
    
    # numfmt --to=iec 42 524288 10485760 7516192768 3848290697216
    42
    512K
    10M
    7.0G
    3.5T
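
    The examples above use bare single-letter suffixes. To feed numfmt strings such as 200kb or 1.5GiB, one option is to normalise the suffix first. This is a sketch only: to_bytes_numfmt is a hypothetical wrapper name, it assumes GNU coreutils numfmt is installed, and it treats every suffix as a power of 1024.

```shell
#!/bin/sh
# to_bytes_numfmt is a hypothetical wrapper; it assumes GNU coreutils numfmt.
to_bytes_numfmt() {
    # Upper-case the input and drop a trailing "B" or "IB" so that
    # "200kb" and "1.5GiB" become "200K" and "1.5G".
    normalised=$(printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/I\{0,1\}B$//')
    numfmt --from=iec "$normalised"
}

to_bytes_numfmt 200kb
to_bytes_numfmt 1.5GiB
```

    numfmt also documents --from=iec-i and --from=auto, which may remove the need for part of this normalisation; check the man page for the installed version.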
    
  • 2021-01-03 11:15

    Don't know if this is OK:

    awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
    /^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
        sub(/b$/,"")
            sub(/g/,"*"g)
            sub(/k/,"*"k)
            sub(/m/,"*"m)
            sub(/t/,"*"t)
    "echo "$0"|bc"|getline r; print r; exit;}
    {print "invalid input"}'
    
    • This only handles single-line input. If multiple lines are needed, remove the exit.
    • This only accepts [kgmt] followed by an optional b, so kib, mib etc. would fail. It also currently handles lower-case only.

    e.g.:

    kent$ echo "200kb"|awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
    /^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
        sub(/b$/,"")
            sub(/g/,"*"g)
            sub(/k/,"*"k)
            sub(/m/,"*"m)
            sub(/t/,"*"t)
    "echo "$0"|bc"|getline r
    print r; exit
    }{print "invalid input"}'
    204800
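
    The two limitations listed above (lower-case only, no kib/mib) could be lifted along these lines. This is a sketch rather than a drop-in replacement: to_bytes_awk is a hypothetical name, the arithmetic is done inside awk instead of being piped to bc, and unlike the original it also accepts a bare number with no suffix.

```shell
#!/bin/sh
# to_bytes_awk is a hypothetical name; this is a sketch, not the answer's code.
to_bytes_awk() {
    printf '%s\n' "$1" | awk '
    BEGIN { power["k"] = 1; power["m"] = 2; power["g"] = 3; power["t"] = 4 }
    {
        s = tolower($0)
        # Split into a leading number and whatever follows it.
        if (!match(s, /^[0-9]+(\.[0-9]+)?/)) { print "invalid input"; exit 1 }
        num  = substr(s, 1, RLENGTH)
        unit = substr(s, RLENGTH + 1)
        sub(/b$/, "", unit)                                # optional trailing b
        if (unit ~ /^[kmgt]i$/) unit = substr(unit, 1, 1)  # kib -> k, etc.
        if (unit == "")          e = 0
        else if (unit in power)  e = power[unit]
        else { print "invalid input"; exit 1 }
        printf "%.0f\n", num * 1024 ^ e
    }'
}

to_bytes_awk 200KB
to_bytes_awk 1.5GiB
```

    Because awk works in double-precision floating point, results should only be trusted up to 2^53 bytes; bc remains the safer choice beyond that.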
    
  • 2021-01-03 11:16

    Here's something I wrote. It supports k, KB, and KiB. (It doesn't distinguish between power-of-two and power-of-ten suffixes, though, as in 1KB = 1000 bytes, 1KiB = 1024 bytes; every suffix is treated as a power of 1024.)

    #!/bin/bash
    
    parseSize() {(
        local SUFFIXES=('' K M G T P E Z Y)
        local MULTIPLIER=1
    
        shopt -s nocasematch
    
        for SUFFIX in "${SUFFIXES[@]}"; do
            local REGEX="^([0-9]+)(${SUFFIX}i?B?)?\$"
    
            if [[ $1 =~ $REGEX ]]; then
                echo $((${BASH_REMATCH[1]} * MULTIPLIER))
                return 0
            fi
    
            ((MULTIPLIER *= 1024))
        done
    
        echo "$0: invalid size \`$1'" >&2
        return 1
    )}
    

    Notes:

    • Leverages bash's =~ regex operator, which stores matches in an array named BASH_REMATCH.
    • Notice the cleverly-hidden parentheses surrounding the function body. They're there to keep shopt -s nocasematch from leaking out of the function.
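
    If the 1000-versus-1024 distinction does matter, a variant along the following lines might work. It is only a sketch under an assumed convention (KB, MB, ... are powers of 1000; KiB, MiB, ... and bare K, M, ... are powers of 1024). parse_size_si is a hypothetical name, the matching is case-sensitive, and it is written for plain sh, so it uses case patterns instead of =~ and its variables are global.

```shell
#!/bin/sh
# parse_size_si is a hypothetical name. Assumed convention: KB, MB, ... are
# powers of 1000; KiB, MiB, ... and bare K, M, ... are powers of 1024.
parse_size_si() {
    num=${1%%[!0-9]*}        # leading digits
    suffix=${1#"$num"}       # everything after them
    if [ -z "$num" ]; then
        echo "invalid size '$1'" >&2
        return 1
    fi
    # Pick the base from the shape of the suffix and keep only the unit letter.
    case "$suffix" in
        ''|B)   base=1024 letter='' ;;
        ?|?iB)  base=1024 letter=${suffix%iB} ;;
        ?B)     base=1000 letter=${suffix%B} ;;
        *)      echo "invalid size '$1'" >&2; return 1 ;;
    esac
    case "$letter" in
        '') exp=0 ;;
        K)  exp=1 ;;
        M)  exp=2 ;;
        G)  exp=3 ;;
        T)  exp=4 ;;
        *)  echo "invalid size '$1'" >&2; return 1 ;;
    esac
    # Multiply by the base once per power.
    result=$num
    while [ "$exp" -gt 0 ]; do
        result=$((result * base))
        exp=$((exp - 1))
    done
    echo "$result"
}

parse_size_si 1KB
parse_size_si 1KiB
```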
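  • 2021-01-03 11:21

    toBytes() {
        # Requires GNU sed (\L lower-cases the match). Each unit is rewritten as
        # one more "X" than the unit below it, then every X becomes " *1024".
        echo "$1" | echo $(( $(sed 's/.*/\L\0/;s/t/Xg/;s/g/Xm/;s/m/Xk/;s/k/X/;s/b//;s/X/ *1024/g') ))
    }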
    