Simplest method to convert file-size with suffix to bytes

前端未结


Title says it all really, but I'm currently using a simple function with a case statement to convert human-readable file size strings into a size in bytes. It works well en

6 Answers

无人共我

2021-01-03 11:01
Okay, so it sounds like there's nothing built-in or widely available, which is a shame, so I've had a go at reducing the size of the function and come up with something that's only really 4 lines long, though it's a pretty complicated four lines!

I'm not sure if it's suitable as an answer to my original question as it's not really what I'd call the simplest method, but I want to put it up in case anyone thinks it's a useful solution, and it does have the advantage of being really short.
```
#!/bin/sh
to_bytes() {
    units=$(echo "$1" | sed 's/^[0123456789]*//' | tr '[:upper:]' '[:lower:]')
    index=$(echo "$units" | awk '{print index ("bkmgt kbgb  mbtb", $0)}')
    mod=$(echo "1024^(($index-1)%5)" | bc)
    [ "$mod" -gt 0 ] && 
        echo $(echo "$1" | sed 's/[^0123456789].*$//g')"*$mod" | bc
}
```
To quickly summarise how it works: it first strips the number from the given string and forces what remains to lowercase. It then uses awk to find the index of the suffix within a structured string of valid suffixes. The thing to note is that the string is laid out in groups of five (so it would need to be widened if more suffixes were added); for example, k and kb are at indices 2 and 7 respectively. The index is then reduced by one and taken modulo five, so both k and kb become 1, m and mb become 2, and so on. That result is then used as the exponent of 1024 to get the multiplier. If the suffix was invalid, the multiplier resolves to zero, and a suffix of b (or nothing) evaluates to 1. So long as mod is greater than zero, the input string is reduced to only its numeric part and multiplied by the modifier to get the end result.
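
The index arithmetic described above can be checked on its own. The sketch below is only an illustration: `suffix_power` is a hypothetical helper name, not part of the function above. It reuses the same lookup string and prints the power of 1024 that each suffix maps to.

```sh
#!/bin/sh
# Hypothetical helper: print the exponent that a (lower-case) suffix maps to.
suffix_power() {
    awk -v s="$1" 'BEGIN {
        i = index("bkmgt kbgb  mbtb", s)   # 1-based position, 0 if not found
        print (i - 1) % 5                  # position within its group of five
    }'
}

for s in b k kb m mb g gb t tb; do
    echo "$s -> 1024^$(suffix_power "$s")"
done
```

Both spellings of a unit land on the same exponent: `k` (index 2) and `kb` (index 7) both give 1, while `t` (index 5) and `tb` (index 15) both give 4.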

This is actually how I would probably have solved this originally if I were using a language like PHP or Java; it's just a bit of a weird one to put together in a shell script.

I'd still very much appreciate any simplifications though!

北恋

2021-01-03 11:12

Another variation, adding support for decimal values with a simpler T/G/M/K parser for outputs you might find from simpler Unix programs.

to_bytes() {
    # Split the argument into its numeric part and its trailing unit letter.
    value=$(echo "$1" | sed -e 's/[TGMKb]$//')
    units=$(echo "$1" | sed -e 's/^[0-9.]*//')
    case "$units" in
        T)   value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024 * 1024)")    ;;
        G)   value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024)")   ;;
        M)   value=$(bc <<< "scale=2; ($value * 1024 * 1024)")  ;;
        K)   value=$(bc <<< "scale=2; ($value * 1024)") ;;
        b|'')   ;;   # already a byte count, nothing to scale
        *)
            value=
            echo "Unsupported units '$units'" >&2
        ;;
    esac
    echo "$value"
}


南方客

2021-01-03 11:15

See man numfmt.

# numfmt --from=iec 42 512K 10M 7G 3.5T
42
524288
10485760
7516192768
3848290697216

# numfmt --to=iec 42 524288 10485760 7516192768 3848290697216
42
512K
10M
7.0G
3.5T
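
To accept inputs such as `200kb` as well, the suffix can be normalised before it is handed to numfmt. This is a minimal sketch rather than a definitive solution: the `to_bytes` wrapper is a hypothetical name, and it assumes GNU coreutils numfmt is installed, which is not the case on every system.

```sh
#!/bin/sh
# Hypothetical wrapper: normalise the suffix, then let numfmt do the arithmetic.
to_bytes() {
    printf '%s\n' "$1" |
        tr '[:lower:]' '[:upper:]' |   # treat "kb" and "KB" alike
        sed 's/B$//' |                 # drop an optional trailing B ("KB" -> "K")
        numfmt --from=iec              # K, M, G, T as powers of 1024
}

to_bytes 200kb
to_bytes 512K
```

With `--from=iec` every suffix is a power of 1024, so `200kb` comes out as 204800 and `512K` as 524288.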


北恋

2021-01-03 11:15

I don't know if this is OK:

awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
/^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
    sub(/b$/,"")
    sub(/g/,"*"g)
    sub(/k/,"*"k)
    sub(/m/,"*"m)
    sub(/t/,"*"t)
    "echo "$0"|bc"|getline r; print r; exit
}
{print "invalid input"}'

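This only handles single-line input. If multi-line input is needed, replace the exit with next (simply removing it would also print "invalid input" after every valid line, because the second rule would run as well).
It checks only for the pattern [kgmt] with an optional b, so e.g. kib and mib would fail. It also currently handles lower-case suffixes only.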
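
The lower-case limitation can be lifted with awk's `tolower`, and the multiplication can be done in awk itself instead of piping to bc. The following is a sketch under assumptions, not the answer's own code: `to_bytes_awk` is a hypothetical name, and unlike the original it also accepts a bare number with no suffix.

```sh
#!/bin/sh
# Hypothetical variant: case-insensitive, arithmetic done inside awk.
to_bytes_awk() {
    echo "$1" | awk '
    BEGIN { mult["k"] = 1024; mult["m"] = 1024^2; mult["g"] = 1024^3; mult["t"] = 1024^4 }
    {
        s = tolower($0)                       # accept K/KB as well as k/kb
        if (s !~ /^[0-9.]+[kmgt]?b?$/) { print "invalid input"; exit 1 }
        sub(/b$/, "", s)                      # "kb" -> "k", "b" -> ""
        unit = substr(s, length(s))           # last character: unit letter or a digit
        factor = (unit in mult) ? mult[unit] : 1
        printf "%.0f\n", (s + 0) * factor     # s + 0 keeps only the leading number
    }'
}

to_bytes_awk 200KB
to_bytes_awk 1.5M
```

Because awk uses floating-point numbers, results stay exact only while they fit in a double's integer range, which is ample for sizes up to the terabyte suffix handled here.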

e.g.:

kent$ echo "200kb"|awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
/^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
    sub(/b$/,"")
        sub(/g/,"*"g)
        sub(/k/,"*"k)
        sub(/m/,"*"m)
        sub(/t/,"*"t)
"echo "$0"|bc"|getline r
print r; exit
}{print "invalid input"}'
204800


灰色年华

2021-01-03 11:16
Here's something I wrote. It supports k, KB, and KiB. (It doesn't distinguish between powers of two and powers of ten suffixes, though, as in 1KB = 1000 bytes, 1KiB = 1024 bytes.)
```
#!/bin/bash

parseSize() {(
    local SUFFIXES=('' K M G T P E Z Y)
    local MULTIPLIER=1

    shopt -s nocasematch

    for SUFFIX in "${SUFFIXES[@]}"; do
        local REGEX="^([0-9]+)(${SUFFIX}i?B?)?\$"

        if [[ $1 =~ $REGEX ]]; then
            echo $((${BASH_REMATCH[1]} * MULTIPLIER))
            return 0
        fi

        ((MULTIPLIER *= 1024))
    done

    echo "$0: invalid size \`$1'" >&2
    return 1
)}
```
Notes:
- Leverages bash's =~ regex operator, which stores matches in an array named BASH_REMATCH.
- Notice the cleverly-hidden parentheses surrounding the function body. They're there to keep shopt -s nocasematch from leaking out of the function.
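
If the distinction between decimal and binary suffixes mentioned above matters, it could be handled with a plain case statement. This is a sketch under assumptions, not a drop-in replacement for the function above: `parse_size` is a hypothetical name, matching is case-sensitive, only integers are accepted, and only suffixes up to G are listed.

```sh
#!/bin/sh
# Hypothetical parse_size: K, Ki and KiB mean 1024; KB means 1000.
parse_size() {
    num=${1%%[!0-9]*}      # leading run of digits
    suffix=${1#"$num"}     # everything after the digits
    if [ -z "$num" ]; then
        echo "invalid size '$1'" >&2
        return 1
    fi
    case "$suffix" in
        ''|B)        mult=1 ;;
        K|Ki|KiB)    mult=1024 ;;
        KB)          mult=1000 ;;
        M|Mi|MiB)    mult=$((1024 * 1024)) ;;
        MB)          mult=$((1000 * 1000)) ;;
        G|Gi|GiB)    mult=$((1024 * 1024 * 1024)) ;;
        GB)          mult=$((1000 * 1000 * 1000)) ;;
        *)
            echo "invalid size '$1'" >&2
            return 1
            ;;
    esac
    echo $((num * mult))
}

parse_size 1KB     # decimal kilobyte
parse_size 1KiB    # binary kibibyte
```

A bare `K` is treated as 1024 here to stay close to the behaviour of the function above; that choice is a convention, not a rule.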

一向

2021-01-03 11:21

toBytes() {
 # \L (lower-casing) needs GNU sed; each suffix is rewritten into a chain of " *1024" factors.
 echo "$1" | echo $((`sed 's/.*/\L\0/;s/t/Xg/;s/g/Xm/;s/m/Xk/;s/k/X/;s/b//;s/X/ *1024/g'`))
}
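
The substitution chain in this one-liner is easier to follow when the generated expression is printed rather than evaluated. The sketch below is an illustration only: `expand_suffix` is a hypothetical name, and `tr` stands in for the `\L` lower-casing so that the trace does not depend on GNU sed.

```sh
#!/bin/sh
# Hypothetical tracer: show the arithmetic expression built from a size string.
expand_suffix() {
    echo "$1" | tr '[:upper:]' '[:lower:]' |
        sed 's/t/Xg/;s/g/Xm/;s/m/Xk/;s/k/X/;s/b//;s/X/ *1024/g'
}

expand_suffix 2G    # t/g/m/k cascade: 2g -> 2Xm -> 2XXk -> 2XXX
expand_suffix 3tb
```

Each larger unit is rewritten as one `X` plus the next smaller unit, so `2G` ends up as `2 *1024 *1024 *1024`, which the shell's `$(( ))` then evaluates.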
