Title says it all really, but I'm currently using a simple function with a case statement to convert human-readable file size strings into a size in bytes. It works well en
Okay, so it sounds like there's nothing built-in or widely available, which is a shame, so I've had a go at reducing the size of the function and come up with something that's only really 4 lines long, though it's a pretty complicated four lines!
I'm not sure if it's suitable as an answer to my original question as it's not really what I'd call the simplest method, but I want to put it up in case anyone thinks it's a useful solution, and it does have the advantage of being really short.
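#!/bin/sh
to_bytes() {
    # Suffix: drop the leading digits and lower-case whatever is left.
    units=$(echo "$1" | sed 's/^[0123456789]*//' | tr '[:upper:]' '[:lower:]')
    # Position in the lookup string; note the two spaces that pad mb and tb to slots 13 and 15.
    index=$(echo "$units" | awk '{print index ("bkmgt kbgb  mbtb", $0)}')
    # Exponent is (index-1)%5; an unmatched suffix gives 1024^-1, which bc truncates to 0.
    mod=$(echo "1024^(($index-1)%5)" | bc)
    [ "$mod" -gt 0 ] &&
        echo $(echo "$1" | sed 's/[^0123456789].*$//g')"*$mod" | bc
}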
To quickly summarise how it works: it first strips the number from the given string and forces what remains to lowercase. It then uses awk to grab the index of the extension from a structured string of valid suffixes. The thing to note is that the string is arranged in groups of five characters (so it would need to be widened if more extensions were added); for example, k and kb are at indices 2 and 7 respectively.
The index is then reduced by one and taken modulo five, so both k and kb become 1, m and mb become 2, and so on. That value is then used as the exponent of 1024 to get the multiplier. If the extension was invalid this resolves to zero, and an extension of b (or nothing) evaluates to 1.
So long as mod is greater than zero, the input string is reduced to only its numeric part and multiplied by the modifier to get the end result.
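To make the index arithmetic above easier to check, here's a small sketch that prints the exponent each suffix maps to. The helper name suffix_exponent is made up for illustration, and the lookup string is written with two spaces between gb and mb, which is what the "groups of five" description implies (mb at 13, tb at 15).

```shell
#!/bin/sh
# suffix_exponent is a hypothetical helper, not part of the original function.
suffix_exponent() {
    # Position of the suffix in the lookup string; slots are five characters wide.
    pos=$(echo "$1" | awk '{print index("bkmgt kbgb  mbtb", $0)}')
    # An unknown suffix gives position 0; report that as -1 and fail.
    [ "$pos" -gt 0 ] || { echo -1; return 1; }
    echo $(( (pos - 1) % 5 ))
}

for s in b k kb m mb g gb t tb; do
    printf '%s -> 1024^%s\n' "$s" "$(suffix_exponent "$s")"
done
```

Each single-letter suffix and its two-letter form should land on the same exponent, which is the whole point of the padding.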
This is actually how I would probably have solved this originally if I were using a language like PHP or Java; it's just a bit of a weird one to put together in a shell script.
I'd still very much appreciate any simplifications though!
Another variation, adding support for decimal values with a simpler T/G/M/K parser for outputs you might find from simpler Unix programs.
# Needs bash (or another shell with here-strings) for the <<< syntax.
to_bytes() {
    # Numeric part: drop one trailing unit letter, if any.
    value=$(echo "$1" | sed -e 's/[KMGTb]$//')
    # Unit: whatever follows the leading digits and decimal point.
    units=$(echo "$1" | sed -e 's/^[0-9.]*//')
    case "$units" in
        T) value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024 * 1024)") ;;
        G) value=$(bc <<< "scale=2; ($value * 1024 * 1024 * 1024)") ;;
        M) value=$(bc <<< "scale=2; ($value * 1024 * 1024)") ;;
        K) value=$(bc <<< "scale=2; ($value * 1024)") ;;
        b|'') ;;  # already a byte count
        *)
            value=
            echo "Unsupported units '$units'" >&2
            ;;
    esac
    echo "$value"
}
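The function above depends on bc and on here-strings, which plain sh doesn't have. As a rough sketch of the same T/G/M/K idea in POSIX sh, the version below swaps bc for awk to do the decimal multiplication. The name to_bytes_dec is made up for illustration, and validation of the numeric part is deliberately minimal.

```shell
#!/bin/sh
# to_bytes_dec is a hypothetical name; awk stands in for bc here.
to_bytes_dec() {
    # Pick the power of 1024 from the last character of the input.
    case "$1" in
        *T) exp=4 ;;
        *G) exp=3 ;;
        *M) exp=2 ;;
        *K) exp=1 ;;
        *b|*[0-9]) exp=0 ;;
        *) echo "Unsupported units in '$1'" >&2; return 1 ;;
    esac
    # Strip the unit letter (if any), then multiply in floating point.
    num=${1%[TGMKb]}
    awk -v n="$num" -v e="$exp" 'BEGIN { printf "%.0f\n", n * 1024 ^ e }'
}

to_bytes_dec 1.5K
to_bytes_dec 3.5T
```

Because awk works in double precision, results are only exact up to 2^53, which is fine for the T range but worth keeping in mind.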
See man numfmt.
# numfmt --from=iec 42 512K 10M 7G 3.5T
42
524288
10485760
7516192768
3848290697216
# numfmt --to=iec 42 524288 10485760 7516192768 3848290697216
42
512K
10M
7.0G
3.5T
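If numfmt is available (it ships with GNU coreutils), the original function can probably shrink to a thin wrapper. This is only a sketch: to_bytes_numfmt is a made-up name, and it assumes suffixes are written the way --from=iec expects, as in the examples above.

```shell
#!/bin/sh
# to_bytes_numfmt is a hypothetical wrapper name; numfmt does all the work.
to_bytes_numfmt() {
    # --from=iec treats K, M, G, T as powers of 1024.
    numfmt --from=iec "$1"
}

to_bytes_numfmt 512K
to_bytes_numfmt 3.5T

# numfmt exits non-zero on input it cannot convert, so callers can test for it.
if ! to_bytes_numfmt abc 2>/dev/null; then
    echo "abc: not a valid size" >&2
fi
```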
Don't know if this is OK:
awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
/^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
    sub(/b$/,"")
    sub(/g/,"*"g)
    sub(/k/,"*"k)
    sub(/m/,"*"m)
    sub(/t/,"*"t)
    # Parenthesise the command string so the pipe into getline is unambiguous.
    ("echo "$0"|bc")|getline r; print r; exit;}
{print "invalid input"}'
It calls exit after the first valid line and accepts only [kgmt] with an optional b, so e.g. kib or mib would fail. It is also currently lower-case only. E.g.:
kent$ echo "200kb"|awk 'BEGIN{b=1;k=1024;m=k*k;g=k^3;t=k^4}
/^[0-9.]+[kgmt]?b?$/&&/[kgmtb]$/{
    sub(/b$/,"")
    sub(/g/,"*"g)
    sub(/k/,"*"k)
    sub(/m/,"*"m)
    sub(/t/,"*"t)
    ("echo "$0"|bc")|getline r
    print r; exit
}{print "invalid input"}'
204800
Here's something I wrote. It supports k, KB, and KiB. (It doesn't distinguish between power-of-two and power-of-ten suffixes, though: every suffix is treated as a power of 1024, rather than 1KB = 1000 bytes and 1KiB = 1024 bytes.)
#!/bin/bash
parseSize() {(
    local SUFFIXES=('' K M G T P E Z Y)
    local MULTIPLIER=1
    shopt -s nocasematch
    for SUFFIX in "${SUFFIXES[@]}"; do
        local REGEX="^([0-9]+)(${SUFFIX}i?B?)?\$"
        if [[ $1 =~ $REGEX ]]; then
            echo $((${BASH_REMATCH[1]} * MULTIPLIER))
            return 0
        fi
        ((MULTIPLIER *= 1024))
    done
    echo "$0: invalid size \`$1'" >&2
    return 1
)}
Notes:
- It uses bash's =~ regex operator, which stores matches in an array named BASH_REMATCH.
- The function body is wrapped in parentheses so that it runs in a subshell, which keeps shopt -s nocasematch from leaking out of the function.

toBytes() {
    echo $1 | echo $((`sed 's/.*/\L\0/;s/t/Xg/;s/g/Xm/;s/m/Xk/;s/k/X/;s/b//;s/X/ *1024/g'`))
}
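In case the sed chain in that one-liner looks opaque: each unit is rewritten as an X plus the next smaller unit, so by the end every X stands for one factor of 1024. Here's a sketch that prints the expanded expression before it is evaluated. The name expand_size is made up, and tr is used for lower-casing instead of the GNU-only \L, which is a swap from the original.

```shell
#!/bin/sh
# expand_size is a hypothetical helper that shows the intermediate expression.
expand_size() {
    echo "$1" | tr '[:upper:]' '[:lower:]' |
        sed 's/t/Xg/;s/g/Xm/;s/m/Xk/;s/k/X/;s/b//;s/X/ *1024/g'
}

expand_size 2G                      # prints: 2 *1024 *1024 *1024
echo $(( $(expand_size 2G) ))       # prints: 2147483648
```

Anything that isn't a recognised suffix is passed straight into the arithmetic expansion, so the original one-liner has no real input validation.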