Bytes to human-readable, and back, without data loss

旧巷少年郎 2021-01-02 08:17

I need to convert strings that contain memory usage in bytes, such as 1048576 (which is 1M), into exactly that human-readable form, and vice versa.

2 answers
  • 2021-01-02 08:56

    So it turns out the answer was much simpler than I thought. One of the links I provided actually led to a much more detailed version of the function, which can deal with any scope I give it.

    But thank you for your help.

    The code is copied here for posterity:

    ## {{{ http://code.activestate.com/recipes/578019/ (r15)
    #!/usr/bin/env python
    
    """
    Bytes-to-human / human-to-bytes converter.
    Based on: http://goo.gl/kTQMs
    Working with Python 2.x and 3.x.
    
    Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com>
    License: MIT
    """
    
    # see: http://goo.gl/kTQMs
    SYMBOLS = {
        'customary'     : ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
        'customary_ext' : ('byte', 'kilo', 'mega', 'giga', 'tera', 'peta', 'exa',
                           'zetta', 'yotta'),
        'iec'           : ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
        'iec_ext'       : ('byte', 'kibi', 'mebi', 'gibi', 'tebi', 'pebi', 'exbi',
                           'zebi', 'yobi'),
    }
    
    def bytes2human(n, format='%(value).1f %(symbol)s', symbols='customary'):
        """
        Convert n bytes into a human readable string based on format.
        symbols can be either "customary", "customary_ext", "iec" or "iec_ext",
        see: http://goo.gl/kTQMs
    
          >>> bytes2human(0)
          '0.0 B'
          >>> bytes2human(0.9)
          '0.0 B'
          >>> bytes2human(1)
          '1.0 B'
          >>> bytes2human(1.9)
          '1.0 B'
          >>> bytes2human(1024)
          '1.0 K'
          >>> bytes2human(1048576)
          '1.0 M'
          >>> bytes2human(1099511627776127398123789121)
          '909.5 Y'
    
          >>> bytes2human(9856, symbols="customary")
          '9.6 K'
          >>> bytes2human(9856, symbols="customary_ext")
          '9.6 kilo'
          >>> bytes2human(9856, symbols="iec")
          '9.6 Ki'
          >>> bytes2human(9856, symbols="iec_ext")
          '9.6 kibi'
    
          >>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
          '9.8 K/sec'
    
          >>> # precision can be adjusted by playing with %f operator
          >>> bytes2human(10000, format="%(value).5f %(symbol)s")
          '9.76562 K'
        """
        n = int(n)
        if n < 0:
            raise ValueError("n < 0")
        symbols = SYMBOLS[symbols]
        prefix = {}
        for i, s in enumerate(symbols[1:]):
            prefix[s] = 1 << (i+1)*10
        for symbol in reversed(symbols[1:]):
            if n >= prefix[symbol]:
                value = float(n) / prefix[symbol]
                return format % locals()
        return format % dict(symbol=symbols[0], value=n)
    
    def human2bytes(s):
        """
        Attempts to guess the string format based on default symbols
        set and return the corresponding bytes as an integer.
        When unable to recognize the format ValueError is raised.
    
          >>> human2bytes('0 B')
          0
          >>> human2bytes('1 K')
          1024
          >>> human2bytes('1 M')
          1048576
          >>> human2bytes('1 Gi')
          1073741824
          >>> human2bytes('1 tera')
          1099511627776
    
          >>> human2bytes('0.5kilo')
          512
          >>> human2bytes('0.1  byte')
          0
          >>> human2bytes('1 k')  # k is an alias for K
          1024
          >>> human2bytes('12 foo')
          Traceback (most recent call last):
              ...
          ValueError: can't interpret '12 foo'
        """
        init = s
        num = ""
        while s and (s[0:1].isdigit() or s[0:1] == '.'):
            num += s[0]
            s = s[1:]
        num = float(num)
        letter = s.strip()
        for name, sset in SYMBOLS.items():
            if letter in sset:
                break
        else:
            if letter == 'k':
                # treat 'k' as an alias for 'K' as per: http://goo.gl/kTQMs
                sset = SYMBOLS['customary']
                letter = letter.upper()
            else:
                raise ValueError("can't interpret %r" % init)
        prefix = {sset[0]:1}
        for i, s in enumerate(sset[1:]):
            prefix[s] = 1 << (i+1)*10
        return int(num * prefix[letter])
    
    
    if __name__ == "__main__":
        import doctest
        doctest.testmod()
    ## end of http://code.activestate.com/recipes/578019/ }}}
    
  • 2021-01-02 09:05

    You are pretty much answering your own question in your last note, there.

    In human2bytes(s), the input string -- 9.766K for example -- is split up in two parts, the number and the prefix. After the assertion (which as you correctly observe is what throws the error), the number is multiplied by the corresponding value that the prefix represents, so 9.766 * 1000 = 9766. The only way to "avoid" data loss is to accept a sufficiently precise floating-point value as input.
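
The precision point can be seen in a small round trip. This is a minimal sketch, not the recipe's code: the helper names `to_k` and `from_k` are hypothetical, and it assumes K means 1024 (1 << 10), as in the recipe's prefix table. With one decimal, 10000 bytes does not survive the round trip; with enough decimals it does.

```python
K = 1 << 10  # assumption: 'K' means 1024 bytes, as in the recipe's prefix table


def to_k(n, precision):
    """Format n bytes as a K value with `precision` decimals (hypothetical helper)."""
    return "%.*f K" % (precision, float(n) / K)


def from_k(s):
    """Parse a string such as '9.8 K' back into an integer byte count."""
    number, symbol = s.split()
    if symbol != "K":
        raise ValueError("can't interpret %r" % s)
    # Round rather than truncate so tiny float errors do not drop a byte.
    return int(round(float(number) * K))


short = to_k(10000, 1)      # '9.8 K'
print(short, from_k(short))  # 9.8 K 10035 -> 35 bytes off after the round trip

exact = to_k(10000, 6)      # '9.765625 K'
print(exact, from_k(exact))  # 9.765625 K 10000 -> original value recovered
```

The loss comes from the formatting step, not the parsing step: once the string has been rounded to one decimal, no parser can recover the original count.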

    In order to make human2bytes accept floating-point input, you could either remove num.isdigit() from the assertion and then wrap the typecasting num = float(num) with try-except, or check it by some other means.
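
The try-except approach suggested here might look like the sketch below. It is an illustration under stated assumptions, not the recipe's implementation: `parse_size` and `PREFIXES` are hypothetical names, only single-letter customary symbols are handled, and each prefix step is assumed to be a factor of 1024.

```python
# Assumption: customary single-letter symbols, each step a factor of 1024.
PREFIXES = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(s):
    """Parse strings such as '9.766K' or '1 M' into an integer byte count."""
    text = s.strip()
    symbol = text[-1:].upper()
    if symbol not in PREFIXES:
        raise ValueError("can't interpret %r" % s)
    try:
        # float() accepts integers, decimals and surrounding whitespace,
        # so no isdigit() check is needed beforehand.
        number = float(text[:-1])
    except ValueError:
        raise ValueError("can't interpret %r" % s)
    return int(round(number * PREFIXES[symbol]))


print(parse_size("9.766K"))  # 10000
print(parse_size("1 M"))     # 1048576
```

Letting `float()` do the validation keeps the accepted number syntax in one place; malformed input such as `'abcK'` still ends in a `ValueError`.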
