Using strings and byte-like objects compatibly in code to run in both Python 2 & 3

Question

I'm trying to modify the code shown far below, which works in Python 2.7.x, so that it will also work unchanged in Python 3.x. However, I'm encountering the following problem in the first function, bin_to_float(), that I can't solve, as shown by the output below:

float_to_bin(0.000000): '0'
Traceback (most recent call last):
  File "binary-to-a-float-number.py", line 36, in <module>
    float = bin_to_float(binary)
  File "binary-to-a-float-number.py", line 9, in bin_to_float
    return struct.unpack('>d', bf)[0]
TypeError: a bytes-like object is required, not 'str'

I tried to fix that by adding bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:

TypeError: string argument without an encoding

So my questions are: is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.

Here's the code that works in Python 2:

import struct

def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    bytes = []
    for _ in range(nbytes):
        bytes.append(chr(n & 0xff))
        n >>= 8
    if minlen > 0 and len(bytes) < minlen:  # zero pad?
        bytes.extend((minlen-len(bytes)) * '0')
    return ''.join(reversed(bytes))  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack('>d', f)
    ba = bytearray(ba)
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print('float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print('bin_to_float(%r): %f' % (binary, float))
    print('')
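For readers on Python 3, here is a minimal sketch of why both errors above occur. It assumes Python 3 and hard-codes the eight byte values of 1.0 (0x3FF0000000000000) for illustration; the variable names are hypothetical. struct.unpack() only accepts bytes-like objects, bytes() refuses a str unless it is told which encoding to use, and encoding with latin-1 turns each character chr(0) to chr(255) into the byte with the same value.

```python
import struct

# What the question's int_to_bytes() produces on Python 3: a str whose
# characters stand in for byte values (here the 8 bytes of 1.0).
text = ''.join(chr(b) for b in (0x3F, 0xF0, 0, 0, 0, 0, 0, 0))

# struct.unpack() needs a bytes-like object, so a str is rejected.
try:
    struct.unpack('>d', text)
    unpack_rejected_str = False
except TypeError:
    unpack_rejected_str = True

# bytes(str) needs to know which encoding to use.
try:
    bytes(text)
    bytes_rejected_str = False
except TypeError:
    bytes_rejected_str = True

# latin-1 maps code points 0-255 to byte values 0-255 one-to-one,
# so encoding with it recovers the intended bytes.
data = text.encode('latin-1')
value = struct.unpack('>d', data)[0]

print(unpack_rejected_str, bytes_rejected_str, value)  # True True 1.0
```

In other words, the data is fine; it is just in the wrong type, and any fix has to produce real bytes before struct.unpack() sees them.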

Answer 1:

I took a different approach from @metatoaster's answer: I just modified int_to_bytes to build and return a bytearray:

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:  # zero pad?
        b.extend([0] * (minlen-len(b)))
    return bytearray(reversed(b))  # high bytes at beginning

This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1.

Note that I zero-padded with 0 (a zero byte) instead of '0' (the character, which is byte 0x30). I didn't do much testing, but surely that's what you meant?
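A short sketch of why the padding value matters, assuming Python 3 (the same byte-string literals also work on Python 2.7). Padding with the character '0' inserts byte 0x30, which produces a different bit pattern; the question's test loop hides this because '%f' rounds the resulting tiny number to 0.000000. The value 5e-324, the smallest positive subnormal double, is used here only because its bit pattern needs seven bytes of padding.

```python
import struct

# The character '0' is the byte 0x30, not a zero byte.
assert b'0' == b'\x30'

# Eight bytes of ASCII '0' do not decode to 0.0 ...
ascii_padded = struct.unpack('>d', b'0' * 8)[0]
zero_padded = struct.unpack('>d', b'\x00' * 8)[0]
print(ascii_padded > 0.0, zero_padded == 0.0)  # True True

# ... but '%f' rounds the tiny result to six decimals, hiding the bug.
print('%f' % ascii_padded)  # 0.000000

# The smallest subnormal double has bit pattern 0x0000000000000001,
# so int_to_bytes() must pad its single byte with seven more.
good = struct.unpack('>d', b'\x00' * 7 + b'\x01')[0]
bad = struct.unpack('>d', b'0' * 7 + b'\x01')[0]
print(good == 5e-324, bad == 5e-324)  # True False
```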

Answer 2:

To make portable code that works with bytes in both Python 2 and 3 using libraries that literally use different data types in the two versions, you need to declare every string explicitly with the appropriate literal prefix (b'' or u''), or add from __future__ import unicode_literals to the top of every module that does this. This step ensures that the data types are correct internally in your code.

Secondly, decide to support Python 3 going forward, with fallbacks specific to Python 2. This means overriding str with unicode, and identifying the methods/functions that do not return the same types in both Python versions; these should be modified or replaced so that they return the correct type (the Python 3 one). Note also that bytes is the name of a built-in type (not a reserved word), so don't use it as a variable name.
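A small sketch of both points, intended to run unchanged on Python 2.7 and 3. It assumes the unicode_literals import mentioned above; the names label, fmt and shadow_bytes are illustrative only. It also shows that bytes is a built-in name rather than a keyword: assigning to it is legal, which is exactly why shadowing it by accident is easy.

```python
from __future__ import unicode_literals

import keyword
import struct

# With unicode_literals, an unprefixed literal is text in both versions,
# while a b'' literal is a byte string (str on Python 2, bytes on Python 3).
label = 'value'
fmt = b'>d'
data = struct.pack(fmt, 1.0)

print(type(label) is type(u''), isinstance(data, bytes))  # True True

# bytes is a built-in name, not a reserved keyword.
print(keyword.iskeyword('bytes'))  # False


def shadow_bytes():
    bytes = []  # legal, but hides the built-in bytes type in this scope
    try:
        bytes(b'abc')  # calls the list, not the type
    except TypeError:
        return True  # the built-in is unreachable under this name here
    return False
```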

Putting this together, your code will look something like this:

import struct
import sys

if sys.version_info < (3, 0):
    str = unicode
    chr = unichr


def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack(b'>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    ba = bytearray(b'')
    for _ in range(nbytes):
        ba.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(ba) < minlen:  # zero pad?
        ba.extend((minlen-len(ba)) * b'\x00')  # zero bytes, not ASCII '0' (0x30)
    return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack(b'>d', f)
    ba = bytearray(ba)
    s = u''.join(u'{:08b}'.format(b) for b in ba)
    s = s.lstrip(u'0')  # strip leading zeros
    return (s if s else u'0').encode('latin1')  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print(u'float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print(u'bin_to_float(%r): %f' % (binary, float))
    print(u'')

I used the latin1 codec simply because it maps code points 0-255 one-to-one onto byte values 0-255, which matches how the original byte mappings were defined, and it seems to work:

$ python2 foo.py 
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000

float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593

Again, but this time under Python 3.5:

$ python3 foo.py 
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000

float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593

It's a lot more work, but under Python 3 you can see more clearly that the values are proper bytes objects. I also changed your bytes = [] to a bytearray to express more clearly what you were trying to do.
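A sketch of why the latin1 step is safe, using the same chr = unichr override as the answer so that it runs on Python 2.7 and 3. Because latin-1 maps code points 0-255 to byte values 0-255 one-to-one, encoding the joined characters gives exactly the bytes already held in the bytearray. That also suggests a shorter alternative (not the answer's own code): bytes(ba) produces the same result directly, since bytes is an alias for str on Python 2.7.

```python
import sys

if sys.version_info < (3, 0):
    chr = unichr  # Python 2 only; unichr does not exist on Python 3

# Every possible byte value, in order.
ba = bytearray(range(256))

# The answer's route: bytes -> one character per byte -> latin1-encoded bytes.
via_text = u''.join(chr(b) for b in ba).encode('latin1')

# The direct route: works on both versions.
direct = bytes(ba)

print(via_text == direct)  # True
print(via_text.decode('latin1') == u''.join(chr(b) for b in ba))  # True
```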

Answer 3:

In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.

If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:

import codecs

def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    return codecs.decode(hex_str, "hex")

You might need some special-case handling for when the hex string has an odd number of characters, since hex decoding requires exactly two digits per byte.
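One way to handle the odd-length case, as a sketch that keeps the hexadecimal detour but swaps in binascii.unhexlify(), which exists in both Python 2 and 3. It assumes non-negative input (bin_to_float() only ever passes non-negative values) and prepends a zero digit when the digit count is odd, before padding to minlen bytes.

```python
import binascii
import struct


def int_to_bytes(n, minlen=0):
    """Non-negative int/long -> big-endian byte string of at least minlen bytes."""
    if n < 0:
        raise ValueError('negative numbers are not supported in this sketch')
    hex_str = format(n, 'x')
    if len(hex_str) % 2:
        hex_str = '0' + hex_str  # unhexlify needs two digits per byte
    hex_str = hex_str.rjust(2 * minlen, '0')  # left-pad with zero bytes
    # Encode to ASCII first, since some Python 3 versions' binascii
    # functions accept only byte strings.
    return binascii.unhexlify(hex_str.encode('ascii'))


def bin_to_float(b):
    """Binary digit string -> float, as in the question."""
    bf = int_to_bytes(int(b, 2), 8)
    return struct.unpack('>d', bf)[0]


print(int_to_bytes(0x123) == b'\x01\x23')  # True
print(bin_to_float('1' * 10 + '0' * 52))   # 1.0
```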

Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings (bytes-to-bytes codecs such as "hex") weren't supported in some 3.x versions, but I don't remember the details. I tested the code with Python 3.5.
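For comparison, here is a Python 3-only sketch of the to_bytes() route mentioned at the start of this answer. int.to_bytes() and int.from_bytes() exist from Python 3.2 on, so this does not meet the question's Python 2 requirement. The float_to_bin() here is a hypothetical rewrite of the question's test helper using int.from_bytes().

```python
import struct


def bin_to_float(b):
    """Binary digit string -> float (Python 3.2+ only)."""
    n = int(b, 2)
    # to_bytes raises OverflowError if b has more than 64 significant digits.
    return struct.unpack('>d', n.to_bytes(8, 'big'))[0]


def float_to_bin(f):
    """Float -> binary digit string without leading zeros (Python 3.2+ only)."""
    n = int.from_bytes(struct.pack('>d', f), 'big')
    return format(n, 'b')  # format(0, 'b') is '0', so at least one digit remains


for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    assert bin_to_float(binary) == f  # exact round trip of all 64 bits
    print('%f -> %s' % (f, binary))
```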

Source: https://stackoverflow.com/questions/39252140/using-strings-and-byte-like-objects-compatibly-in-code-to-run-in-both-python-2

Tags: python, string, python-2.7, python-3.x, bytestring