Python: Converting a Text File to a Binary File

萌比男神i 2021-01-23 15:22

We can convert any digital file into a binary file.

I have a 1 MB text file.

I want to convert it to a binary string and see the output as a binary number and th

2 Answers
  • 2021-01-23 15:36

    The "text file" mentioned seems to refer to an ASCII file, where each character takes up 8 bits of space.

    The second line, "convert it to a binary string", could mean taking the ASCII representation of the text file, which gives a sequence of bytes to be "output as a binary number" (similar to public-key cryptography, where text is converted to a number before encryption). For example:

    text = 'ABC '
    for x in text:
      print(format(ord(x), '08b'), end='')
    

    This gives the binary (number) string: 01000001010000100100001100100000
    which in decimal is: 1094861600
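The same result can be reached without a character-by-character loop. The sketch below is one possible approach, not the answerer's own code: it assumes the text is pure ASCII (as the answer does), encodes it to bytes, and reads those bytes as a single big-endian integer with `int.from_bytes`. The variable names are illustrative.

```python
text = "ABC "

# Assumption carried over from the answer: the text is pure ASCII.
data = text.encode("ascii")

# One 8-bit group per byte, joined into a single bit string.
bits = "".join(format(byte, "08b") for byte in data)

# The same bytes read as one big-endian unsigned integer.
number = int.from_bytes(data, byteorder="big")

print(bits)    # 01000001010000100100001100100000
print(number)  # 1094861600
```

Because Python integers have arbitrary precision, `int(bits, 2)` gives the same number however long the bit string is.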

    The third line would mean splitting a binary number into a sequence of bytes and displaying the equivalent ASCII character for each 8-bit group, e.g. 0x41 is output as 'A'. The assumption here is that each byte maps to a printable ASCII (i.e. text) character and that the given binary number has a multiple of 8 digits.

    For example, to reverse the conversion (binary number to text):

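    binary = "01000001010000100100001100100001"
    # number of characters in the text (8 bits per character)
    num = len(binary) // 8
    
    for x in range(num):
      # slice out the 8 bits belonging to character x
      start = x * 8
      end = (x + 1) * 8
      # parse the slice as a base-2 integer and print its character
      print(chr(int(binary[start:end], 2)), end='')
    print()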
    

    This gives the text: ABC!
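As an alternative to slicing 8 bits at a time, the whole bit string can be parsed as one integer and turned back into bytes with `int.to_bytes`. This is a minimal sketch under the same assumptions as the answer (ASCII text, a bit count that is a multiple of 8); `bits_to_text` is a hypothetical helper name, not something from the original post.

```python
def bits_to_text(bits: str, encoding: str = "ascii") -> str:
    """Decode a string of '0'/'1' characters, 8 bits per byte, into text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    if not bits:
        return ""  # int("", 2) would raise, so handle the empty case here
    # Passing the byte count explicitly preserves leading zero bytes.
    data = int(bits, 2).to_bytes(len(bits) // 8, byteorder="big")
    return data.decode(encoding)


print(bits_to_text("01000001010000100100001100100001"))  # ABC!
```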

    For a 1 MB text file, you would split the text into chunks your machine can handle, e.g. 32 bits at a time, before converting.

    Tested in Python IDE
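For the 1 MB file in the question, the chunking idea might look like the sketch below. It is an illustration rather than the answerer's method: it reads the file in binary mode so no text encoding has to be guessed, and the function names, the 4096-byte default chunk size, and the file paths are all assumptions.

```python
from typing import Iterator


def iter_bit_chunks(path: str, chunk_size: int = 4096) -> Iterator[str]:
    """Yield the file's contents as '0'/'1' strings, one string per chunk."""
    # Binary mode returns raw bytes, so a chunk boundary always falls
    # between whole bytes and never inside an 8-bit group.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # walrus operator: Python 3.8+
            yield "".join(format(byte, "08b") for byte in chunk)


def write_bits(src_path: str, dst_path: str, chunk_size: int = 4096) -> None:
    """Write the bit-string form of src_path to dst_path as ASCII text."""
    with open(dst_path, "w", encoding="ascii") as out:
        for bits in iter_bit_chunks(src_path, chunk_size):
            out.write(bits)
```

Each input byte becomes eight output characters, so the bit-string form of a 1 MB file is about 8 MB of text. Streaming it chunk by chunk avoids holding the whole string in memory.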

  • 2021-01-23 15:50

    See https://docs.python.org/3/library/codecs.html#standard-encodings for a list of standard string encodings, because the conversion depends on the encoding.

    These functions will help to convert between bytes/ints and strings, defaulting to UTF-8.

    The example provided uses the Hangul character "한" in UTF-8.

    
    def bytes_to_string(byte_or_int_value, encoding='utf-8') -> str:
        # bytes: decode directly with the given encoding
        if isinstance(byte_or_int_value, bytes):
            return byte_or_int_value.decode(encoding)
        # int: treat as a Unicode code point; the encode/decode round trip
        # raises if the character cannot be represented in the encoding
        if isinstance(byte_or_int_value, int):
            return chr(byte_or_int_value).encode(encoding).decode(encoding)
        raise ValueError('Error: Input must be a bytes or int type')
    
    def string_to_bytes(string_value, encoding='utf-8') -> bytes:
        # str.encode() already returns bytes, so no extra bytes() call is needed
        if isinstance(string_value, str):
            return string_value.encode(encoding)
        raise ValueError('Error: Input must be a string type')
    
    int_value = 54620
    bytes_value = b'\xED\x95\x9C'
    string_value = '한'
    
    assert bytes_to_string(int_value) == string_value
    assert bytes_to_string(bytes_value) == string_value
    assert string_to_bytes(string_value) == bytes_value
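To illustrate why the conversion depends on the encoding, the short sketch below encodes the same character with three standard codecs and prints the resulting bytes as hex and as bits. The choice of codecs and the `encoded_forms` name are illustrative, not part of the answer.

```python
char = "한"  # code point U+D55C, i.e. ord(char) == 54620

encoded_forms = {}
for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = char.encode(encoding)
    encoded_forms[encoding] = encoded
    bits = " ".join(format(byte, "08b") for byte in encoded)
    print(f"{encoding:10} {encoded.hex()}  {bits}")
```

The character takes three bytes in UTF-8 (`ed959c`), two in UTF-16 (`d55c`), and four in UTF-32 (`0000d55c`), so the "binary string" for the same text differs with each encoding.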
    