We can convert any digital file into a binary file.
I have a text file of 1MB,
I want to convert it to a binary string and see the output as a binary number and th
The "text file" mentioned seems to refer to an ASCII file (where each character takes up 8 bits, i.e. one byte).
The phrase "convert it to a binary string" could mean the ASCII representation of the text file: a sequence of bytes to be "output as a binary number" (similar to public-key cryptography, where text is converted to a number before encryption). For example:
```python
text = 'ABC '
for x in text:
    # ord() gives the character code; '08b' pads it to 8 binary digits
    print(format(ord(x), '08b'), end='')
print()
```
prints the binary (number) string: 01000001010000100100001100100000
which in decimal is 1094861600.
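An alternative to the character loop, as a minimal sketch assuming the text is pure ASCII: encode the string to bytes once, format each byte, and let `int.from_bytes` build the whole number. The function name `text_to_binary` is hypothetical, chosen for illustration.

```python
from typing import Tuple


def text_to_binary(text: str, encoding: str = "ascii") -> Tuple[str, int]:
    """Return the encoded bytes of text as a bit string and as one integer.

    Hypothetical helper; assumes every character is encodable in `encoding`.
    """
    data = text.encode(encoding)                    # str -> bytes, one byte per ASCII character
    bits = "".join(format(b, "08b") for b in data)  # 8 binary digits per byte, zero-padded
    number = int.from_bytes(data, byteorder="big")  # first byte is the most significant
    return bits, number


bits, number = text_to_binary("ABC ")
print(bits)    # 01000001010000100100001100100000
print(number)  # 1094861600
```

Note that `bin(number)` would drop the leading zero of the first byte, which is why the bit string is built byte by byte rather than from the integer.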
The last part of the question would mean splitting a binary number into bytes and displaying the equivalent ASCII character for each 8-bit group, e.g. 0x41 would be output as 'A'. (This assumes that each byte maps to a printable ASCII (text) character and that the binary number has a multiple of 8 digits.)
For example, to reverse the process (convert a binary number back to text):
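```python
binary = "01000001010000100100001100100001"
# number of characters in the text (8 bits per ASCII character)
num = len(binary) // 8
for x in range(num):
    start = x * 8
    end = (x + 1) * 8
    # int(..., 2) parses one 8-bit group; chr() turns it back into a character
    print(chr(int(binary[start:end], 2)), end='')
print()
```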
would give the text: ABC!
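The same reversal can be sketched with `int(..., 2)` and `int.to_bytes`, which also makes the two assumptions above explicit (bit count a multiple of 8, bytes decodable as ASCII). `binary_to_text` is a hypothetical helper name.

```python
def binary_to_text(binary: str, encoding: str = "ascii") -> str:
    """Convert a string of bits back to text (hypothetical helper).

    Assumes the bit count is a multiple of 8 and that the bytes decode
    cleanly in `encoding`; otherwise a ValueError is raised.
    """
    if not binary:
        return ""
    if len(binary) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    number = int(binary, 2)                                    # parse the base-2 digits
    data = number.to_bytes(len(binary) // 8, byteorder="big")  # keeps leading zero bytes
    return data.decode(encoding)  # UnicodeDecodeError is a subclass of ValueError


print(binary_to_text("01000001010000100100001100100001"))  # ABC!
```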
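For a 1 MB text file, you'd split the text into chunks before converting, e.g. 32 bits (4 ASCII characters) at a time. Python integers have arbitrary precision, so the chunk size is a matter of memory use and convenience rather than a hard limit.
Tested in a Python IDE.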
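Following the chunking advice above, a minimal sketch that streams a file in fixed-size chunks and writes its bits as '0'/'1' characters. It reads raw bytes, so it does not depend on the file being ASCII. The helper `file_to_bits`, the file names, and the 4096-byte default chunk size are all assumptions for illustration.

```python
import os
import tempfile


def file_to_bits(in_path: str, out_path: str, chunk_size: int = 4096) -> int:
    """Write the bits of in_path to out_path as '0'/'1' characters.

    Hypothetical helper: reads the input in fixed-size chunks so the
    whole file never has to be held in memory at once.
    Returns the number of input bytes processed.
    """
    total = 0
    with open(in_path, "rb") as src, open(out_path, "w", encoding="ascii") as dst:
        while True:
            chunk = src.read(chunk_size)  # at most chunk_size bytes
            if not chunk:                 # b"" signals end of file
                break
            dst.write("".join(format(b, "08b") for b in chunk))
            total += len(chunk)
    return total


# Usage on a throwaway file (the file names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    out_path = os.path.join(tmp, "input.bits")
    with open(src_path, "wb") as f:
        f.write(b"ABC ")
    n = file_to_bits(src_path, out_path, chunk_size=3)  # splits the 4 bytes across two chunks
    with open(out_path, encoding="ascii") as f:
        result = f.read()

print(n, result)  # 4 01000001010000100100001100100000
```

Each input byte becomes 8 output characters, so a 1 MB input produces roughly 8 MB of '0'/'1' text.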
See https://docs.python.org/3/library/codecs.html#standard-encodings for a list of standard string encodings, because the conversion depends on the encoding.
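To illustrate why the encoding matters, a small sketch showing that one character produces different bits under different standard encodings. The helper `to_bits` is hypothetical and separate from the functions in this answer.

```python
def to_bits(text: str, encoding: str) -> str:
    """Bit string of text under the given encoding (hypothetical helper)."""
    return "".join(format(b, "08b") for b in text.encode(encoding))


# The same character yields different bits in different encodings.
for enc in ("latin-1", "utf-8", "utf-16-be"):
    print(f"{enc:<10}{to_bits('é', enc)}")
# latin-1   11101001
# utf-8     1100001110101001
# utf-16-be 0000000011101001
```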
The functions below convert between bytes (or an integer Unicode code point) and strings, defaulting to UTF-8.
The example uses the Hangul character "한" in UTF-8.
```python
def bytes_to_string(byte_or_int_value, encoding='utf-8') -> str:
    # bytes are decoded directly
    if isinstance(byte_or_int_value, bytes):
        return byte_or_int_value.decode(encoding)
    # an int is treated as a Unicode code point; the encode/decode
    # round trip fails if the character cannot be encoded in `encoding`
    if isinstance(byte_or_int_value, int):
        return chr(byte_or_int_value).encode(encoding).decode(encoding)
    raise TypeError('Input must be a bytes or int type')


def string_to_bytes(string_value, encoding='utf-8') -> bytes:
    if isinstance(string_value, str):
        return string_value.encode(encoding)  # str.encode already returns bytes
    raise TypeError('Input must be a string type')


int_value = 54620              # code point of '한' (U+D55C)
bytes_value = b'\xED\x95\x9C'  # UTF-8 encoding of '한'
string_value = '한'

assert bytes_to_string(int_value) == string_value
assert bytes_to_string(bytes_value) == string_value
assert string_to_bytes(string_value) == bytes_value
```
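Combining this with the bit-string approach from the first answer, a short sketch showing that "한" takes 3 bytes (24 bits) in UTF-8, so the one-character-per-8-bits assumption only holds for ASCII text. The reverse step groups the bits into bytes and decodes with the same encoding.

```python
text = "한"                                     # U+D55C, code point 54620
data = text.encode("utf-8")                     # b'\xed\x95\x9c' (3 bytes)
bits = "".join(format(b, "08b") for b in data)  # 8 binary digits per byte
print(len(data), bits)                          # 3 111011011001010110011100

# Reverse: split into 8-bit groups, rebuild the bytes, decode as UTF-8.
restored = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")
print(restored == text)                         # True
print(ord(text))                                # 54620
```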