Python - trying to deal with the bits of a file

Question

I have very recently started to learn Python, and I chose to learn things by trying to solve a problem that I find interesting. This problem is to take a file (binary or not) and encrypt it using a simple method, something like replacing every "1001 0001" in it with a "0010 0101", and vice-versa.

However, I didn't find a way to do it. When reading the file, I can create an array in which each element contains one byte of data, with the read() method. But how can I replace this byte with another one, if it is one of the bytes I chose to replace, and then write the resulting information into the output encrypted file?

Thanks in advance!

Answer 1:

To swap bytes 10010001 and 00100101:

#!/usr/bin/env python
import string

a, b = map(chr, [0b10010001, 0b00100101])
translation_table = string.maketrans(a+b, b+a) # swap a,b

with open('input', 'rb') as fin, open('output', 'wb') as fout:
    fout.write(fin.read().translate(translation_table))
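The snippet above relies on string.maketrans, which is a Python 2 function. As a minimal sketch of the same idea for Python 3, assuming the whole file fits in memory, bytes.maketrans and bytes.translate can be used instead. The helper names swap_bytes and swap_file are made up for illustration.

```python
def swap_bytes(data: bytes, a: int, b: int) -> bytes:
    """Return a copy of data with every byte a replaced by b and vice versa."""
    # bytes.maketrans builds a 256-entry table mapping a -> b and b -> a.
    table = bytes.maketrans(bytes([a, b]), bytes([b, a]))
    return data.translate(table)


def swap_file(src_path: str, dst_path: str, a: int, b: int) -> None:
    """Read src_path in binary mode, swap the two byte values, write dst_path."""
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        fout.write(swap_bytes(fin.read(), a, b))
```

Calling swap_file('input', 'output', 0b10010001, 0b00100101) would mirror the original script; running it again on the output should restore the input, since the swap is its own inverse.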

Answer 2:

read() returns an immutable string, so you'll first need to convert that to a list of characters. Then go through your list and change the bytes as needed, and finally join the list back into a new string to write to the output file.

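with open('input', 'rb') as f:
    filedata = f.read()
filebytes = list(filedata)  # strings are immutable, so work on a list
for i, c in enumerate(filebytes):
    if ord(c) == 0x91:
        filebytes[i] = chr(0x25)
    elif ord(c) == 0x25:  # the reverse direction of the swap
        filebytes[i] = chr(0x91)
newfiledata = ''.join(filebytes)
with open('output', 'wb') as f:
    f.write(newfiledata)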
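The list-and-join approach above is written for Python 2, where read() returns a str. In Python 3, reading in binary mode returns bytes, and a bytearray gives a mutable buffer of integers, so no list conversion is needed. A small sketch under that assumption (swap_in_place is a hypothetical helper name):

```python
def swap_in_place(buf: bytearray, a: int, b: int) -> None:
    """Swap byte values a and b inside buf, modifying it in place."""
    for i, value in enumerate(buf):
        if value == a:
            buf[i] = b
        elif value == b:
            buf[i] = a


# In practice the buffer would come from bytearray(f.read()) on a file
# opened with mode 'rb'; a literal is used here to keep the sketch small.
data = bytearray(b"\x91 and \x25")
swap_in_place(data, 0x91, 0x25)
```

The resulting bytearray can be passed directly to write() on a file opened with mode 'wb'.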

Answer 3:

Following Aaron's answer, once you have a string, then you can also use translate or replace:

In [43]: s = 'abc'

In [44]: s.replace('ab', 'ba')
Out[44]: 'bac'

In [45]: tbl = string.maketrans('a', 'd')

In [46]: s.translate(tbl)
Out[46]: 'dbc'

Docs: Python string.
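One caveat when choosing between the two: for a swap, chaining two replace calls does not work, because the second call also rewrites the bytes produced by the first. The sketch below (Python 3, using bytes rather than str) illustrates the difference; the sample data is made up.

```python
a, b = b"\x91", b"\x25"
data = b"\x91\x25\x91"

# Chained replace: the first call turns every a into b, and the second call
# then turns every b (original and newly created) into a.
chained = data.replace(a, b).replace(b, a)

# translate maps each input byte through the table exactly once,
# so a and b really trade places.
swapped = data.translate(bytes.maketrans(a + b, b + a))
```

Here chained ends up containing only the byte 0x91, while swapped has the two values exchanged.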

Answer 4:

I'm sorry about this somewhat relevant wall of text -- I'm just in a teaching mood.

If you want to optimize such an operation, I suggest using numpy. The advantage is that the entire translation is done with a single numpy operation, and those are written in C, so it is about as fast as you can get using Python.

In the example below I simply XOR every byte with 0b11111111 using a lookup table: the first element is the translation of 0b00000000, the second the translation of 0b00000001, the third of 0b00000010, and so on. By altering the lookup table, you can do any kind of translation that does not change within the file.

import numpy as np
import sys

data = np.fromfile(sys.argv[1], dtype="uint8")
lookup_table = np.array(
    [i ^ 0xFF for i in range(256)], dtype="uint8")
lookup_table[data].tofile(sys.argv[2])

To highlight the simplicity of it all I've done no argument checking. Invoke the script like this:

python name_of_script.py input_file.txt output_file.txt

To directly answer your question, if you want to swap 0b10010001 and 0b00100101, you replace the lookup_table = ... line with this:

lookup_table = np.array(range(256), dtype="uint8")
lookup_table[0b10010001] = 0b00100101
lookup_table[0b00100101] = 0b10010001
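A fixed lookup table only relabels byte values. As a rough illustration without numpy, the sketch below builds the same kind of 256-entry table for bytes.translate and checks two properties: applying the swap twice restores the input, and the multiset of byte counts is unchanged. The helper name make_swap_table and the sample data are assumptions for the example.

```python
from collections import Counter


def make_swap_table(a: int, b: int) -> bytes:
    """Build a 256-entry identity table with entries a and b exchanged."""
    table = bytearray(range(256))
    table[a], table[b] = b, a
    return bytes(table)


table = make_swap_table(0x91, 0x25)
plain = b"hello \x91 world \x25\x25"
cipher = plain.translate(table)

# The counts of the individual byte values are the same before and after;
# only the labels attached to those counts have moved.
same_profile = sorted(Counter(plain).values()) == sorted(Counter(cipher).values())

# The swap table is its own inverse.
round_trip = cipher.translate(table) == plain
```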

Of course there is no lookup table encryption that isn't easily broken using frequency analysis. But as you may know, encryption using a one-time pad is unbreakable, as long as the pad is safe. This modified script encrypts or decrypts using a one-time pad (which you'll have to create yourself, store to a file, and somehow (there's the rub) securely transmit to the intended recipient of the message):

import numpy as np
import sys

data = np.fromfile(sys.argv[1], dtype="uint8")
pad = np.fromfile(sys.argv[2], dtype="uint8")
(data ^ pad[:len(data)]).tofile(sys.argv[3])

Example usage (Linux):

$ dd if=/dev/urandom of=pad.bin bs=512 count=5
$ python pytrans.py pytrans.py pad.bin encrypted.bin

Recipient then does:

$ python pytrans.py encrypted.bin pad.bin decrypted.py

Voilà! Fast and unbreakable encryption with three lines (plus two import lines) in Python.
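The script above assumes the pad is at least as long as the data. For small inputs, the same XOR step can be sketched in plain Python with an explicit length check. This is only an illustration: xor_with_pad is a hypothetical helper, and the pad is generated here with secrets.token_bytes from the standard library rather than read from a file.

```python
import secrets


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR data with the first len(data) bytes of pad."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    # zip stops at the shorter input, so extra pad bytes are ignored.
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # random pad, same length as message

encrypted = xor_with_pad(message, pad)
decrypted = xor_with_pad(encrypted, pad)  # XOR with the same pad undoes it
```

Because XOR is its own inverse, the same function serves for both directions, just as the same script does in the usage example above.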

Source: https://stackoverflow.com/questions/9119322/python-trying-to-deal-with-the-bits-of-a-file

Tags: python, encryption, bits