Question
I have very recently started to learn Python, and I chose to learn things by trying to solve a problem that I find interesting. This problem is to take a file (binary or not) and encrypt it using a simple method, something like replacing every "1001 0001" in it with a "0010 0101", and vice-versa.
However, I didn't find a way to do it. When reading the file, I can create an array in which each element contains one byte of data, with the read() method. But how can I replace this byte with another one, if it is one of the bytes I chose to replace, and then write the resulting information into the output encrypted file?
Thanks in advance!
Answer 1:
To swap bytes 10010001 and 00100101:
#!/usr/bin/env python
import string
a, b = map(chr, [0b10010001, 0b00100101])
translation_table = string.maketrans(a+b, b+a) # swap a,b
with open('input', 'rb') as fin, open('output', 'wb') as fout:
    fout.write(fin.read().translate(translation_table))
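The script above targets Python 2, where `string.maketrans` exists. On Python 3 the same idea can be sketched with `bytes.maketrans`. The helper names `swap_bytes` and `swap_file` are made up for illustration and are not part of the original answer.

```python
# Python 3 sketch; swap_bytes and swap_file are hypothetical helper names.
A = bytes([0b10010001])  # 0x91
B = bytes([0b00100101])  # 0x25

# Table mapping A -> B and B -> A; every other byte maps to itself.
SWAP_TABLE = bytes.maketrans(A + B, B + A)


def swap_bytes(data: bytes) -> bytes:
    """Return a copy of data with every 0x91 and 0x25 exchanged."""
    return data.translate(SWAP_TABLE)


def swap_file(src_path: str, dst_path: str) -> None:
    """Read src_path in binary mode and write the swapped bytes to dst_path."""
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        fout.write(swap_bytes(fin.read()))
```

Because the table is its own inverse, running the output through `swap_bytes` a second time restores the original data.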
Answer 2:
read() returns an immutable string, so you'll first need to convert that to a list of characters. Then go through your list and change the bytes as needed, and finally join the list back into a new string to write to the output file.
filedata = f.read()
filebytes = list(filedata)
for i, c in enumerate(filebytes):
    if ord(c) == 0x91:
        filebytes[i] = chr(0x25)
newfiledata = ''.join(filebytes)
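Note that the loop above only turns 0x91 into 0x25. A full swap also needs the reverse mapping. As a rough Python 3 sketch of the same loop: a binary-mode `read()` returns immutable `bytes` there, and `bytearray` gives a mutable copy whose elements are integers, so `ord` and `chr` are not needed. The function name `swap_in_place` is an assumption for illustration.

```python
def swap_in_place(buf: bytearray, a: int = 0x91, b: int = 0x25) -> None:
    """Exchange every occurrence of byte value a with b, and b with a."""
    for i, value in enumerate(buf):
        if value == a:
            buf[i] = b
        elif value == b:
            buf[i] = a


# In-memory stand-in for bytearray(f.read()) on a file opened with 'rb'.
buf = bytearray(b"\x91\x00\x25")
swap_in_place(buf)
```

The `elif` matters: with two separate `if` statements a byte that was just changed from `a` to `b` would be changed straight back.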
Answer 3:
Following Aaron's answer, once you have a string, you can also use translate or replace:
In [43]: s = 'abc'
In [44]: s.replace('ab', 'ba')
Out[44]: 'bac'
In [45]: tbl = string.maketrans('a', 'd')
In [46]: s.translate(tbl)
Out[46]: 'dbc'
Docs: Python string.
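One caveat when the goal is a two-way swap, shown here as a small Python 3 sketch on `bytes` (the sample data is made up): chaining two `replace` calls does not swap, because the second call also rewrites the bytes produced by the first. `translate` maps each byte exactly once and avoids this.

```python
data = b"\x91\x25\x91"

# Chained replace: after the first call every byte is 0x25, so the second
# call turns all of them into 0x91 and the original pattern is lost.
chained = data.replace(b"\x91", b"\x25").replace(b"\x25", b"\x91")

# translate looks each input byte up once in the table, so the swap holds.
table = bytes.maketrans(b"\x91\x25", b"\x25\x91")
swapped = data.translate(table)
```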
Answer 4:
I'm sorry about this somewhat relevant wall of text -- I'm just in a teaching mood.
If you want to optimize such an operation, I suggest using numpy. The advantage is that the entire translation is done with a single numpy operation, and those are written in C, so it is about as fast as you can get using Python.
In the example below I simply XOR every byte with 0b11111111 using a lookup table: the first element is the translation of 0b00000000, the second the translation of 0b00000001, the third of 0b00000010, and so on. By altering the lookup table, you can do any kind of translation that does not change within the file.
import numpy as np
import sys
data = np.fromfile(sys.argv[1], dtype="uint8")
lookup_table = np.array(
    [i ^ 0xFF for i in range(256)], dtype="uint8")
lookup_table[data].tofile(sys.argv[2])
To highlight the simplicity of it all I've done no argument checking. Invoke script like this:
python name_of_script.py input_file.txt output_file.txt
To directly answer your question, if you want to swap 0b10010001 and 0b00100101, you replace the lookup_table = ... line with this:
lookup_table = np.array(range(256), dtype="uint8")
lookup_table[0b10010001] = 0b00100101
lookup_table[0b00100101] = 0b10010001
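To see the swap table at work without touching any files, here is a small in-memory sketch. The sample bytes are made up, and `np.frombuffer` stands in for `np.fromfile`.

```python
import numpy as np

# Identity table: byte value i maps to i, except for the two swapped entries.
lookup_table = np.arange(256, dtype=np.uint8)
lookup_table[0b10010001] = 0b00100101
lookup_table[0b00100101] = 0b10010001

# Made-up sample data standing in for the contents of the input file.
data = np.frombuffer(b"\x91\x00\x25\xff", dtype=np.uint8)

# Fancy indexing applies the table to every byte in one vectorized step.
translated = lookup_table[data]

# Applying the same table again undoes the swap.
restored = lookup_table[translated]
```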
Of course there is no lookup table encryption that isn't easily broken using frequency analysis. But as you may know, encryption using a one-time pad is unbreakable, as long as the pad is safe. This modified script encrypts or decrypts using a one-time pad (which you'll have to create yourself, store to a file, and somehow (there's the rub) securely transmit to the intended recipient of the message):
import numpy as np
import sys
data = np.fromfile(sys.argv[1], dtype="uint8")
pad = np.fromfile(sys.argv[2], dtype="uint8")
(data ^ pad[:len(data)]).tofile(sys.argv[3])
Example usage (linux):
$ dd if=/dev/urandom of=pad.bin bs=512 count=5
$ python pytrans.py pytrans.py pad.bin encrypted.bin
Recipient then does:
$ python pytrans.py encrypted.bin pad.bin decrypted.py
Voilà! Fast and unbreakable encryption with three lines (plus two import lines) in Python.
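One thing the short script does not check is the pad length. If the pad is shorter than the data, `data ^ pad[:len(data)]` either fails with a broadcasting error or, for a one-byte pad, silently reuses that byte for the whole message. A pad also has to stay secret and must never be reused. The sketch below adds the length check. The function name `xor_with_pad` and the in-memory sample message are assumptions for illustration, and `os.urandom` stands in for the `dd` command.

```python
import os

import numpy as np


def xor_with_pad(data: np.ndarray, pad: np.ndarray) -> np.ndarray:
    """XOR data with the leading bytes of pad; refuse a pad that is too short."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return data ^ pad[:len(data)]


# Made-up message and a freshly generated pad, both held in memory.
message = b"attack at dawn"
data = np.frombuffer(message, dtype=np.uint8)
pad = np.frombuffer(os.urandom(64), dtype=np.uint8)

encrypted = xor_with_pad(data, pad)
decrypted = xor_with_pad(encrypted, pad)  # XOR with the same pad undoes it
```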
Source: https://stackoverflow.com/questions/9119322/python-trying-to-deal-with-the-bits-of-a-file