Question
I've built a Python steganographer and am trying to add a GUI to it. This follows on from my previous question about reading all kinds of files in Python. Since the steganographer can only encode bytes in an image, I want to add support for directly encoding a file of any extension and encoding. To do this, I read the file in binary mode and try to encode it. It works fine for files that contain plain UTF-8 text, such as .txt and .py files.
My updated code is:
from PIL import Image
import os

class StringTooLongException(Exception):
    pass

class InvalidBitValueException(Exception):
    pass

def str2bin(message):
    binary = bin(int.from_bytes(message, 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big')

def hide(filename, message, bits=2):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'
    if (len(binary)) % 8 != 0:
        binary = '0'*(8 - ((len(binary)) % 8)) + binary
    data = list(image.getdata())
    newData = []
    if len(data) * bits < len(binary):
        raise StringTooLongException
    if bits > 8:
        raise InvalidBitValueException
    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= bits
            pixel[0] <<= bits
            pixel[0] += int('0b' + binary[index:index+bits], 2)
            pixel = tuple(pixel)
            index += bits
        newData.append(pixel)
    image.putdata(newData)
    image.save(os.path.dirname(filename) + '/coded-'+os.path.basename(filename), 'PNG')
    return len(binary)

def unhide(filename, bits=2):
    image = Image.open(filename)
    data = image.getdata()
    if bits > 8:
        raise InvalidBitValueException
    binary = ''
    index = 0
    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        value = '00000000' + bin(data[index][0])[2:]
        binary += value[-bits:]
        index += 1
    message = bin2str(binary)
    return message
Now, the problem comes when I try to hide .pdf or .docx files in it. Several things are happening:
1) Microsoft Word or Adobe Acrobat reports that the file is corrupt.
2) The file size is considerably reduced, from 40KB to 3KB, which is a clear sign of an error.
I think the reason could be that the file contains a NULL byte, and my program stops reading as soon as it reaches it. Do you have any alternative ideas?
I have an idea to change the ending byte, but that may still give the same result, since a file may contain that byte too.
Thanks, again!
Answer 1:
You can use an end-of-stream (EOS) marker when you are certain the marker sequence will not show up in your message stream. When you can't guarantee that, you have two options:
- Create a more complicated EOS marker, composed of many bytes. It can be quite a nuisance to prove that the same problem won't arise as before.
- Add a header at the beginning of your message, which encodes how many bits/bytes to read for the complete message extraction.
Generally, I'd use a header whenever I have information beforehand that I want to transmit and only rely on EOS markers when I don't know when my byte stream will terminate, e.g., on-the-fly compression.
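To see why a single null byte is a poor EOS marker for arbitrary files, here is a minimal sketch that works on plain bytes, with no image involved. The helper `read_until_null` and both payloads are made up for illustration; it only mimics the "stop at the first zero byte" behaviour, not your exact `unhide` loop.

```python
def read_until_null(stream: bytes) -> bytes:
    """Return everything before the first zero byte, like an EOS-marker reader."""
    end = stream.find(b'\x00')
    return stream if end == -1 else stream[:end]

# Hypothetical payloads: plain text has no zero bytes, binary formats usually do.
text_payload = b'hello world'
binary_payload = b'%PDF-1.4\x00\x01\x02 rest of the file'

# The embedder appends the marker; the extractor stops at the first zero byte.
recovered_text = read_until_null(text_payload + b'\x00')
recovered_binary = read_until_null(binary_payload + b'\x00')

print(recovered_text == text_payload)               # text survives intact
print(len(binary_payload), len(recovered_binary))   # binary payload is cut short
```

The text payload comes back whole, while the binary one is truncated at its first embedded zero byte. That matches the shrinking file size described in the question.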
For embedding, you should aim to:
- get your binary string
- measure its length
- convert that integer to a binary of fixed size, say, 32 bits
- attach that bitstring in front of your message bitstring
- embed all of that to your cover medium
And for extraction:
- extract the first 32 bits
- convert those to an integer to get your message bitstring length
- start from index 32 and extract the necessary number of bits
- convert back to a bytestream and save to a file
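The embedding and extraction steps above could be sketched roughly as follows, working on bit strings only so it can be tested without an image. All names here (`HEADER_BITS`, `bytes_to_bits`, `add_length_header`, `strip_length_header`) are my own placeholders, and the 32-bit width is simply the size suggested above.

```python
HEADER_BITS = 32  # assumed fixed header width; any size both sides agree on works

def bytes_to_bits(data: bytes) -> str:
    """Bytes to a bit string, 8 bits per byte, so leading zeros are preserved."""
    return ''.join(format(byte, '08b') for byte in data)

def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits; len(bits) must be a multiple of 8."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def add_length_header(message: bytes) -> str:
    """Prefix the message bits with their length as a fixed-width binary number."""
    body = bytes_to_bits(message)
    if len(body) >= 2 ** HEADER_BITS:
        raise ValueError('message too long for the header')
    return f'{len(body):0{HEADER_BITS}b}' + body

def strip_length_header(bits: str) -> bytes:
    """Read the header, then take exactly that many bits and ignore the rest."""
    length = int(bits[:HEADER_BITS], 2)
    body = bits[HEADER_BITS:HEADER_BITS + length]
    if len(body) < length:
        raise ValueError('stream ends before the announced length')
    return bits_to_bytes(body)

# A payload full of zero bytes, followed by junk bits standing in for the
# untouched pixels that come after the embedded message.
payload = b'\x00\x00PK\x03\x04\x00data\x00'
stream = add_length_header(payload) + '1011' * 10
recovered = strip_length_header(stream)
print(recovered == payload)
```

I used `bytes_to_bits` here rather than the question's `str2bin` because going through `int.from_bytes` drops leading zero bytes, which matters once the payload is an arbitrary file. In `hide` you would embed the string returned by `add_length_header`. In `unhide` you would collect bits until you have the header plus the announced length, instead of looping until a zero byte appears.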
As a bonus, you can add all sorts of information to your header, e.g., the name of the original file, as long as it's all encoded in a way that lets you extract it later. For example:
header = 4 bytes for the length of the message string +
         1 byte for the number of characters in the filename +
         that many bytes for the filename
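One possible way to build and parse such a header at the byte level is with the standard `struct` module. This is only a sketch: `build_header` and `parse_header` are hypothetical names, the byte order is an arbitrary choice, and I count the filename length in UTF-8 bytes rather than characters, so that non-ASCII names still fit the 1-byte field predictably.

```python
import struct

HEADER_FORMAT = '>IB'  # big-endian: 4-byte payload length, 1-byte filename length

def build_header(payload: bytes, filename: str) -> bytes:
    """Pack the payload length, filename length and filename into a header."""
    name = filename.encode('utf-8')
    if len(name) > 255:
        raise ValueError('filename too long for a 1-byte length field')
    return struct.pack(HEADER_FORMAT, len(payload), len(name)) + name

def parse_header(stream: bytes):
    """Return (filename, payload) from a stream that starts with the header."""
    payload_len, name_len = struct.unpack_from(HEADER_FORMAT, stream, 0)
    name_start = struct.calcsize(HEADER_FORMAT)
    payload_start = name_start + name_len
    name = stream[name_start:payload_start].decode('utf-8')
    payload = stream[payload_start:payload_start + payload_len]
    return name, payload

# Made-up file contents and name, with trailing bytes that must be ignored.
payload = b'%PDF-1.4\x00\x00binary content'
stream = build_header(payload, 'report.pdf') + payload + b'\xff\xff'
name, recovered = parse_header(stream)
print(name, recovered == payload)
```

The resulting byte stream (header plus payload) is what you would then convert to bits and embed. On extraction, the recovered filename tells you what to name the output file.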
Source: https://stackoverflow.com/questions/44484791/python-steganographer-file-handling-error-for-non-plain-text-files