I'm implementing file encryption with RSA, using PyCrypto.
I know it's somewhat wrong, first of all because RSA is very slow and second because PyCrypto RSA can on
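Public-key cryptography is usually used for small amounts of data only. It is slow, and can be hard to use right. The usual practice is to use other methods to reduce the asymmetric problem to one where the security is provided by a shared key, then use public-key cryptography to protect that shared key. For example, to encrypt a file: generate a random secret key for a symmetric cipher such as AES, encrypt the file with that key, then encrypt the secret key with the RSA public key and store it alongside the encrypted data.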
So here's a sketch of what encryption can look like (warning, untested code, typed directly in the browser):
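    import os
    from Crypto.Cipher import AES
    from Crypto.PublicKey import RSA
    import Crypto.Util.number

    def encrypt_file(rsa, input, output):
        # input and output are file objects opened in binary mode
        # Generate a random 128-bit secret key for AES
        secret_key = os.urandom(16)
        # Padding (see explanations below); // keeps the length an integer
        plaintext_length = (Crypto.Util.number.size(rsa.n) - 2) // 8
        padding = b'\xff' + os.urandom(16)
        padding += b'\0' * (plaintext_length - len(padding) - len(secret_key))
        # Encrypt the secret key with RSA; PyCrypto returns a tuple whose
        # first item is the ciphertext
        encrypted_secret_key = rsa.encrypt(padding + secret_key, None)[0]
        # Write out the encrypted secret key, preceded by a length indication
        output.write(('%d\n' % len(encrypted_secret_key)).encode('ascii'))
        output.write(encrypted_secret_key)
        # Encrypt the file (see below regarding iv)
        iv = b'\x00' * 16
        aes_engine = AES.new(secret_key, AES.MODE_CBC, iv)
        # CBC needs a whole number of 16-byte blocks, so pad the data first
        # (PKCS#7 style; the decrypting side must strip this padding)
        data = input.read()
        pad_length = 16 - len(data) % 16
        data += bytes(bytearray([pad_length])) * pad_length
        output.write(aes_engine.encrypt(data))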
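For the receiving side, here is a minimal sketch of how the framing written above could be parsed, assuming the layout produced by encrypt_file: a decimal length line, the RSA-encrypted key, then the AES ciphertext. The names read_header and extract_secret_key are hypothetical helpers, not part of PyCrypto, and the RSA and AES decryption calls themselves are left out.

```python
import io

KEY_LENGTH = 16  # size of the AES secret key used in encrypt_file

def read_header(stream):
    """Return (encrypted_secret_key, remaining_ciphertext) from a binary stream."""
    # The first line holds the length of the RSA-encrypted key in decimal
    key_length = int(stream.readline().decode('ascii'))
    encrypted_secret_key = stream.read(key_length)
    if len(encrypted_secret_key) != key_length:
        raise ValueError('truncated header')
    # Everything after the encrypted key is the AES-CBC ciphertext
    return encrypted_secret_key, stream.read()

def extract_secret_key(rsa_plaintext):
    """Drop the RSA padding: the secret key is the last KEY_LENGTH bytes."""
    return rsa_plaintext[-KEY_LENGTH:]

# Round trip on a fake header (constant bytes stand in for real ciphertexts)
fake_key_blob = b'\x01' * 256
stream = io.BytesIO(b'256\n' + fake_key_blob + b'ciphertext-bytes')
key_blob, body = read_header(stream)
```

Taking the last 16 bytes works because the padding is always placed in front of the secret key, so the recipient never needs to know how long the padding was.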
The iv is an initialization vector for the CBC mode of operation. It needs to be unique per key per message. Normally, it's sent alongside the data in cleartext. Here, since the key is only ever used once, you can use a known IV.
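If a key were ever reused for several messages, the usual pattern would apply: pick a fresh random IV and store it in cleartext ahead of the ciphertext. A minimal sketch, assuming a 16-byte IV written at the very start of the encrypted data; write_iv and read_iv are hypothetical helper names, not library functions.

```python
import io
import os

IV_LENGTH = 16  # AES block size in bytes

def write_iv(output):
    """Generate a fresh random IV, write it in cleartext, and return it."""
    iv = os.urandom(IV_LENGTH)
    output.write(iv)
    return iv

def read_iv(stream):
    """Read back the IV that precedes the ciphertext."""
    iv = stream.read(IV_LENGTH)
    if len(iv) != IV_LENGTH:
        raise ValueError('missing or truncated IV')
    return iv

buffer = io.BytesIO()
iv = write_iv(buffer)            # this iv would be passed to AES.new(...)
buffer.write(b'...ciphertext...')
buffer.seek(0)
recovered_iv = read_iv(buffer)   # the decrypting side gets the same IV back
```

The IV does not need to be secret, only unpredictable and never repeated under the same key, which is why writing it in the clear is acceptable.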
The API of the block cipher is described in PEP 272. Unfortunately, it only supports all-at-once encryption. For large files, it would be better to encrypt chunk by chunk; you can encrypt as little as a block at a time (16 bytes for AES), but you need a better crypto library for that.
Note that in general, you should not directly encrypt data with RSA. The most obvious concern is that the attacker knows the public key and can therefore attempt to guess the plaintext (if the attacker thinks the plaintext may be swordfish, then the attacker can encrypt swordfish with the RSA public key, and compare the result with the output of the RSA encryption). Another concern which would arise if you wanted to send the file to multiple recipients is that if the RSA encryption step is deterministic, then the attacker can tell that the plaintexts are the same because the ciphertexts are the same. The normal defense against these problems is to use a padding scheme, which consists of adding some random secret data to the plaintext; this data is called padding. The attacker then cannot guess the random data, and sees different outcomes for every encryption because the same plaintext is never encrypted twice; as far as the legitimate recipient is concerned, the padding is just data that can be thrown away.
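The guessing attack can be illustrated with a toy example. This is only a sketch: the key is built from two small Mersenne primes chosen for illustration, textbook_encrypt and randomized_encrypt are hypothetical names, and the random prefix is a stand-in for a real padding scheme, not a substitute for one.

```python
import os

# Toy public key from two Mersenne primes; far too small for real use
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537

def textbook_encrypt(message):
    """Unpadded RSA: deterministic, so equal plaintexts give equal ciphertexts."""
    m = int.from_bytes(message, 'big')
    if m >= N:
        raise ValueError('message too long for this modulus')
    return pow(m, E, N)

def randomized_encrypt(message):
    """Prepend random bytes before encrypting (illustrative only, not OAEP)."""
    pad_length = N.bit_length() // 8 - len(message) - 1
    if pad_length < 8:
        raise ValueError('message leaves too little room for padding')
    return textbook_encrypt(os.urandom(pad_length) + message)

intercepted = textbook_encrypt(b'swordfish')
guesses = [b'password', b'swordfish', b'letmein']
# The attacker only needs the public key to test each guess
matches = [g for g in guesses if textbook_encrypt(g) == intercepted]

# With random padding, two encryptions of the same message differ
c1 = randomized_encrypt(b'swordfish')
c2 = randomized_encrypt(b'swordfish')
```

Here matches contains only the correct guess, while c1 and c2 differ even though the message is the same, so neither the guessing attack nor the same-plaintext comparison works against the randomized version.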
It may appear that the above concerns do not apply in this scenario. However, there are other weaknesses that can arise from using RSA unprotected. In particular, if the public exponent is very small (not the case here as PyCrypto uses 65537) or you encrypt the same material for many different recipients (again, probably not the case here since each message has its own secret key), then a simple mathematical calculation would allow the attacker to recover the RSA plaintext. To avoid this attack, the value that is encrypted with RSA needs to be “close enough” to the RSA modulus, so that the encryption operation actually performs a modular exponentiation. The padding I propose ensures that by making the highest-order byte that fits 0xff; this is believed to be safe, although in the real world you should use an approved padding mode (OAEP).
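To make the small-exponent problem concrete, here is a toy sketch assuming a public exponent of 3 and a modulus built from two small well-known primes. All names are hypothetical and the sizes are far too small for real use. When the plaintext is short, its cube never reaches the modulus, so an ordinary integer cube root recovers it; forcing the top byte to 0xff makes the cube wrap around the modulus and the shortcut fails.

```python
P, Q = 10**9 + 7, 998244353   # two small known primes, toy sizes only
N, E = P * Q, 3               # very small public exponent

def integer_cube_root(value):
    """Largest integer r with r**3 <= value, found by binary search."""
    lo, hi = 0, 1 << (value.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= value:
            lo = mid
        else:
            hi = mid - 1
    return lo

def encrypt(message):
    """Unpadded RSA encryption of a byte string."""
    return pow(int.from_bytes(message, 'big'), E, N)

short_message = b'hi'
# m**3 < N, so no modular reduction happened and a cube root undoes it
root = integer_cube_root(encrypt(short_message))
recovered = root.to_bytes(len(short_message), 'big')

# Same message with 0xff as the highest-order byte that fits
plaintext_length = (N.bit_length() - 2) // 8
zeros = b'\x00' * (plaintext_length - 1 - len(short_message))
padded = b'\xff' + zeros + short_message
padded_root = integer_cube_root(encrypt(padded))
```

In this run recovered equals the original message without any private key, whereas padded_root is not the padded plaintext, since that value is larger than any cube root of a number below N.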