Encrypting a file with RSA in Python

攒了一身酷 2020-12-23 10:43

I'm implementing file encryption with RSA, using PyCrypto.

I know it's somewhat wrong, first of all because RSA is very slow and second because PyCrypto RSA can on

1 answer
  • 2020-12-23 11:25

    Public-key cryptography is usually used for small amounts of data only. It is slow, and can be hard to use right. The usual practice is to use other methods to reduce the asymmetric problem to one where the security is provided by a shared key, then use public-key cryptography to protect that shared key. For example:

    • To encrypt a file, randomly generate a secret key for a block or stream cipher (e.g. AES). Store the data encrypted with this cipher, and store the secret key encrypted with the public key alongside the encrypted payload.
    • To sign a file, compute a cryptographic digest (e.g. SHA-256). Sign the digest of the file with the private key and store that alongside the file.
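The first half of the signing bullet (computing the digest) needs nothing beyond the standard library. A minimal sketch, assuming the file is opened in binary mode; `file_sha256` and `chunk_size` are names made up for this illustration, and the RSA signature over the digest is left to whatever signing scheme your crypto library provides:

```python
import hashlib
import io


def file_sha256(stream, chunk_size=65536):
    """Return the SHA-256 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.digest()


# Usage: an in-memory stream stands in for open(path, "rb")
example_digest = file_sha256(io.BytesIO(b"hello world"), chunk_size=4)
```

Reading in chunks keeps memory use constant, so the same function works for files of any size.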

    So here's a sketch of what encryption can look like (warning, untested code, typed directly in the browser):

    import os
    from Crypto.Cipher import AES
    from Crypto.PublicKey import RSA
    import Crypto.Util.number

    def encrypt_file(rsa, input, output):
        # rsa: a PyCrypto RSA key object; input/output: files opened in binary mode
        # Generate a random 128-bit secret key for AES
        secret_key = os.urandom(16)
        # Padding (see explanations below)
        plaintext_length = (Crypto.Util.number.size(rsa.n) - 2) // 8
        padding = b'\xff' + os.urandom(16)
        padding += b'\0' * (plaintext_length - len(padding) - len(secret_key))
        # Encrypt the secret key with RSA (legacy PyCrypto API, returns a 1-tuple)
        encrypted_secret_key = rsa.encrypt(padding + secret_key, None)[0]
        # Write out the encrypted secret key, preceded by a length indication
        output.write(str(len(encrypted_secret_key)).encode('ascii') + b'\n')
        output.write(encrypted_secret_key)
        # Pad the data to a multiple of the AES block size (PKCS#7 style),
        # because CBC mode rejects input that is not block-aligned
        data = input.read()
        pad_len = 16 - len(data) % 16
        data += bytes(bytearray([pad_len])) * pad_len
        # Encrypt the file (see below regarding iv)
        iv = b'\x00' * 16
        aes_engine = AES.new(secret_key, AES.MODE_CBC, iv)
        output.write(aes_engine.encrypt(data))
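For the receiving side, the first job is to split the output back into its two parts. This is a sketch of the parsing only, assuming the layout written by encrypt_file (an ASCII length line, then the RSA-encrypted key, then the AES ciphertext); `read_container` is a hypothetical helper name, and the actual RSA and AES decryption is left out:

```python
import io


def read_container(stream):
    """Split a stream into (encrypted_secret_key, aes_ciphertext)."""
    # The header is a decimal length terminated by a newline
    header = stream.readline()
    key_length = int(header.decode("ascii"))
    # Read exactly key_length bytes; the encrypted key is raw binary and may
    # itself contain newline bytes, which is why a length prefix is used
    encrypted_secret_key = stream.read(key_length)
    if len(encrypted_secret_key) != key_length:
        raise ValueError("truncated file: encrypted key is incomplete")
    # Everything that remains is the AES ciphertext
    return encrypted_secret_key, stream.read()


# Usage with a fake container whose "encrypted key" contains a newline
fake = io.BytesIO(b"3\n" + b"A\nB" + b"rest-of-file")
key_part, data_part = read_container(fake)
```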
    

    The iv is an initialization vector for the CBC mode of operation. It needs to be unique per key per message. Normally, it's sent alongside the data in cleartext. Here, since the key is only ever used once, you can use a known IV.

    The API of the block cipher is described in PEP 272. Unfortunately, it only supports all-at-once encryption. For large files, it would be better to encrypt chunk by chunk; you can encrypt as little as a block at a time (16 bytes for AES), but you need a better crypto library for that.
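Whichever library ends up doing the chunked encryption, the input has to be fed to it in block-aligned pieces, with only the last piece padded. A rough sketch of that reading loop, using PKCS#7-style padding; `iter_padded_chunks` is a made-up name, and it assumes a regular file or `io.BytesIO` where `read(n)` returns fewer than `n` bytes only at end of file:

```python
import io

BLOCK_SIZE = 16  # AES block size in bytes


def iter_padded_chunks(stream, chunk_size=64 * 1024):
    """Yield block-aligned chunks; only the final chunk carries padding."""
    if chunk_size % BLOCK_SIZE != 0:
        raise ValueError("chunk_size must be a multiple of the block size")
    pending = stream.read(chunk_size)
    while True:
        following = stream.read(chunk_size)
        if not following:
            # pending is the last chunk: append 1 to 16 padding bytes, each
            # holding the padding length, so the reader can strip them again
            pad_len = BLOCK_SIZE - len(pending) % BLOCK_SIZE
            yield pending + bytes([pad_len]) * pad_len
            return
        yield pending
        pending = following


# Usage: 100 bytes read 32 at a time
data = bytes(range(100))
chunks = list(iter_padded_chunks(io.BytesIO(data), chunk_size=32))
```

Each yielded chunk would then be passed to the cipher object in order; the look-ahead read is what lets the loop know which chunk is the last one.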

    Note that in general, you should not directly encrypt data with RSA. The most obvious concern is that the attacker knows the public key and can therefore attempt to guess the plaintext (if the attacker thinks the plaintext may be swordfish, then the attacker can encrypt swordfish with the RSA public key, and compare the result with the output of the RSA encryption). Another concern which would arise if you wanted to send the file to multiple recipients is that if the RSA encryption step is deterministic, then the attacker can tell that the plaintexts are the same because the ciphertexts are the same. The normal defense against these problems is to use a padding scheme, which consists of adding some random secret data to the plaintext; this data is called padding. The attacker then cannot guess the random data, and sees different outcomes for every encryption because the same plaintext is never encrypted twice; as far as the legitimate recipient is concerned, the padding is just data that can be thrown away.
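The guessing attack is easy to reproduce with plain integers. The following is a toy illustration, not PyCrypto code: the modulus is built from two small Mersenne primes purely so it fits on one line (it is far too small to be secure), and `textbook_encrypt` is a name invented for this sketch of unpadded RSA:

```python
# Toy public key: 2**61 - 1 and 2**89 - 1 are both prime
N = (2**61 - 1) * (2**89 - 1)
E = 65537


def textbook_encrypt(message):
    """Unpadded ("textbook") RSA: fully deterministic."""
    m = int.from_bytes(message, "big")
    if m >= N:
        raise ValueError("message too long for this modulus")
    return pow(m, E, N)


# What the attacker observes on the wire
observed = textbook_encrypt(b"swordfish")

# The attacker only needs the public key to test guesses
candidates = [b"password", b"letmein", b"swordfish"]
matches = [guess for guess in candidates if textbook_encrypt(guess) == observed]
```

Because nothing random enters the computation, `matches` ends up containing exactly the correct guess; random padding is what breaks this comparison.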

    It may appear that the above concerns do not apply in this scenario. However, there are other weaknesses that can arise from using RSA unprotected. In particular, if the public exponent is very small (not the case here as PyCrypto uses 65537) or you encrypt the same material for many different recipients (again, probably not the case here since each message has its own secret key), then a simple mathematical calculation would allow the attacker to recover the RSA plaintext. To avoid this attack, the value that is encrypted with RSA needs to be “close enough” to the RSA modulus, so that the encryption operation actually performs a modular exponentiation. The padding I propose ensures that by making the highest-order byte that fits 0xff; this is believed to be safe, although in the real world you should use an approved padding mode (OAEP).
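To make the small-exponent case concrete, here is a toy demonstration with plain integers. Everything in it is an assumption for illustration: the modulus is built from two Mersenne primes only because they are easy to write down (it is not a secure key), the exponent 5 is chosen deliberately small, and `integer_root` is a helper written for this sketch:

```python
import os

# Stand-in for rsa.n: 2**607 - 1 and 2**1279 - 1 are both prime
N = (2**607 - 1) * (2**1279 - 1)
E = 5  # deliberately tiny public exponent (the answer notes PyCrypto uses 65537)


def integer_root(x, k):
    """Return the largest integer r with r**k <= x (binary search)."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


secret_key = os.urandom(16)

# Unpadded: m**E is smaller than N, so no modular reduction ever happens
# and the attacker simply takes the E-th root of the ciphertext
m_raw = int.from_bytes(secret_key, "big")
c_raw = pow(m_raw, E, N)
recovered = integer_root(c_raw, E).to_bytes(16, "big")

# Padded as in encrypt_file: the leading 0xff byte makes the value nearly as
# large as N, so m**E wraps around the modulus and the root trick fails
plaintext_length = (N.bit_length() - 2) // 8
padding = b"\xff" + os.urandom(16)
padding += b"\0" * (plaintext_length - len(padding) - len(secret_key))
m_padded = int.from_bytes(padding + secret_key, "big")
c_padded = pow(m_padded, E, N)
root_fails = integer_root(c_padded, E) != m_padded
```

Here `recovered` equals `secret_key` without any knowledge of the private key, while the padded value cannot be recovered this way.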
