Python 2.7: Compressing data with the XZ format using the “lzma” module

Submitted by 一曲冷凌霜 on 2019-12-01 03:04:13

Question


I'm experimenting with the lzma module in Python 2.7.6 to see if I could create compressed files using the XZ format for a future project that will make use of it. My code used during the experiment was:

import lzma as xz

# Read the whole WAV file into memory
with open('/home/ki2ne/Desktop/song.wav', 'rb') as in_file:
    input_data = in_file.read()

compressed_data = xz.compress(input_data)

# Write the compressed bytes to the .xz file
with open('/home/ki2ne/Desktop/song.wav.xz', 'wb') as out_file:
    out_file.write(compressed_data)

and I noticed that the resulting file's checksums (both MD5 and SHA256) differed from those of the file produced by the plain xz command-line tool, although I could decompress either file fine: the decompressed versions of both files had identical checksums. Would this be a problem?
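Since the decompressed data matched even though the .xz bytes did not, one way to compare two .xz files is by their decompressed contents rather than their raw bytes. The sketch below uses Python 3's standard lzma module (not the Python 2 module from the question), and the helper names xz_content_digest and same_xz_content are hypothetical.

```python
import hashlib
import lzma


def xz_content_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of the *decompressed* contents of an .xz file.

    Hypothetical helper: reads in chunks so large files need not fit in memory.
    """
    digest = hashlib.sha256()
    with lzma.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_xz_content(path_a, path_b):
    """True if both .xz files decompress to identical data, even if the files differ."""
    return xz_content_digest(path_a) == xz_content_digest(path_b)
```

For example, same_xz_content('song.wav.xz', 'song_cli.wav.xz') would compare a Python-compressed file against one made with the xz tool (both file names are made up).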

UPDATE: I found a fix by installing the backport of the Python 3.3 lzma module from peterjc's Git repository, and now the checksums are identical. Not sure if it helps, but I made sure the LZMA Python module from my package repository wasn't installed, to avoid possible name conflicts.

Here's my test code to confirm this:

# I have created two identical text files with some random phrases

from subprocess import call
from hashlib import sha256
from backports import lzma as xz

# Compress test1.txt with the xz command-line tool (produces test1.txt.xz)
call(["xz", "test1.txt"])

# Compress test2.txt in memory with the backported lzma module
with open("test2.txt", "rb") as f2:
    f2_xz = xz.compress(f2.read())

# Read back the file produced by the xz tool
with open("test1.txt.xz", "rb") as f1:
    f1_xz = f1.read()

if sha256(f1_xz).hexdigest() == sha256(f2_xz).hexdigest():
    print "Checksums OK"
else:
    print "Checksum Error"

I've also verified this with the regular sha256sum tool (after writing the data to a file).
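The UPDATE relies on backports.lzma, which backports the Python 3.3 lzma module, so the same calls should work on both Python 2.7 and Python 3. A minimal sketch, assuming the backport exposes the same LZMACompressor class and FORMAT_XZ constant as the standard library: selecting the module by interpreter version (rather than a bare "import lzma") avoids accidentally picking up a different Python 2 module named lzma, the name conflict mentioned above. The helper compress_file_to_xz is hypothetical; it also streams the input in chunks instead of reading a whole WAV file into memory.

```python
import sys

if sys.version_info >= (3, 3):
    import lzma as xz  # standard library module
else:
    from backports import lzma as xz  # assumes backports.lzma is installed


def compress_file_to_xz(src_path, dst_path, chunk_size=1 << 16):
    """Stream src_path into an .xz file at dst_path (hypothetical helper)."""
    compressor = xz.LZMACompressor(format=xz.FORMAT_XZ)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(compressor.compress(chunk))
        # flush() emits any buffered data and finishes the .xz stream
        dst.write(compressor.flush())
```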


Answer 1:


I would not be concerned about the differences between the compressed files: depending on the container format and the checksum type used in the .xz file, the compressed data can vary without affecting the decompressed contents.
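To illustrate the point about the checksum type, here is a minimal sketch using Python 3's standard lzma module: compressing the same bytes with the default check for the .xz format (CRC64) and with a CRC32 check gives different .xz files that decompress to identical data. The sample input is made up.

```python
import lzma

data = b"some random phrases\n" * 100  # made-up sample input

xz_crc64 = lzma.compress(data)  # FORMAT_XZ defaults to CHECK_CRC64
xz_crc32 = lzma.compress(data, check=lzma.CHECK_CRC32)

print(xz_crc64 == xz_crc32)  # False: the stored check type and check values differ
print(lzma.decompress(xz_crc64) == lzma.decompress(xz_crc32) == data)  # True
```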

EDIT: I've looked into this further and wrote this script to test the PyLZMA module for Python 2.x and the built-in lzma module in Python 3.x:

from __future__ import print_function
try:
    import lzma as xz
except ImportError:
    import pylzma as xz
import os

# compress with xz command line util
os.system('xz -zkf test.txt')

# now compress with lib
with open('test.txt', 'rb') as f, open('test.txt.xzpy', 'wb') as out:
    out.write(xz.compress(bytes(f.read())))

# compare the two files
from hashlib import md5

with open('test.txt.xz', 'rb') as f1, open('test.txt.xzpy', 'rb') as f2:
    hash1 = md5(f1.read()).hexdigest()
    hash2 = md5(f2.read()).hexdigest() 
    print(hash1, hash2)
    assert hash1 == hash2

This compresses the file test.txt with both the xz command-line utility and the Python module, then compares the results. Under Python 3, lzma produces the same result as xz; under Python 2, however, PyLZMA produces a different result that cannot be extracted with the xz command-line utility.

What module are you using that is called "lzma" in Python2 and what command did you use to compress the data?

EDIT 2: Okay, I found the pyliblzma module for Python 2. However, it seems to use CRC32 as the default checksum algorithm (the others use CRC64), and a bug prevents changing the checksum algorithm: https://bugs.launchpad.net/pyliblzma/+bug/1243344

You could try compressing with xz -C crc32 to compare the results, but I still haven't managed to create a valid compressed file with the Python 2 libraries.
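To see which checksum a given .xz file actually uses (for instance, CRC32 as pyliblzma apparently writes, or CRC64), one can read the Stream Flags in the file header. Per the .xz format specification, a stream starts with the 6-byte magic FD 37 7A 58 5A 00, followed by two flag bytes; the low 4 bits of the second one hold the check ID. The sketch below assumes exactly that layout and uses Python 3; xz_check_type is a hypothetical helper, and the name table covers only the check types that Python's lzma module exposes as constants.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"

# Check IDs stored in the .xz Stream Flags, mapped via Python's lzma constants
CHECK_NAMES = {
    lzma.CHECK_NONE: "None",
    lzma.CHECK_CRC32: "CRC32",
    lzma.CHECK_CRC64: "CRC64",
    lzma.CHECK_SHA256: "SHA-256",
}


def xz_check_type(header):
    """Return the check name recorded in the first bytes of an .xz stream."""
    if len(header) < 8 or not header.startswith(XZ_MAGIC):
        raise ValueError("not an .xz stream")
    check_id = header[7] & 0x0F  # low 4 bits of the second Stream Flags byte
    return CHECK_NAMES.get(check_id, "unknown (ID %d)" % check_id)
```

Reading the first 12 bytes of a file (the full Stream Header) and passing them in is enough, e.g. with open('test.txt.xz', 'rb') as f: print(xz_check_type(f.read(12))).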




Answer 2:


In my case (Ubuntu/Mint), to use the lzma module with Python 2.7, I installed backports.lzma directly with pip (not from GitHub), as root or with sudo:

pip2 install backports.lzma

FYI, pip2 has a --user option that doesn't require superuser permissions and installs the module for the local user only, but I have not tested this.

Before running the pip installation, you also have to install one mandatory dependency with your package manager: the liblzma library.

In my case the package names were liblzma5 and liblzma-dev, but package names may differ between Linux distributions and releases.

P.S.: I also repeated the same operation successfully with conda in a different Linux environment (a cluster running an unknown distro):

conda install backports
conda install backports.lzma --name pyEnvName

Hope this is useful.



Source: https://stackoverflow.com/questions/22370068/python-2-7-compressing-data-with-the-xz-format-using-the-lzma-module
