Open an lzo file in python, without decompressing the file

Submitted by 人盡茶涼 on 2021-01-27 07:46:47

Question


I'm currently working on a third-year project involving data from Twitter. The department has provided me with .lzo files containing a month's worth of Twitter data. The smallest is 4.9 GB and expands to 29 GB when decompressed, so I'm trying to open the file and read it as I go. Is this possible, or do I need to decompress the files first and work with the data that way?

EDIT: I have attempted to read the file line by line and decompress each line as it is read.

UPDATE: Found a solution - reading the STDOUT of lzop -dc works like a charm


Answer 1:


How about starting an lzop binary in a subprocess with the -c switch and then reading its STDOUT line by line?
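A minimal sketch of that idea, assuming the lzop binary is installed and on the PATH. The function names stream_lines and lzop_lines are made up for illustration, and the UTF-8 decoding is an assumption about the data, not something the question states.

```python
import subprocess


def stream_lines(cmd):
    """Run cmd and yield its stdout one decoded line at a time.

    Only one line is held in memory at once, so the decompressed
    data never has to fit in RAM or be written to disk.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            # Assumption: the data is UTF-8 text; undecodable bytes are replaced.
            yield raw.decode("utf-8", errors="replace").rstrip("\r\n")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)


def lzop_lines(path):
    """Yield the lines of an .lzo file via `lzop -dc` (decompress to stdout)."""
    return stream_lines(["lzop", "-dc", path])
```

With a hypothetical file name, usage would look like `for line in lzop_lines("tweets.lzo"): ...`. Because the pipe is consumed lazily, the loop can stop early without decompressing the rest of the file.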




Answer 2:


I know of only one LZO library for Python, https://github.com/jd-boyd/python-lzo, and it requires full decompression (moreover, it decompresses the contents in memory).

So I think you'll need to decompress the files before working with them.




Answer 3:


I know this is a very old question and the answer is really good. I encountered a similar problem, and Google brought me here.

I'll just write down my experience with LZO compression and the lzop program, in the hope that it helps someone who runs into the same question. I also wrote a simple Python module for dealing with lzo files; you can find it at https://github.com/ir193/python-lzo/

Regarding the question: reading an lzo-compressed file in place (without decompressing the whole file) can be done by reading one block at a time. An lzo file is divided into several blocks, and the maximum block size is about several MB. With my module, you can just use read(4096) or so.
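As a rough sketch of what chunked reading looks like on the consumer side: the code below works with any binary file-like object that offers read(n), for example the stdout pipe of an lzop -dc subprocess. It makes no claim about the actual interface of the module linked above; the helper names iter_chunks and iter_lines_from_chunks are hypothetical.

```python
import io


def iter_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a binary file-like object until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_lines_from_chunks(stream, chunk_size=4096):
    """Reassemble complete lines from fixed-size chunks.

    A line may straddle two chunks, so the unfinished tail of each
    chunk is carried over and prepended to the next one.
    """
    pending = b""
    for chunk in iter_chunks(stream, chunk_size):
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:
        # Final line without a trailing newline.
        yield pending


# Self-contained demonstration on in-memory data with a tiny chunk size.
demo = io.BytesIO(b"first\nsecond\nthird")
lines = list(iter_lines_from_chunks(demo, chunk_size=4))
```

Memory use is bounded by the chunk size plus the longest single line, regardless of how large the decompressed stream is.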

Actually, *.lzo files are created by lzop and have little to do with the python-lzo module mentioned in another answer (https://github.com/jd-boyd/python-lzo). That module is for compressing and decompressing strings; it does not handle the lzop file header or checksums. Don't use it if you want to open an existing lzo file.



Source: https://stackoverflow.com/questions/13415970/open-an-lzo-file-in-python-without-decompressing-the-file
