Question
I am trying to compress a CSV file into Snappy format using a Python script and the python-snappy module. This is my code so far:
import snappy
d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
    snappy_data.write(d)
    snappy_data.close()
This code does create a snappy file, but the file only contains the string "C:\Users\my_user\Desktop\Test\Test_file.csv"
So I am a bit lost on how to get my CSV compressed. I got it working from the Windows command prompt with this command:
python -m snappy -c Test_file.csv compressed_file.snappy
But I need this to happen as part of a Python script, so running it from cmd does not work for me.
Thank you very much, Álvaro
Answer 1:
You are compressing the path string itself, because the compress function takes raw data rather than a file name.
There are two ways to compress data with snappy: as a single block, or as streaming (framed) data.
This function compresses a file using the framed method:
import snappy

def snappy_compress(path):
    path_to_store = path + '.snappy'
    # Open both files in binary mode: stream_compress reads and writes bytes.
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            snappy.stream_compress(in_file, out_file)
    # The with blocks close both files, so no explicit close() is needed.
    return path_to_store

snappy_compress('testfile.csv')
You can decompress from the command line using:
python -m snappy -d testfile.csv.snappy testfile_decompressed.csv
Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.
Source: https://stackoverflow.com/questions/45711565/how-to-snappy-compress-a-file-using-a-python-script