I need to extract a gz file that I have downloaded from an FTP site to a local Windows file server. I have the variables set for the local path of the file, and I know it ca
Not an exact answer, because you're using XML data and there is currently no pd.read_xml()
function (as of v0.23.4), but pandas (starting with v0.21.0) can decompress the file for you! Thanks Wes!
import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
import gzip
import pandas as pd

with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)
features_train.head()
# Note: the sh package only supports Unix-like systems, so this will not work on Windows.
from sh import gunzip
gunzip('/tmp/file1.gz')
If you are parsing the file as you decompress it, don't forget to call decode() on each line. This is necessary when you open the file in binary mode, because each line is then returned as bytes rather than str.
import gzip
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        # Binary mode yields bytes, so decode before treating the line as text.
        print(line.decode().strip())
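As an alternative to decoding each line by hand, gzip.open also accepts a text mode ('rt') together with an encoding argument, in which case the lines come back as str. A minimal sketch, assuming the content is UTF-8; the helper name read_gz_lines is hypothetical and only used for illustration:

```python
import gzip

def read_gz_lines(path, encoding="utf-8"):
    """Return the stripped lines of a gzip-compressed text file."""
    # 'rt' opens the stream in text mode, so each line is already a str
    # and no explicit decode() call is needed.
    with gzip.open(path, "rt", encoding=encoding) as f:
        return [line.strip() for line in f]
```

If the file is not UTF-8, pass the right encoding instead of relying on the default used here.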
From the documentation:
import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()
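If the compressed bytes are already in memory (for example, right after a download) rather than in a file on disk, gzip.decompress does the same job without opening a file. A small sketch; the payload below is a made-up stand-in produced with gzip.compress, not real downloaded data:

```python
import gzip

# Stand-in for gzip-compressed bytes obtained elsewhere (assumption for illustration).
payload = gzip.compress(b"line one\nline two\n")

# Decompress the whole payload in one call; the result is bytes.
file_content = gzip.decompress(payload)
```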
import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        # Stream the decompressed bytes to disk without loading the whole file into memory.
        shutil.copyfileobj(f_in, f_out)
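Since the question has the local path in a variable, the same copyfileobj approach can be wrapped in a small function that works out the output name from the input path. This is only a sketch: the function name gunzip_to and its dest_dir parameter are hypothetical, and it assumes the archive holds a single file, as .gz files normally do.

```python
import gzip
import shutil
from pathlib import Path

def gunzip_to(src, dest_dir=None):
    """Decompress src (a .gz file) and return the path of the extracted file."""
    src = Path(src)
    # Default to the directory that already holds the archive.
    dest_dir = Path(dest_dir) if dest_dir is not None else src.parent
    # Drop a trailing ".gz"; otherwise append ".out" so src is never overwritten.
    name = src.stem if src.suffix.lower() == ".gz" else src.name + ".out"
    dest = dest_dir / name
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

Because it relies on pathlib, the same call should work with Windows paths (including UNC paths to a file server) as well as POSIX ones.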