Reading a gzipped CSV file in Python 3

Question

I'm having problems reading from a gzipped csv file with the gzip and csv libs. Here's what I got:

import gzip
import csv
import json

f = gzip.open(filename)
csvobj = csv.reader(f, delimiter=',', quotechar="'")
for line in csvobj:
    ts = line[0]
    data_json = json.loads(line[1])

But this throws an exception:

  File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 64, in download_from_S3
    self.parse_dump_file(filename)
  File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 30, in parse_dump_file
    for line in csvobj:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Gunzipping the file first and opening the result with csv works fine. I've also tried decoding the file text to convert it from bytes to str...

What am I missing here?

Answer 1:

The default mode for gzip.open is 'rb' (binary). If you want to work with str objects, you have to ask for text mode explicitly:

f = gzip.open(filename, mode="rt")
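To see the difference between the two modes, here is a minimal sketch. The temporary file and its contents are made up for illustration; only gzip.open and its mode and encoding arguments come from the standard library.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(path, mode="wt", encoding="utf-8") as f:
    f.write("a,b\n")

# Default mode ('rb'): reading yields bytes, which csv.reader rejects.
with gzip.open(path) as f:
    binary_line = f.readline()

# Text mode ('rt'): reading yields str, which csv.reader accepts.
with gzip.open(path, mode="rt", encoding="utf-8") as f:
    text_line = f.readline()

print(type(binary_line).__name__, type(text_line).__name__)
```

Passing encoding explicitly is optional, but it avoids depending on the platform's default encoding.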

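Off topic: it is good practice to do file I/O inside a with block, so the file is closed automatically:

with gzip.open(filename, mode="rt") as f:
    csvobj = csv.reader(f, delimiter=',', quotechar="'")  # iterate over csvobj inside this block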
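Putting this together with the loop from the question might look roughly like the sketch below. The helper name read_rows, the sample file, and its single row are assumptions for illustration; the column layout (timestamp, then a JSON string quoted with ') follows the question's code.

```python
import csv
import gzip
import json
import os
import tempfile


def read_rows(filename):
    """Return (timestamp, parsed_json) pairs from a gzipped CSV file."""
    rows = []
    # newline="" leaves line-ending handling to the csv module,
    # as the csv documentation recommends for file objects.
    with gzip.open(filename, mode="rt", encoding="utf-8", newline="") as f:
        for line in csv.reader(f, delimiter=",", quotechar="'"):
            rows.append((line[0], json.loads(line[1])))
    return rows


# Hypothetical sample data in the shape the question implies.
sample = os.path.join(tempfile.mkdtemp(), "dump.csv.gz")
with gzip.open(sample, mode="wt", encoding="utf-8", newline="") as f:
    f.write("2015-05-19T10:00:00,'{\"id\": 1, \"ok\": true}'\n")

rows = read_rows(sample)
print(rows)
```

All reading happens inside the with block, so the file is already closed by the time read_rows returns.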

Answer 2:

You are opening the file in binary mode (which is the default for gzip.open).

Try instead:

import gzip
import csv
f = gzip.open(filename, mode='rt')
csvobj = csv.reader(f, delimiter=',', quotechar="'")
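If the compressed data is already in memory or comes from another binary stream rather than a file on disk, a similar result can be had by wrapping the stream yourself. In text mode, gzip.open does essentially the same thing: it wraps a GzipFile in an io.TextIOWrapper. The sketch below assumes a made-up in-memory payload named raw.

```python
import csv
import gzip
import io

# Hypothetical gzipped payload held in memory (e.g. bytes from a download).
raw = gzip.compress(b"ts,'payload'\n")

# GzipFile decompresses to bytes; TextIOWrapper decodes those bytes to str.
binary_stream = gzip.GzipFile(fileobj=io.BytesIO(raw))
with io.TextIOWrapper(binary_stream, encoding="utf-8", newline="") as text_stream:
    rows = list(csv.reader(text_stream, delimiter=",", quotechar="'"))

print(rows)
```

Closing the TextIOWrapper also closes the underlying GzipFile, so one with block is enough here.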

Answer 3:

A late answer: you can also use the datatable package in Python.

import datatable as dt
df = dt.fread(filename)
df.head()

Source: https://stackoverflow.com/questions/30324503/reading-gzipped-csv-file-in-python-3

Tags

python

csv

gzip