python: use CSV reader with single file extracted from tarfile

Submitted by 元气小坏坏 on 2020-04-16 05:49:11

Question


I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library.

I have this:

tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    tarredCSV = tarFile.extractfile(file)
    reader = csv.reader(tarredCSV)
    next(reader)    # skip header
    for row in reader:
        if row[3] not in CSVRows.values():
            CSVRows[row[3]] = row

All the files in the tar file are CSVs.

I get an exception on the first file, at the first next(reader) call:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

How do I open said file (without extracting the file then opening it)?


Answer 1:


tarfile.extractfile returns an io.BufferedReader object, which is a bytes stream, whereas csv.reader expects a text stream. You can use io.TextIOWrapper to wrap the bytes stream in a text stream:

import io

...

reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
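To try this without a real archive on disk, here is a minimal, self-contained sketch. It builds a small .tar.gz in memory and reads it back through io.TextIOWrapper. The helper names (build_demo_archive, read_rows) and the sample data are made up for illustration, and UTF-8 is assumed as the encoding. The newline="" argument follows the csv module's recommendation for files passed to csv.reader.

```python
import csv
import io
import tarfile


def build_demo_archive() -> bytes:
    """Return the bytes of a small .tar.gz holding one CSV (demo data only)."""
    payload = "id,name\n1,alpha\n2,beta\n".encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="demo.csv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_rows(archive_bytes: bytes) -> list:
    """Read every CSV data row (header skipped) without extracting to disk."""
    rows = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        for member in archive.getmembers():
            raw = archive.extractfile(member)  # binary stream, or None for non-files
            if raw is None:
                continue
            # Decode the bytes stream so csv.reader receives str lines.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            next(reader, None)  # skip header
            rows.extend(reader)
    return rows


rows = read_rows(build_demo_archive())
print(rows)
```

Passing the raw stream straight to csv.reader instead of the wrapper reproduces the "iterator should return strings, not bytes" error from the question.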



Answer 2:


You need to provide a text-mode file-like object to csv.reader.

Probably the best solution, since it does not have to consume a complete file at once, is this approach (thanks to blhsing and damon for suggesting it):

import csv
import io
import tarfile

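# tarFileName is the path to the archive, as in the question
with tarfile.open(name=tarFileName, mode="r") as tarFile:
    for file in tarFile.getmembers():
        # wrap the binary member stream so csv.reader receives str lines
        csv_file = io.TextIOWrapper(tarFile.extractfile(file), encoding="utf-8", newline="")

        reader = csv.reader(csv_file)
        next(reader)    # skip header
        for row in reader:
            print(row)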

Alternatively, a possible solution from the question "Python3 working with csv files in tar files" would be:

import csv
import io
import tarfile

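# tarFileName is the path to the archive, as in the question
with tarfile.open(name=tarFileName, mode="r") as tarFile:
    for file in tarFile.getmembers():
        # read the whole member into memory, decode it, and wrap it as text
        csv_file = io.StringIO(tarFile.extractfile(file).read().decode('utf-8'))

        reader = csv.reader(csv_file)
        next(reader)    # skip header
        for row in reader:
            print(row)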

Here an io.StringIO object is used to give csv.reader the text stream it expects. However, this might not scale well for larger files in the tar, as each file is read into memory in a single step.
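To keep reading lazy, the io.TextIOWrapper approach can also be wrapped in a generator and combined with the de-duplication from the question. This is only a sketch under some assumptions: the function names (iter_csv_rows, first_row_per_key, make_archive) and the sample data are hypothetical, and it assumes the question intended to keep the first row seen for each distinct value in column 3. Note that the question's check against CSVRows.values() compares a string with whole rows, so it never matches; a lookup by key, as below, is probably what was meant.

```python
import csv
import io
import tarfile


def iter_csv_rows(archive, encoding="utf-8"):
    """Yield data rows lazily from every regular file in an open TarFile."""
    for member in archive.getmembers():
        if not member.isfile():
            continue  # directories and links have no data stream
        raw = archive.extractfile(member)
        with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
            reader = csv.reader(text)
            next(reader, None)  # skip header
            yield from reader


def first_row_per_key(rows, key_index=3):
    """Keep the first row seen for each distinct value at key_index."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row[key_index], row)
    return first_seen


def make_archive(files):
    """Build an in-memory .tar.gz from {name: text} (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            data = content.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


demo = make_archive({
    "a.csv": "c0,c1,c2,key\n1,2,3,x\n4,5,6,y\n",
    "b.csv": "c0,c1,c2,key\n7,8,9,x\n",
})
with tarfile.open(fileobj=demo, mode="r") as archive:
    unique = first_row_per_key(iter_csv_rows(archive))
print(unique)
```

Because rows are yielded one at a time, only the rows kept in the dictionary stay in memory, not the full contents of each member.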



Source: https://stackoverflow.com/questions/61069941/python-use-csv-reader-with-single-file-extracted-from-tarfile
