Question
I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz
file using Python's tarfile library.
I have this:
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    tarredCSV = tarFile.extractfile(file)
    reader = csv.reader(tarredCSV)
    next(reader)  # skip header
    for row in reader:
        if row[3] not in CSVRows.values():
            CSVRows[row[3]] = row
All the files in the tar file are CSVs.
I get an exception on the first file, raised by the first next(reader) call:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I open the file as text without first extracting it to disk and then opening it?
Answer 1:
tarfile.extractfile returns an io.BufferedReader object, which is a bytes stream, whereas csv.reader expects a text stream. You can wrap the bytes stream in io.TextIOWrapper to get a text stream instead:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
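For a version that can be run end to end, here is a minimal sketch of the same wrapping technique. It builds a small .tar.gz in memory first so that no file on disk is needed; the helper names (build_sample_archive, read_rows), the member name sample.csv, and the sample data are assumptions made for illustration, not part of the original question. Passing newline="" follows the csv module's general recommendation for text streams, and the None check skips members that are not regular files.

```python
import csv
import io
import tarfile


def build_sample_archive() -> bytes:
    """Build a tiny in-memory .tar.gz holding one CSV (hypothetical sample data)."""
    payload = "id,name\n1,alpha\n2,beta\n".encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="sample.csv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_rows(archive_bytes: bytes) -> list:
    """Return all data rows (headers skipped) from every file in the archive."""
    rows = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            binary_stream = archive.extractfile(member)
            if binary_stream is None:
                # extractfile returns None for members that are not regular files
                continue
            # Wrap the bytes stream so csv.reader receives str lines
            text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
            reader = csv.reader(text_stream)
            next(reader)  # skip header
            rows.extend(reader)
    return rows


rows = read_rows(build_sample_archive())
print(rows)
```

The encoding is assumed to be UTF-8 here; if the CSVs in your archive use a different encoding, pass that to io.TextIOWrapper instead.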
Answer 2:
You need to provide a text-mode file-like object to csv.reader.
Probably the best solution, which avoids reading a complete file into memory at once, is this approach (thanks to blhsing and damon for suggesting it):
import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.TextIOWrapper(tarFile.extractfile(file), encoding="utf-8")
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)
Alternatively, a possible solution taken from the question "Python3 working with csv files in tar files" would be:
import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.StringIO(tarFile.extractfile(file).read().decode('utf-8'))
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)
Here an io.StringIO object is used to make csv.reader happy. However, this might not scale well for larger files contained in the tar, as each file is read into memory in a single step.
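To make the scaling point concrete, the streaming variant can be packaged as a generator so that only one row is held in memory at a time. The sketch below is one possible arrangement, not the answerers' code: the function name iter_tar_csv_rows, the demo archive, and its contents are all hypothetical. It also collects the first row seen for each value of the fourth column, which appears to be what the question's CSVRows loop intends; note that it tests membership against the dict's keys rather than its values.

```python
import csv
import io
import tarfile
import tempfile
from pathlib import Path


def iter_tar_csv_rows(tar_path, encoding="utf-8"):
    """Lazily yield data rows from every regular file in the archive."""
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none)
    with tarfile.open(tar_path, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other special members
            binary_stream = archive.extractfile(member)
            with io.TextIOWrapper(binary_stream, encoding=encoding, newline="") as text_stream:
                reader = csv.reader(text_stream)
                next(reader, None)  # skip header; tolerate an empty file
                yield from reader


# Demo with a throwaway archive (hypothetical data with a duplicate key "k1")
with tempfile.TemporaryDirectory() as tmp_dir:
    tar_path = Path(tmp_dir) / "demo.tar.gz"
    with tarfile.open(tar_path, mode="w:gz") as archive:
        for name, text in [
            ("a.csv", "c0,c1,c2,key\n1,2,3,k1\n4,5,6,k2\n"),
            ("b.csv", "c0,c1,c2,key\n7,8,9,k1\n0,0,0,k3\n"),
        ]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))

    unique_rows = {}
    for row in iter_tar_csv_rows(tar_path):
        # Keep only the first row seen for each value in the fourth column
        unique_rows.setdefault(row[3], row)

print(unique_rows)
```

The rows are consumed inside the temporary-directory block because the generator keeps the archive open until iteration finishes.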
Source: https://stackoverflow.com/questions/61069941/python-use-csv-reader-with-single-file-extracted-from-tarfile