Pandas. How to read Excel file from ZIP archive

只谈情不闲聊 submitted on 2021-02-20 06:18:16

Question


I have a .zip archive with filename.xlsx inside it, and I want to parse the Excel sheet line by line.

How do I properly pass the file from the archive to pandas.read_excel in this case?

I tried:

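import zipfile
import pandas

myzip = zipfile.ZipFile('filename.zip')
for fname in myzip.namelist():
    with myzip.open(fname) as from_archive:
        # read_excel returns a DataFrame, which is not a context manager,
        # so this `with` statement fails
        with pandas.read_excel(from_archive) as fin:
            for line in fin:
                ...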

but it doesn't seem to work, and the result was:

AttributeError: __exit__

Answer 1:


You can read the Excel file from the ZIP archive into memory as bytes and parse it using io.BytesIO:

import io
from zipfile import ZipFile
import pandas as pd


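def read_zip(zip_fn, extract_fn=None):
    """Return the bytes of one member, or a {name: bytes} dict of all members."""
    with ZipFile(zip_fn) as zf:  # closes the archive when done
        if extract_fn:
            return zf.read(extract_fn)
        return {name: zf.read(name) for name in zf.namelist()}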

Usage:

df = pd.read_excel(io.BytesIO(read_zip(r'C:\download\test.xlsx.zip', 'test.xlsx')))
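
A self-contained sketch of the same in-memory approach, assuming `openpyxl` is installed (pandas uses it to write and read `.xlsx` files). The helper names `make_demo_zip` and `read_excel_from_zip`, the member name `test.xlsx`, and the demo data are hypothetical. Because `pd.read_excel` returns a `DataFrame` rather than a context manager, its result is assigned instead of being used in a `with` statement (which is what raises the question's `AttributeError: __exit__`). To go through the sheet line by line, the rows are iterated with `itertuples()`; iterating a DataFrame directly, as in `for line in fin`, yields column labels, not rows.

```python
import io
import zipfile

import pandas as pd


def make_demo_zip():
    """Build an in-memory ZIP holding a hypothetical test.xlsx (demo data only)."""
    xlsx_buf = io.BytesIO()
    pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}).to_excel(xlsx_buf, index=False)
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as zf:
        zf.writestr("test.xlsx", xlsx_buf.getvalue())
    zip_buf.seek(0)  # rewind so the archive can be read back
    return zip_buf


def read_excel_from_zip(zip_source, member):
    """Read one .xlsx member of a ZIP archive into a DataFrame via io.BytesIO."""
    with zipfile.ZipFile(zip_source) as zf:
        data = zf.read(member)  # raw bytes of the member
    # read_excel returns a DataFrame (not a context manager), so no `with` here
    return pd.read_excel(io.BytesIO(data))


df = read_excel_from_zip(make_demo_zip(), "test.xlsx")
for row in df.itertuples(index=False):  # one namedtuple per sheet row
    print(row.item, row.qty)
```

If the workbook has several sheets, passing `sheet_name=None` to `pd.read_excel` returns a dict of DataFrames keyed by sheet name instead of a single DataFrame.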

Alternatively, you can extract the files from the ZIP archive to disk and parse them as regular files.
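
A minimal sketch of the extract-to-disk alternative, using `ZipFile.extract` (which returns the path of the extracted file) together with a `tempfile.TemporaryDirectory` so the extracted copy is cleaned up afterwards. The helper name `read_excel_via_disk`, the member name `test.xlsx`, and the demo data are hypothetical, and `openpyxl` is assumed to be installed.

```python
import io
import tempfile
import zipfile

import pandas as pd


def read_excel_via_disk(zip_source, member):
    """Extract one member to a temporary directory and read it as a regular file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(zip_source) as zf:
            path = zf.extract(member, path=tmpdir)  # path of the extracted file
        # read_excel reads the whole file before the directory is deleted
        return pd.read_excel(path)


# Demo archive (hypothetical data) held in an in-memory buffer
xlsx_buf = io.BytesIO()
pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}).to_excel(xlsx_buf, index=False)
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("test.xlsx", xlsx_buf.getvalue())
zip_buf.seek(0)

df = read_excel_via_disk(zip_buf, "test.xlsx")
print(df)
```

Extracting to disk costs an extra write but lets any tool that only accepts file paths open the workbook.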

PS: there are many examples on StackOverflow showing how to extract the contents of a ZIP archive.




Answer 2:


Using zipfile

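import zipfile
import pandas as pd

# ZipFile.open returns a file-like object that read_excel can parse directly
with zipfile.ZipFile('filename.zip', 'r') as archive:
    with archive.open('filename.xlsx') as xlfile:
        df = pd.read_excel(xlfile)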
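
A sketch extending this approach to the question's loop over `namelist()`: every `.xlsx` member is opened with `ZipFile.open` and passed straight to `pd.read_excel`, and the results are collected in a dict keyed by member name. The helper name `read_all_xlsx`, the `.xlsx` suffix filter, and the demo data are assumptions; `openpyxl` is assumed to be installed, and the approach relies on the member file object being seekable, which the Excel reader needs.

```python
import io
import zipfile

import pandas as pd


def read_all_xlsx(zip_source):
    """Return {member name: DataFrame} for every .xlsx member of a ZIP archive."""
    frames = {}
    with zipfile.ZipFile(zip_source) as archive:
        for fname in archive.namelist():
            if not fname.lower().endswith(".xlsx"):
                continue  # skip folders and non-Excel members
            with archive.open(fname) as xlfile:
                frames[fname] = pd.read_excel(xlfile)
    return frames


# Demo archive (hypothetical data): two workbooks and a text file
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    for name, qty in [("first.xlsx", [1, 2]), ("second.xlsx", [3])]:
        xlsx_buf = io.BytesIO()
        pd.DataFrame({"qty": qty}).to_excel(xlsx_buf, index=False)
        zf.writestr(name, xlsx_buf.getvalue())
    zf.writestr("readme.txt", "not a workbook")
zip_buf.seek(0)

frames = read_all_xlsx(zip_buf)
for fname, df in frames.items():
    print(fname, len(df))
```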


Source: https://stackoverflow.com/questions/49157077/pandas-how-to-read-excel-file-from-zip-archive
