Read multiple csv files zipped in one file

独厮守ぢ 2021-01-29 09:41

I have several CSV files packed into several zip files in one folder, for example:

  • A.zip (contains csv1, csv2, csv3)
  • B.zip (contains csv4, csv5, csv6)
2 Answers
  • 2021-01-29 10:04

    Use ZipFile.namelist() to get the list of files inside each zip archive, then open each member with ZipFile.open() and pass it to pandas.

    Example:

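    import glob
    import zipfile

    import pandas as pd

    for zip_file in glob.glob("C/folder/*.zip"):
        # The context manager closes the archive once its members are read.
        with zipfile.ZipFile(zip_file) as zf:
            # zf.open() returns a file-like object that pandas can read directly.
            dfs = [pd.read_csv(zf.open(f), header=None, sep=";") for f in zf.namelist()]
        # One combined DataFrame per zip file.
        df = pd.concat(dfs, ignore_index=True)
        print(df)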
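
If a single DataFrame across all zip files is wanted, the same idea can be wrapped in a function. This is only a sketch: the function name `read_zipped_csvs` and the `source_zip` / `source_csv` columns are hypothetical additions for illustration, and it assumes every CSV uses the same separator and has no header row, as in the snippet above. It also skips archive members that do not end in `.csv`.

```python
import glob
import os
import zipfile

import pandas as pd


def read_zipped_csvs(folder, sep=";"):
    """Read every CSV member of every .zip in `folder` into one DataFrame."""
    frames = []
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                # Skip directory entries and non-CSV members.
                if not name.lower().endswith(".csv"):
                    continue
                with zf.open(name) as fh:
                    frame = pd.read_csv(fh, header=None, sep=sep)
                # Hypothetical bookkeeping columns: record where each row came from.
                frame["source_zip"] = os.path.basename(zip_path)
                frame["source_csv"] = name
                frames.append(frame)
    if not frames:
        # No zip files or no CSV members found.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling `read_zipped_csvs("C/folder")` would then return one combined frame, and the two extra columns make it possible to filter rows by archive or by file afterwards.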
    
  • 2021-01-29 10:17

    I would tackle this in two passes. First pass: extract the contents of each zip file onto the filesystem. Second pass: read all the extracted CSVs using the method you already have above:

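    import glob
    import os
    import zipfile

    import pandas as pd

    def extract_files(file_path):
        # Extract into a folder named after the archive and return the CSV paths.
        target_dir = os.path.splitext(file_path)[0]
        with zipfile.ZipFile(file_path, "r") as archive:
            archive.extractall(target_dir)  # extractall() itself returns None
            names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        return [os.path.join(target_dir, n) for n in names]

    zipped_files = glob.glob("C/folder/*.zip")
    # Flatten the per-archive lists into one list of CSV paths.
    file_paths = [p for zf in zipped_files for p in extract_files(zf)]

    dfs = [pd.read_csv(f, header=None, sep=";") for f in file_paths]
    df = pd.concat(dfs, ignore_index=True)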
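
The two-pass idea can also be sketched with a temporary directory, so that the extracted files are cleaned up once they have been read. This is an illustrative variant, not the answer's own code: the function name `read_via_extraction` is hypothetical, and it assumes semicolon-separated CSVs without a header row. Each archive is extracted into its own subfolder so that members with the same name in different zips do not overwrite each other.

```python
import glob
import os
import tempfile
import zipfile

import pandas as pd


def read_via_extraction(folder, sep=";"):
    """Extract each zip to a temporary directory, then read the CSVs from disk."""
    frames = []
    zip_paths = sorted(glob.glob(os.path.join(folder, "*.zip")))
    with tempfile.TemporaryDirectory() as tmp:
        for i, zip_path in enumerate(zip_paths):
            # First pass: one subfolder per archive to avoid name collisions.
            target = os.path.join(tmp, str(i))
            with zipfile.ZipFile(zip_path) as archive:
                archive.extractall(target)
            # Second pass: read every extracted CSV, including nested ones.
            pattern = os.path.join(target, "**", "*.csv")
            for csv_path in sorted(glob.glob(pattern, recursive=True)):
                frames.append(pd.read_csv(csv_path, header=None, sep=sep))
    # The temporary directory is deleted here; the data lives on in `frames`.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Compared with reading members straight from the archive, this costs extra disk I/O, but it can be handy when another tool also needs the extracted files during the run.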
    