How to check for an empty gzip file in Python

太阳男子 2021-01-12 01:46

I don't want to use OS commands, as that would make the solution OS-dependent.

This is available in tarfile, tarfile.is_tarfile(filename), to check if

8 Answers
  •  一生所求
    2021-01-12 02:23

    UPDATE:

    I would strongly recommend upgrading to pandas 0.18.1 (currently the latest version), as each new version of pandas introduces new features and fixes many old bugs. Version 0.18.1 processes your empty files out of the box (see the demo below).

    If you can't upgrade to a newer version, then follow @MartijnPieters' recommendation: catch the exception instead of checking first (the "Easier to ask for forgiveness than permission" paradigm).
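
    As a rough illustration of the catch-the-exception approach, here is a minimal sketch. It assumes a recent pandas in which the exception is exposed as `pandas.errors.EmptyDataError` (older releases may expose it elsewhere); the helper name `read_csv_gz` and the scratch file names are made up for the example. Note that `names=` is deliberately not passed, since with explicit column names pandas may return an empty DataFrame instead of raising.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_csv_gz(path):
    """Return a DataFrame, or None if the file holds no data (hypothetical helper)."""
    try:
        return pd.read_csv(path, header=None, compression='gzip')
    except pd.errors.EmptyDataError:
        # pandas found no columns to parse: treat the file as empty and skip it
        return None


# Build one empty and one non-empty .csv.gz file in a scratch directory
tmp_dir = tempfile.mkdtemp()
empty_path = os.path.join(tmp_dir, 'empty.csv.gz')
data_path = os.path.join(tmp_dir, 'data.csv.gz')
with gzip.open(empty_path, 'wt') as fh:
    pass  # valid gzip stream with an empty payload
with gzip.open(data_path, 'wt') as fh:
    fh.write('11,12,13\n14,15,16\n')

empty_df = read_csv_gz(empty_path)
data_df = read_csv_gz(data_path)
print(empty_df is None, data_df.shape)
```

    The caller then only has to test the result for `None`, and no separate emptiness check is needed before reading.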

    OLD answer: a small demonstration (using pandas 0.18.1) that tolerates empty files, a varying number of columns, etc.

    I tried to reproduce your error (with an empty CSV.gz, a varying number of columns, etc.), but I couldn't reproduce your exception using pandas 0.18.1:

    import glob
    import gzip
    import pandas as pd

    fmask = 'd:/temp/.data/37874936/*.csv.gz'

    files = glob.glob(fmask)

    cols = ['a', 'b', 'c']

    for f in files:
        # there is no need to pass `compression='gzip'`: pandas infers it from the .gz extension
        # I left it in to be sure that we are using the same parameters
        df = pd.read_csv(f, header=None, names=cols, compression='gzip', sep=',')
        print('\nFILE: [{:^40}]'.format(f))
        print('{:-^60}'.format(' ORIGINAL contents '))
        # dump the raw decompressed text, closing the gzip handle afterwards
        with gzip.open(f, 'rt') as fh:
            print(fh.read())
        print('{:-^60}'.format(' parsed DF '))
        print(df)
    

    Output:

    FILE: [    d:/temp/.data/37874936\1.csv.gz     ]
    -------------------- ORIGINAL contents ---------------------
    11,12,13
    14,15,16
    
    
    ------------------------ parsed DF -------------------------
        a   b   c
    0  11  12  13
    1  14  15  16
    
    FILE: [  d:/temp/.data/37874936\empty.csv.gz   ]
    -------------------- ORIGINAL contents ---------------------
    
    ------------------------ parsed DF -------------------------
    Empty DataFrame
    Columns: [a, b, c]
    Index: []
    
    FILE: [d:/temp/.data/37874936\zz_5_columns.csv.gz]
    -------------------- ORIGINAL contents ---------------------
    1,2,3,4,5
    11,22,33,44,55
    
    ------------------------ parsed DF -------------------------
            a   b   c
    1  2    3   4   5
    11 22  33  44  55
    
    FILE: [d:/temp/.data/37874936\z_bad_CSV.csv.gz ]
    -------------------- ORIGINAL contents ---------------------
    1
    5,6,7
    1,2
    8,9,10,5,6
    
    ------------------------ parsed DF -------------------------
       a    b     c
    0  1  NaN   NaN
    1  5  6.0   7.0
    2  1  2.0   NaN
    3  8  9.0  10.0
    
    FILE: [d:/temp/.data/37874936\z_single_column.csv.gz]
    -------------------- ORIGINAL contents ---------------------
    1
    2
    3
    
    ------------------------ parsed DF -------------------------
       a   b   c
    0  1 NaN NaN
    1  2 NaN NaN
    2  3 NaN NaN
    

    Can you post a sample CSV that causes this error, or upload it somewhere and post a link here?
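
    If an explicit, OS-independent check is still wanted, one possible sketch uses only the standard `gzip` module: open the file and try to read a single decompressed byte. The helper name `gzip_is_empty` and the scratch file names are hypothetical. Checking the on-disk size is not reliable here, because a gzip file with an empty payload still contains a header and trailer.

```python
import gzip
import os
import tempfile


def gzip_is_empty(path):
    """Return True if `path` decompresses to zero bytes (hypothetical helper)."""
    with gzip.open(path, 'rb') as fh:
        # Reading a single decompressed byte is enough; the file is never fully loaded
        return fh.read(1) == b''


# Build one empty and one non-empty gzip file in a scratch directory
tmp_dir = tempfile.mkdtemp()
empty_path = os.path.join(tmp_dir, 'empty.csv.gz')
data_path = os.path.join(tmp_dir, 'data.csv.gz')
with gzip.open(empty_path, 'wb') as fh:
    pass  # valid gzip stream with an empty payload
with gzip.open(data_path, 'wb') as fh:
    fh.write(b'11,12,13\n')

empty_result = gzip_is_empty(empty_path)
data_result = gzip_is_empty(data_path)
empty_size_on_disk = os.path.getsize(empty_path)
print(empty_result, data_result, empty_size_on_disk)
```

    A file that is not valid gzip data will make `fh.read` raise an exception rather than return, so this check only distinguishes empty from non-empty among well-formed gzip files.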
