I don't want to use OS commands, as that makes it OS dependent.
This is available in tarfile, tarfile.is_tarfile(filename), to check if
UPDATE:
I strongly recommend upgrading to pandas 0.18.1 (currently the latest version), as each new pandas release adds useful features and fixes many old bugs. Version 0.18.1 handles your empty files out of the box (see the demo below).
If you can't upgrade to a newer version, follow @MartijnPieters' recommendation: catch the exception instead of checking first (the "Easier to ask for forgiveness than permission" paradigm).
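A minimal sketch of the "catch the exception" approach, in case it helps. The helper name `read_gz_csv_or_empty` and the temporary file names are made up for illustration, and it assumes a pandas release that provides `pd.errors.EmptyDataError` (older releases may expose this exception under a different module). Depending on the pandas version, an empty file either raises that exception or simply yields an empty DataFrame; the helper returns an empty frame in both cases.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_gz_csv_or_empty(path, cols):
    """Hypothetical helper: parse a gzipped CSV, falling back to an empty frame."""
    try:
        return pd.read_csv(path, header=None, names=cols, compression='gzip')
    except pd.errors.EmptyDataError:
        # pandas found no columns to parse; treat the file as empty
        return pd.DataFrame(columns=cols)


# Usage: write one empty and one non-empty .csv.gz file into a temp directory
tmpdir = tempfile.mkdtemp()
empty_path = os.path.join(tmpdir, 'empty.csv.gz')
data_path = os.path.join(tmpdir, 'data.csv.gz')
with gzip.open(empty_path, 'wt') as fh:
    fh.write('')
with gzip.open(data_path, 'wt') as fh:
    fh.write('11,12,13\n14,15,16\n')

for p in (empty_path, data_path):
    df = read_gz_csv_or_empty(p, ['a', 'b', 'c'])
    print(os.path.basename(p), 'rows:', len(df))
```

This avoids inspecting the file up front: the parse is attempted, and only the "no data" case is handled specially, so other errors still surface.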
OLD answer: a small demonstration (using pandas 0.18.1) that tolerates empty files, differing numbers of columns, etc.
I tried to reproduce your error (with an empty CSV.gz, differing numbers of columns, etc.), but I could not reproduce your exception using pandas 0.18.1:
import os
import glob
import gzip
import pandas as pd
fmask = 'd:/temp/.data/37874936/*.csv.gz'
files = glob.glob(fmask)
cols = ['a','b','c']
for f in files:
    # actually there is no need to use `compression='gzip'` - pandas will guess it itself
    # I left it in to be sure that we are using the same parameters ...
    df = pd.read_csv(f, header=None, names=cols, compression='gzip', sep=',')
    print('\nFILE: [{:^40}]'.format(f))
    print('{:-^60}'.format(' ORIGINAL contents '))
    print(gzip.open(f, 'rt').read())
    print('{:-^60}'.format(' parsed DF '))
    print(df)
Output:
FILE: [ d:/temp/.data/37874936\1.csv.gz ]
-------------------- ORIGINAL contents ---------------------
11,12,13
14,15,16
------------------------ parsed DF -------------------------
a b c
0 11 12 13
1 14 15 16
FILE: [ d:/temp/.data/37874936\empty.csv.gz ]
-------------------- ORIGINAL contents ---------------------
------------------------ parsed DF -------------------------
Empty DataFrame
Columns: [a, b, c]
Index: []
FILE: [d:/temp/.data/37874936\zz_5_columns.csv.gz]
-------------------- ORIGINAL contents ---------------------
1,2,3,4,5
11,22,33,44,55
------------------------ parsed DF -------------------------
a b c
1 2 3 4 5
11 22 33 44 55
FILE: [d:/temp/.data/37874936\z_bad_CSV.csv.gz ]
-------------------- ORIGINAL contents ---------------------
1
5,6,7
1,2
8,9,10,5,6
------------------------ parsed DF -------------------------
a b c
0 1 NaN NaN
1 5 6.0 7.0
2 1 2.0 NaN
3 8 9.0 10.0
FILE: [d:/temp/.data/37874936\z_single_column.csv.gz]
-------------------- ORIGINAL contents ---------------------
1
2
3
------------------------ parsed DF -------------------------
a b c
0 1 NaN NaN
1 2 NaN NaN
2 3 NaN NaN
Can you post a sample CSV that causes this error, or upload it somewhere and post a link here?