I have a CSV file and I want to check if the first row has only strings in it (i.e. a header). I'm trying to avoid using any extras like pandas etc. I'm thinking I'll use a
Here is a function I use with pandas to decide whether header should be set to 'infer' or None:
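import pandas as pd

def identify_header(path, n=5, th=0.9):
    # Read the same small sample twice: first row as header, then first row as data.
    df1 = pd.read_csv(path, header='infer', nrows=n)
    df2 = pd.read_csv(path, header=None, nrows=n)
    # Fraction of columns whose inferred dtype is identical in both reads.
    sim = (df1.dtypes.values == df2.dtypes.values).mean()
    # Low similarity means the first row changed the dtypes, so it is likely a header.
    return 'infer' if sim < th else None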
Based on a small sample of n rows, the function compares the column dtypes pandas infers with and without a header row. If the dtypes match for at least the fraction th of columns, it assumes there is no header, because a header row of strings would otherwise turn numeric columns into object columns. I found a threshold of 0.9 to work well for my use cases. The function is also fairly fast, as it only reads a small sample of the CSV file.
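Since the question asks to avoid pandas, the same idea can be roughly sketched with only the standard library. This is a minimal sketch, not a drop-in replacement: the names first_row_is_header and _is_number are made up for illustration, "string" is assumed to mean "does not parse as a float", and the rule that at least one later cell must be numeric is my own assumption to avoid flagging all-text files.

```python
import csv
from itertools import islice


def _is_number(text):
    """Hypothetical helper: True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def first_row_is_header(path, n=5):
    """Guess whether the first row of a CSV file is a header (no pandas)."""
    with open(path, newline='') as f:
        # Read the first row plus up to n sample rows.
        rows = list(islice(csv.reader(f), n + 1))
    if len(rows) < 2:
        return False  # nothing to compare the first row against
    first, rest = rows[0], rows[1:]
    # Assumption: a header consists only of non-numeric strings.
    if any(_is_number(cell) for cell in first):
        return False
    # Assumption: require a numeric cell below, otherwise the first row
    # cannot be told apart from ordinary text data.
    return any(_is_number(cell) for row in rest for cell in row)
```

Calling first_row_is_header('data.csv') on a file that starts with name,age followed by alice,30 should return True. Note that float() also accepts values such as nan and inf, so a column literally named that way would be misread as numeric. The standard library also ships csv.Sniffer().has_header(sample), which applies its own heuristic to a text sample and may be enough for simple cases.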