Pandas read_csv without knowing whether header is present

日久生厌 2021-02-14 02:31

I have an input file with known columns, let's say two columns, Name and Sex. Sometimes it has the header line Name,Sex, and sometimes it doesn't.

2 Answers
  •  失恋的感觉
    2021-02-14 03:09

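    I've come up with a way of detecting the header without prior knowledge of its names. Read the file with header=None, so that a header line, if present, ends up as row 0, and then check whether that row contains any strings: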

    if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
        df = df[1:].reset_index(drop=True)
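
    A minimal runnable sketch of the check above, assuming the data is read with header=None. The sample strings WITH_HEADER and WITHOUT_HEADER and the helper drop_detected_header are hypothetical, made up for illustration. It also shows one caveat: when a header line is present, pandas reads each column as object dtype, so the cells that remain after dropping row 0 are still strings until converted.

```python
import io

import pandas as pd

# Hypothetical inputs: the same numeric data with and without a header line.
WITH_HEADER = "a,b\n1,2\n3,4\n"
WITHOUT_HEADER = "1,2\n3,4\n"


def drop_detected_header(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper wrapping the heuristic: drop row 0 if any cell is a str."""
    if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
        df = df.iloc[1:].reset_index(drop=True)
    return df


# header=None makes a header line, if there is one, appear as ordinary row 0.
with_hdr = drop_detected_header(pd.read_csv(io.StringIO(WITH_HEADER), header=None))
without_hdr = drop_detected_header(pd.read_csv(io.StringIO(WITHOUT_HEADER), header=None))

print(with_hdr.shape, without_hdr.shape)  # (2, 2) (2, 2)

# Caveat: the header line forced object dtype, so surviving cells are strings.
print(repr(with_hdr.iloc[0, 0]))  # '1'
# Converting column by column restores numeric values.
print(with_hdr.apply(pd.to_numeric).iloc[0, 0])  # 1
```

    Without a header, row 0 holds integers, the test is false, and nothing is dropped; with a header, the header row is removed, but a conversion step such as pd.to_numeric may be needed afterwards.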
    

    With a slight change, it can also replace the current (default integer) column labels with the detected header:

    if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
        df = df[1:].reset_index(drop=True).rename(columns=df.iloc[0])
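
    Why the one-liner works: DataFrame.rename accepts a dict-like mapper for columns, and a Series behaves like one, mapping its index (the old integer labels 0, 1, ...) to its values (the detected names). The sample text below is hypothetical.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("a,b\n1,2\n3,4\n"), header=None)  # hypothetical input
print(df.iloc[0].to_dict())  # {0: 'a', 1: 'b'}: old label -> detected name

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    # The right-hand side is evaluated before the assignment, so df.iloc[0]
    # still refers to the header row of the original frame.
    df = df[1:].reset_index(drop=True).rename(columns=df.iloc[0])

print(df.columns.tolist())  # ['a', 'b']
print(df.shape)  # (2, 2)
```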
    

    A flag then makes it easy to select the desired behavior:

    update_header = True
    
    if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
        new_header = df.iloc[0]
    
        df = df[1:].reset_index(drop=True)
    
        if update_header:
            df.rename(columns=new_header, inplace=True)
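
    A possible variant, not the code above: use the same first-row test only to decide whether a header exists, then let pandas parse the text again, with header=0 when the header should become the column names, or with skiprows=1 when it should just be dropped. Re-parsing lets pandas re-infer numeric dtypes, which slicing off row 0 does not. The function name read_with_optional_header and the sample text are hypothetical, and the data is read twice, which may matter for large files.

```python
import io

import pandas as pd


def read_with_optional_header(text: str, update_header: bool = True) -> pd.DataFrame:
    """Hypothetical helper: detect a header with the first-row str test, then re-parse."""
    probe = pd.read_csv(io.StringIO(text), header=None)
    has_header = any(probe.iloc[0].apply(lambda x: isinstance(x, str)))
    if not has_header:
        # No header detected: the probe already is the final frame.
        return probe
    if update_header:
        # Use the detected line as the column names.
        return pd.read_csv(io.StringIO(text), header=0)
    # Skip the detected line and keep the default integer column labels.
    return pd.read_csv(io.StringIO(text), header=None, skiprows=1)


text = "a,b\n1,2\n3,4\n"
print(read_with_optional_header(text).columns.tolist())  # ['a', 'b']
print(read_with_optional_header(text, update_header=False).columns.tolist())  # [0, 1]
print(read_with_optional_header("1,2\n3,4\n").shape)  # (2, 2)
```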
    

    Pros:

    • Doesn't require prior knowledge of the header's names.
    • Can be used to update the header automatically if an existing one is detected.

    Cons:

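    • Won't work well if the data contains strings: a first data row with even one string cell is mistaken for a header and dropped. Replacing any() with all(), so that every cell of the first row must be a string, might help, unless the data also contains entire rows of strings (as with the Name,Sex file in the question).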
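
    To make this limitation concrete, here is a small sketch that reports what the any() and all() tests see in the first row after reading with header=None. The helper first_row_types and the sample rows are hypothetical.

```python
import io

import pandas as pd


def first_row_types(text: str) -> tuple[bool, bool]:
    """Hypothetical helper: (any cell is str, all cells are str) for row 0."""
    df = pd.read_csv(io.StringIO(text), header=None)
    is_str = df.iloc[0].apply(lambda x: isinstance(x, str))
    return bool(is_str.any()), bool(is_str.all())


# No header, one numeric column: any() misfires, all() does not.
print(first_row_types("Alice,30\nBob,25\n"))  # (True, False)
# Real header: both tests fire, as intended.
print(first_row_types("Name,Age\nAlice,30\nBob,25\n"))  # (True, True)
# No header, every column is text (like Name,Sex): both tests misfire.
print(first_row_types("Alice,F\nBob,M\n"))  # (True, True)
```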
