Multiple columns with the same name in Pandas

终归单人心 2021-01-04 13:24

I am creating a DataFrame from a CSV file. I have just started with Pandas and have gone through the docs, multiple SO posts, and other links, but I still didn't get it. The CSV file

2 answers
  • 2021-01-04 14:13

    I had a similar issue, not from reading a CSV, but because my DataFrame had multiple columns with the same name (in my case 'id'). I solved it by looking at df.columns and assigning a list of unique names back to it.

    In : df.columns
    Out: 
    Index(['success', 'created', 'id', 'errors', 'id'], dtype='object')
    
    In : df.columns = ['success', 'created', 'id1', 'errors', 'id2']
    
    In : df.columns
    Out: 
    Index(['success', 'created', 'id1', 'errors', 'id2'], dtype='object')
    

    From here, I was able to call 'id1' or 'id2' to get just the column I wanted.
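
When there are too many columns to retype by hand, the same renaming can be done programmatically. The following is a minimal sketch, not part of the original answer: `dedupe_columns` is a hypothetical helper name, and the sample row is made up purely to build a DataFrame with the column names shown above.

```python
import pandas as pd


def dedupe_columns(columns):
    """Append 1, 2, ... to every name that occurs more than once.

    Names that occur only once are left unchanged. Note: this sketch does
    not check whether a generated name (e.g. 'id1') already exists.
    """
    # First pass: count how often each name occurs.
    totals = {}
    for name in columns:
        totals[name] = totals.get(name, 0) + 1

    # Second pass: number the repeated names in order of appearance.
    seen = {}
    result = []
    for name in columns:
        if totals[name] == 1:
            result.append(name)
        else:
            seen[name] = seen.get(name, 0) + 1
            result.append(f"{name}{seen[name]}")
    return result


# Made-up sample row; only the column names mirror the answer above.
df = pd.DataFrame(
    [[True, "2021-01-04", 1, None, 2]],
    columns=["success", "created", "id", "errors", "id"],
)
df.columns = dedupe_columns(df.columns)
print(list(df.columns))
```

After this, df['id1'] and df['id2'] each select a single column, as in the manual version.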

  • 2021-01-04 14:19

    the relevant parameter is mangle_dupe_cols

    from the docs

    mangle_dupe_cols : boolean, default True
        Duplicate columns will be specified as 'X.0'...'X.N', rather than 'X'...'X'
    

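    by default, the first 'a' column keeps its name and the later duplicates get a numeric suffix: 'a', 'a.1', 'a.2', and so on (the 'X.0' wording in the docstring quoted above does not match what read_csv actually produces).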

    if you used mangle_dupe_cols=False, importing this csv would produce an error (pandas 2.0 removed the parameter altogether, so passing it there fails as well).

    you can get all of your 'a' columns with

    df.filter(like='a')
    

    demonstration

    from io import StringIO  # Python 3; the separate StringIO module exists only in Python 2
    import pandas as pd
    
    txt = """a, a, a, b, c, d
    1, 2, 3, 4, 5, 6
    7, 8, 9, 10, 11, 12"""
    
    df = pd.read_csv(StringIO(txt), skipinitialspace=True)
    df
    

    df.filter(like='a')
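
As a hedged illustration of the demonstration above, the sketch below reads the same text under Python 3 and inspects the column names pandas assigns. One caveat: `like='a'` selects every column whose name contains the substring 'a', so with real data it could pick up unrelated columns. The regex variant here is an addition for this example, not something the original answer used.

```python
from io import StringIO

import pandas as pd

txt = """a, a, a, b, c, d
1, 2, 3, 4, 5, 6
7, 8, 9, 10, 11, 12"""

# skipinitialspace=True drops the blank after each comma, header included.
df = pd.read_csv(StringIO(txt), skipinitialspace=True)

# The duplicated header 'a' is made unique by read_csv.
print(list(df.columns))

# Anchored regex: matches 'a' and 'a.<number>' only, unlike like='a',
# which would also match any other name containing the letter 'a'.
a_cols = df.filter(regex=r"^a(\.\d+)?$")
print(a_cols)
```

With this data both filters return the same three columns; the anchored pattern only matters once other column names also contain an 'a'.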
    
