Pandas DataFrame: add a header without replacing the current header

予麋鹿 2020-12-29 13:26

How can I add a header to a DataFrame without replacing the current one? In other words, I just want to shift the current header down and add it to the DataFrame as another record.

2 answers
  • 2020-12-29 14:02

    The key is to pass header=None to read_csv, so the file's own header line is kept as data, and then assign df.columns to add the new header:

    import pandas as pd

    # header=None: do not treat any line of the file as column names
    df = pd.read_csv('file.csv', skiprows=2, header=None)  # skip blank rows if applicable
    df = df.iloc[:, [0, 1]]  # keep columns 1 and 2
    df.columns = ['A', 'B']  # new header
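
    As a minimal sketch of the effect, the snippet below reads a small in-memory CSV instead of file.csv. The csv_text content and the column names x and y are made up for illustration. With header=None the original header line ends up as the first row of data, and the new names sit on top of it.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv
csv_text = "x,y\n1,2\n3,4\n"

# header=None: the original header line ("x,y") is read as an ordinary row
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Assign the new header; the old one stays in the data as row 0
df.columns = ['A', 'B']

print(df)
```

    Because the old header is text, every column is read as text here, so any numeric work would need a conversion first.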
    
  • 2020-12-29 14:04

    Another option is to add it as an additional level of the column index, to make it a MultiIndex:

    In [10]: import numpy as np; import pandas as pd

    In [11]: df = pd.DataFrame(np.random.randn(2, 2), columns=['A', 'B'])
    
    In [12]: df
    Out[12]: 
              A         B
    0 -0.952928 -0.624646
    1 -1.020950 -0.883333
    
    In [13]: df.columns = pd.MultiIndex.from_tuples(list(zip(['AA', 'BB'], df.columns)))
    
    In [14]: df
    Out[14]: 
             AA        BB
              A         B
    0 -0.952928 -0.624646
    1 -1.020950 -0.883333
    

    This has the benefit of keeping the correct dtypes for the DataFrame, so you can still do fast and correct calculations on it, and it lets you access columns by both the old and the new names.
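
    A short sketch of what that access can look like, using fixed values instead of random ones so the result is reproducible. The variable names top, inner and single are only illustrative, and from_arrays is used here as an alternative to from_tuples that builds the same two-level header.

```python
import numpy as np
import pandas as pd

# Fixed values instead of randn so the example is reproducible
df = pd.DataFrame(np.arange(4.0).reshape(2, 2), columns=['A', 'B'])

# New names on the top level, original names on the second level
df.columns = pd.MultiIndex.from_arrays([['AA', 'BB'], df.columns])

top = df['AA']                        # select by the new name
inner = df.xs('A', axis=1, level=1)   # select by the original name
single = df[('AA', 'A')]              # select one column by the full key

print(df.dtypes)  # both columns are still float64
```

    Both selections return the same underlying data; only the remaining column label differs.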


    For completeness, here's DSM's (deleted) answer, which turns the column names into a row. As mentioned already, this is usually not a good idea:

    In [21]: df_bad_idea = df.T.reset_index().T
    
    In [22]: df_bad_idea
    Out[22]: 
                  0         1
    index         A         B
    0     -0.952928 -0.624646
    1      -1.02095 -0.883333
    

    Note that the dtype may change when the column names are strings rather than proper values, as in this case. Be careful if you actually plan to do any work on this frame, as it will likely be slower and may even fail:

    In [23]: df.sum()
    Out[23]: 
    A   -1.973878
    B   -1.507979
    dtype: float64
    
    In [24]: df_bad_idea.sum()  # doh!
    Out[24]: Series([], dtype: float64)
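
    The dtype change can be checked directly. The following is a minimal sketch with fixed values rather than the random data above; the recovered name is only illustrative. It shows one way to get numeric columns back by dropping the header row and converting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4.0).reshape(2, 2), columns=['A', 'B'])

# Same transformation as above: the column names become the first row
df_bad_idea = df.T.reset_index().T

print(df.dtypes)           # float64 for both columns
print(df_bad_idea.dtypes)  # object for both columns (strings mixed with floats)

# Dropping the header row and converting restores numeric columns
recovered = df_bad_idea.iloc[1:].astype(float)
print(recovered.sum())
```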
    

    If the column names are actually a data row that was mistaken for a header row, you should correct this when reading in the data (e.g. pass header=None to read_csv).
