How can I add a header to a DF without replacing the current one? In other words I just want to shift the current header down and just add it to the dataframe as another rec
The key is to specify header=None and use column to add header:
data = pd.read_csv('file.csv', skiprows=2, header=None ) # skip blank rows if applicable
df = pd.DataFrame(data)
df = df.iloc[ : , [0,1]] # columns 1 and 2
df.columns = ['A','B'] # title
Another option is to add it as an additional level of the column index, to make it a MultiIndex:
In [11]: df = pd.DataFrame(randn(2, 2), columns=['A', 'B'])
In [12]: df
Out[12]:
A B
0 -0.952928 -0.624646
1 -1.020950 -0.883333
In [13]: df.columns = pd.MultiIndex.from_tuples(zip(['AA', 'BB'], df.columns))
In [14]: df
Out[14]:
AA BB
A B
0 -0.952928 -0.624646
1 -1.020950 -0.883333
This has the benefit of keeping the correct dtypes for the DataFrame, so you can still do fast and correct calculations on your DataFrame, and allows you to access by both the old and new column names.
.
For completeness, here's DSM's (deleted answer), making the columns a row, which, as mentioned already, is usually not a good idea:
In [21]: df_bad_idea = df.T.reset_index().T
In [22]: df_bad_idea
Out[22]:
0 1
index A B
0 -0.952928 -0.624646
1 -1.02095 -0.883333
Note, the dtype may change (if these are column names rather than proper values) as in this case... so be careful if you actually plan to do any work on this as it will likely be slower and may even fail:
In [23]: df.sum()
Out[23]:
A -1.973878
B -1.507979
dtype: float64
In [24]: df_bad_idea.sum() # doh!
Out[24]: Series([], dtype: float64)
If the column names are actually a row that was mistaken as a header row then you should correct this on reading in the data (e.g. read_csv
use header=None
).