I am trying to clean up an Excel file for further research. My problem is that I want to merge the first and second rows. The code I have now:
xl
I think you need numpy.concatenate, on the same principle as cᴏʟᴅsᴘᴇᴇᴅ's answer:
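import numpy as np
# First two labels come from row 0, the rest from the existing header
df.columns = np.concatenate([df.iloc[0, :2], df.columns[2:]])
# Drop the row that held the labels and renumber from 0
df = df.iloc[1:].reset_index(drop=True)
print(df)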
  Sample type Concentration     A     B    C          D          E          F  \
0       Water          9200  95.5  21.0  6.0  11.942308  64.134615  21.498560
1       Water          9200  94.5  17.0  5.0   5.484615  63.205769  19.658560
2       Water          9200  92.0  16.0  3.0  11.057692  62.586538  19.813120
3       Water          4600  53.0   7.5  2.5   3.538462  35.163462   6.876207

          G         H
0  5.567840  1.174135
1  4.968000  1.883444
2  5.192480  0.564835
3  1.641724  0.144654
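For anyone who wants to try this without the spreadsheet, here is a minimal sketch of the same idea on a small hand-built frame. The frame `raw` and its values are hypothetical stand-ins for the real sheet, trimmed to four columns; only the `np.concatenate` and `iloc` steps come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: the real labels for the first two
# columns sit in row 0 instead of in the header.
raw = pd.DataFrame(
    [["Sample type", "Concentration", np.nan, np.nan],
     ["Water", 9200, 95.5, 21.0],
     ["Water", 4600, 53.0, 7.5]],
    columns=["Nanonose", "Unnamed: 1", "A", "B"],
)

# Take the first two labels from row 0 and keep the remaining header labels.
raw.columns = np.concatenate([raw.iloc[0, :2], raw.columns[2:]])

# Drop the row that held the labels and renumber the index from 0.
clean = raw.iloc[1:].reset_index(drop=True)
print(clean.columns.tolist())
```

Note that the columns keep the object dtype they had while the label row was still present, so a numeric conversion may be needed afterwards.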
Fetch all the column names present in the second header row, then those in the first header row, and combine them into a single list of column names. Then create a DataFrame from the Excel file with header=[0, 1] and replace its headers with the list you built.
import pandas as pd
# Read the column labels from the second header row
df1 = pd.read_excel('nanonose.xls', header=[1])
cols1 = df1.columns.tolist()
SecondRowColumns = []
for c in cols1:
    # str(c) guards against non-string labels; each substring is tested separately
    if "Unnamed" not in str(c) and "NaN" not in str(c):
        SecondRowColumns.append(c)
# Read the column labels from the first header row
df2 = pd.read_excel('nanonose.xls', header=[0])
cols2 = df2.columns.tolist()
FirstRowColumns = []
for c in cols2:
    if "Unnamed" not in str(c) and "Nanonose" not in str(c):
        FirstRowColumns.append(c)
# Second-row labels come first, followed by the first-row labels
AllColumn = SecondRowColumns + FirstRowColumns
# Read with both header rows, then overwrite the two-level header
df = pd.read_excel('nanonose.xls', header=[0, 1])
df.columns = AllColumn
print(df)
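A possible variation on this approach, sketched under assumptions: read both header rows once and pick one label per column from the resulting two-level header. `merge_header_rows` is a hypothetical helper, and the `Unnamed: ..._level_...` placeholder names in the example are an assumption about how pandas labels empty header cells. The frame is built by hand so the sketch runs without the Excel file.

```python
import pandas as pd

def merge_header_rows(columns):
    """Pick one label per column from a two-level header (hypothetical helper)."""
    merged = []
    for top, bottom in columns:
        top, bottom = str(top), str(bottom)
        # Prefer the second-row label unless it is a placeholder.
        if "Unnamed" not in bottom and bottom != "nan":
            merged.append(bottom)
        else:
            merged.append(top)
    return merged

# Hand-built two-level header standing in for read_excel(..., header=[0, 1]).
cols = pd.MultiIndex.from_tuples([
    ("Nanonose", "Sample type"),
    ("Unnamed: 1_level_0", "Concentration"),
    ("A", "Unnamed: 2_level_1"),
    ("B", "Unnamed: 3_level_1"),
])
df = pd.DataFrame([["Water", 9200, 95.5, 21.0]], columns=cols)

df.columns = merge_header_rows(df.columns)
print(df.columns.tolist())
```

This keeps the labels in their original column order, so it does not depend on the second-row labels all coming before the first-row ones.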
Just reassign df.columns.
import numpy as np
df.columns = np.append(df.iloc[0, :2], df.columns[2:])
Or,
df.columns = df.iloc[0, :2].tolist() + (df.columns[2:]).tolist()
Next, skip the first row.
df = df.iloc[1:].reset_index(drop=True)
df
  Sample type Concentration     A     B    C          D          E          F  \
0       Water          9200  95.5  21.0  6.0  11.942308  64.134615  21.498560
1       Water          9200  94.5  17.0  5.0   5.484615  63.205769  19.658560
2       Water          9200  92.0  16.0  3.0  11.057692  62.586538  19.813120
3       Water          4600  53.0   7.5  2.5   3.538462  35.163462   6.876207

          G         H
0  5.567840  1.174135
1  4.968000  1.883444
2  5.192480  0.564835
3  1.641724  0.144654
reset_index is optional; use it if you want a 0-based index in your final output.
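To see what reset_index changes, here is a small sketch on a made-up three-row frame (the values are illustrative only): slicing with iloc[1:] keeps the original index labels, and reset_index(drop=True) renumbers them from 0.

```python
import pandas as pd

# Made-up frame: row 0 plays the role of the leftover label row.
df = pd.DataFrame({"Sample type": ["header", "Water", "Water"],
                   "A": [None, 95.5, 94.5]})

sliced = df.iloc[1:]
print(sliced.index.tolist())      # [1, 2]

# drop=True discards the old index instead of adding it as a column.
renumbered = sliced.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```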