Pandas how to read sub headers

Submitted by 随声附和 on 2021-02-18 13:52:56

Question


I'm using Python and pandas to process a CSV file.

The CSV file has two header rows, like this:

       Header1                     Header2
Date   Subheader1-1 Subheader1-2   Subheader2-1 Subheader2-2

In raw text, the content of the CSV file looks like this:

,Header1,,Header2,,...
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2,...
...

My question is:

Does pandas support this sub-header format? If not, is there a way to read this CSV into a pandas DataFrame and do some calculations on it?

(The calculation would be something like extracting Header1's Subheader1-2 column, calculating its average and standard deviation, and plotting everything with matplotlib.)


Answer 1:


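Use the parameter header=[0,1]. Some post-processing is then necessary, because pandas labels the empty cells of the first header row as "Unnamed: ...": replace those labels with NaN, then forward-fill them from the header to their left: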

from io import StringIO

import pandas as pd

temp = ''',Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6'''
# after testing, replace 'StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from recent pandas versions; use io.StringIO)
df = pd.read_csv(StringIO(temp), header=[0, 1])
print(df)
  Unnamed: 0_level_0      Header1 Unnamed: 2_level_0      Header2  \
                Date Subheader1-1       Subheader1-2 Subheader2-1   
0         2018-01-02           10                  2            5   

  Unnamed: 4_level_0  
        Subheader2-2  
0                  6 

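# top-level labels as a Series, so the string and fill methods are available
a = df.columns.get_level_values(0).to_series()
# turn 'Unnamed...' labels into NaN, forward-fill, and blank out any leading NaN
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
# rebuild the two-level column index with the cleaned top level
df.columns = [b, df.columns.get_level_values(1)]
print(df)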
                   Header1                   Header2             
         Date Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
0  2018-01-02           10            2            5            6
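
To make the mask / ffill / fillna chain easier to follow, here is a minimal sketch that applies each step to the top-level labels on their own. The labels are typed in by hand from the printed output above rather than read from a file, and the variable names (labels, masked, filled, cleaned) are only illustrative.

```python
import pandas as pd

# Top-level labels as pandas reports them for the sample data
# (copied by hand from the printed output, not read from a file).
labels = pd.Series([
    "Unnamed: 0_level_0",
    "Header1",
    "Unnamed: 2_level_0",
    "Header2",
    "Unnamed: 4_level_0",
])

# Step 1: replace every placeholder label with a missing value.
masked = labels.mask(labels.str.startswith("Unnamed"))

# Step 2: carry each real header to the right over the missing values.
filled = masked.ffill()

# Step 3: the first label has nothing to its left, so it is still missing;
# replace it with an empty string.
cleaned = filled.fillna("")

print(cleaned.tolist())
# ['', 'Header1', 'Header1', 'Header2', 'Header2']
```

The empty string in the first position is why the Date column ends up with a blank top-level label in the cleaned DataFrame.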

A better solution is to use the first column as the index:

from io import StringIO

import pandas as pd

temp = ''',Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6'''
# after testing, replace 'StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from recent pandas versions; use io.StringIO)
df = pd.read_csv(StringIO(temp), header=[0, 1], index_col=[0])
print(df)
                Header1 Unnamed: 2_level_0      Header2 Unnamed: 4_level_0
Date       Subheader1-1       Subheader1-2 Subheader2-1       Subheader2-2
2018-01-02           10                  2            5                  6

# same header cleanup as in the first example
a = df.columns.get_level_values(0).to_series()
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
df.columns = [b, df.columns.get_level_values(1)]
print(df)
                Header1                   Header2             
Date       Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
2018-01-02           10            2            5            6
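
Once the top level is cleaned, a sub-column can be selected with a (header, subheader) tuple, which covers the calculation mentioned in the question. The following is a minimal sketch under some assumptions: the two extra data rows are made up for illustration so that the standard deviation is defined, and the names raw, col, mean_value and std_value are hypothetical.

```python
from io import StringIO

import pandas as pd

# Sample data in the question's layout; the last two rows are invented
# for illustration only.
raw = """,Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6
2018-01-03,12,4,7,8
2018-01-04,14,6,9,10"""

df = pd.read_csv(StringIO(raw), header=[0, 1], index_col=[0])

# Forward-fill the "Unnamed: ..." placeholders in the top header level.
top = df.columns.get_level_values(0).to_series()
top = top.mask(top.str.startswith("Unnamed")).ffill().fillna("")
df.columns = pd.MultiIndex.from_arrays(
    [top.to_numpy(), df.columns.get_level_values(1)]
)

# Select one sub-column by its (header, subheader) pair.
col = df[("Header1", "Subheader1-2")]

mean_value = col.mean()  # arithmetic mean
std_value = col.std()    # sample standard deviation (ddof=1 by default)

print(mean_value, std_value)
```

If matplotlib is installed, calling col.plot() on the selected Series should be enough to draw it, since pandas plotting is built on matplotlib.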


Source: https://stackoverflow.com/questions/51871136/pandas-how-to-read-sub-headers
