What is the easiest way to create a DataFrame
with hierarchical columns?
I am currently creating a DataFrame from a dict of names -> Series
I know the question is really old but for pandas
version 0.19.1
one can use direct dict-initialization:
d = {('a','b'):[1,2,3,4], ('a','c'):[5,6,7,8]}
df = pd.DataFrame(d, index=['r1','r2','r3','r4'])
df.columns.names = ('l1','l2')
print df
l1 a
l2 b c
r1 1 5
r2 2 6
r3 3 7
r4 4 8
Im not sure but i think the use of a dict as input for your DF and a MulitIndex dont play well together. Using an array as input instead makes it work.
I often prefer dicts as input though, one way is to set the columns after creating the df:
import pandas as pd
data = {'a': [1,2,3,4], 'b': [10,20,30,40],'c': [100,200,300,400]}
df = pd.DataFrame(np.array(data.values()).T, index=['r1','r2','r3','r4'])
tups = zip(*[['Estimates']*len(data),data.keys()])
df.columns = pd.MultiIndex.from_tuples(tups, names=['l1','l2'])
l1 Estimates
l2 a c b
r1 1 10 100
r2 2 20 200
r3 3 30 300
r4 4 40 400
Or when using an array as input for the df:
data_arr = np.array([[1,2,3,4],[10,20,30,40],[100,200,300,400]])
tups = zip(*[['Estimates']*data_arr.shape[0],['a','b','c'])
df = pd.DataFrame(data_arr.T, index=['r1','r2','r3','r4'], columns=pd.MultiIndex.from_tuples(tups, names=['l1','l2']))
Which gives the same result.
This appears to work:
import pandas as pd
data = {'a': [1,2,3,4], 'b': [10,20,30,40],'c': [100,200,300,400]}
df = pd.concat({"Estimates": pd.DataFrame(data)}, axis=1, names=["l1", "l2"])
l1 Estimates
l2 a b c
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400