I would like to create an empty DataFrame with a MultiIndex before assigning rows to it. I already found that empty DataFrames
Another solution, which is maybe a little simpler, is to use the function set_index:
>>> import pandas as pd
>>> df = pd.DataFrame(columns=['one', 'two', 'three', 'alpha', 'beta'])
>>> df = df.set_index(['one', 'two', 'three'])
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                     alpha  beta
one   two    three
apple banana cherry    0.1   0.2
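The solution is to pass empty lists for both the levels and the labels. This works fine for me (note that pandas 0.24 renamed the labels argument to codes, and labels was removed in pandas 1.0):
>>> my_index = pd.MultiIndex(levels=[[],[],[]],
...                          labels=[[],[],[]],
...                          names=[u'one', u'two', u'three'])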
>>> my_index
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'one', u'two', u'three'])
>>> my_columns = [u'alpha', u'beta']
>>> df = pd.DataFrame(index=my_index, columns=my_columns)
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                     alpha  beta
one   two    three
apple banana cherry    0.1   0.2
Hope that helps!
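For anyone on a recent pandas, here is a minimal sketch of the same constructor-based approach using the codes keyword. I am assuming pandas 1.0 or later; the names my_index and df simply mirror the ones above.

```python
import pandas as pd

# Empty three-level index: no level values and no codes, only level names.
# Assumption: pandas >= 1.0, where the keyword is `codes` rather than `labels`.
my_index = pd.MultiIndex(levels=[[], [], []],
                         codes=[[], [], []],
                         names=['one', 'two', 'three'])
df = pd.DataFrame(index=my_index, columns=['alpha', 'beta'])

# Assigning to a key that does not exist yet appends it as a new row.
df.loc[('apple', 'banana', 'cherry'), :] = [0.1, 0.2]
print(df)
```

The row key has to be a tuple with one entry per index level, otherwise pandas cannot tell which levels you mean.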
Using pd.MultiIndex.from_tuples may be more straightforward.
import pandas as pd
ind = pd.MultiIndex.from_tuples([], names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]
df
                     alpha  beta
one   two    three
apple banana cherry      4     3
Using pd.MultiIndex.from_arrays allows for a slightly more concise solution when defining the index explicitly:
import pandas as pd
ind = pd.MultiIndex.from_arrays([[]] * 3, names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]
df
                     alpha  beta
one   two    three
apple banana cherry      4     3
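If you want to convince yourself that these approaches end up with the same kind of empty index, a quick check along these lines should do. This is only a sketch; the variable names (names, from_tuples, from_arrays, via_set_index) are mine and not from the answers above.

```python
import pandas as pd

names = ['one', 'two', 'three']

# Three ways of building an empty, named, three-level index.
from_tuples = pd.MultiIndex.from_tuples([], names=names)
from_arrays = pd.MultiIndex.from_arrays([[]] * 3, names=names)
via_set_index = (pd.DataFrame(columns=names + ['alpha', 'beta'])
                 .set_index(names)
                 .index)

# Print the index type, number of levels, level names and length for each.
for idx in (from_tuples, from_arrays, via_set_index):
    print(type(idx).__name__, idx.nlevels, list(idx.names), len(idx))
```

Each one should report a MultiIndex with three levels and zero rows, so which one to use is mostly a matter of taste.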