Question
I'm having trouble understanding MultiIndex selection in pandas. I have this DataFrame:
                    0  1  2  3
first second third
C     one    mean   3  4  2  7
             std    4  1  7  7
      two    mean   3  1  4  7
             std    5  6  7  0
      three  mean   7  0  2  5
             std    7  3  7  1
H     one    mean   2  4  3  3
             std    5  5  3  5
      two    mean   5  7  0  6
             std    0  1  0  2
      three  mean   5  2  5  1
             std    9  0  4  6
V     one    mean   3  7  3  9
             std    8  7  9  3
      two    mean   1  9  9  0
             std    1  1  5  1
      three  mean   3  1  0  6
             std    6  2  7  4
I need to create new rows with first-level label 'CH', for every value of the second level:
- ['CH',:,'mean'] = ['C',:,'mean'] - ['H',:,'mean']
- ['CH',:,'std'] = (['C',:,'std']**2 + ['H',:,'std']**2)**.5
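For a single pair of rows, the two rules reduce to element-wise arithmetic. A minimal NumPy sketch using the 'one' rows of the table above (values copied by hand), only to pin down which numbers the new 'CH' rows should contain:

```python
import numpy as np

# 'one' rows copied from the table above (columns 0..3)
c_one_mean = np.array([3, 4, 2, 7])
h_one_mean = np.array([2, 4, 3, 3])
c_one_std = np.array([4, 1, 7, 7])
h_one_std = np.array([5, 5, 3, 5])

# ['CH','one','mean'] = C mean - H mean
ch_one_mean = c_one_mean - h_one_mean
# ['CH','one','std'] = (C std**2 + H std**2)**.5
ch_one_std = (c_one_std**2 + h_one_std**2)**.5

print(ch_one_mean)  # [ 1  0 -1  4]
print(ch_one_std)   # element-wise square roots of [41, 26, 58, 74]
```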
When I try to select the rows, I get various errors, for example:
UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'
How should this operation be performed?
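import pandas as pd
import numpy as np

iterables = [['C', 'H', 'V'],
             ['one', 'two', 'three'],
             ['mean', 'std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])
# 18 rows (3 * 3 * 2) of random integers in [0, 10), 4 columns
chv = pd.DataFrame(np.random.randint(0, high=10, size=(18, 4)), index=midx)
print(chv)

idx = pd.IndexSlice
# attempted selection of the 'C' / 'mean' rows (this fails);
# note that the slicer is passed in the column position of .loc
chv.loc[:, idx['C', :, 'mean']]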
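A small sketch of where the UnsortedIndexError comes from, assuming a reasonably recent pandas: `from_product` keeps the rows in the order given, and 'one', 'two', 'three' is not alphabetical, so the index is sorted on 'first' but not on 'second' (consistent with "lexsort depth (1)" in the message) until `sort_index()` is called. It also passes the slicer as the row key, i.e. the first position of `.loc`. The data is a deterministic stand-in (`np.arange`) for the random values in the question.

```python
import numpy as np
import pandas as pd

iterables = [['C', 'H', 'V'], ['one', 'two', 'three'], ['mean', 'std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])
# deterministic stand-in for the random data in the question
chv = pd.DataFrame(np.arange(72).reshape(18, 4), index=midx)

# Rows follow the product order one, two, three, which is not alphabetical,
# so the index is sorted on 'first' but not on 'second'.
print(chv.index.is_monotonic_increasing)  # False

chv_sorted = chv.sort_index()
print(chv_sorted.index.is_monotonic_increasing)  # True

# The IndexSlice goes in the row position of .loc; ':' keeps all columns.
idx = pd.IndexSlice
c_mean = chv_sorted.loc[idx['C', :, 'mean'], :]
print(c_mean.shape)  # (3, 4)
```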
Answer 1:
You can first select the rows with slicers, then rename the first level to 'CH' so that the C and H rows align, apply the arithmetic, and finally concat everything together:
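# sort the index first to avoid UnsortedIndexError
chv = chv.sort_index()
idx = pd.IndexSlice

# 'mean' rows: select C and H, rename the first level to 'CH' so the indexes align, then subtract
c1 = chv.loc[idx['C', :, 'mean'], :].rename({'C': 'CH'}, level=0)
h1 = chv.loc[idx['H', :, 'mean'], :].rename({'H': 'CH'}, level=0)
ch1 = c1 - h1

# 'std' rows: same alignment trick, combined as (C**2 + H**2)**.5
c2 = chv.loc[idx['C', :, 'std'], :].rename({'C': 'CH'}, level=0)**2
h2 = chv.loc[idx['H', :, 'std'], :].rename({'H': 'CH'}, level=0)**2
ch2 = (c2 + h2)**.5

# append the new 'CH' rows to the original data and sort again
df = pd.concat([chv, ch1, ch2]).sort_index()
print(df)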
                            0         1          2         3
first second third
C     one    mean    7.000000  5.000000   8.000000  3.000000
             std     0.000000  4.000000   4.000000  4.000000
      three  mean    4.000000  2.000000   1.000000  6.000000
             std     8.000000  7.000000   3.000000  3.000000
      two    mean    1.000000  8.000000   2.000000  5.000000
             std     2.000000  2.000000   4.000000  2.000000
CH    one    mean    1.000000  2.000000   1.000000  2.000000
             std     4.000000  7.211103   4.000000  7.211103
      three  mean    1.000000  0.000000  -4.000000  2.000000
             std     8.062258  7.071068   4.242641  3.000000
      two    mean   -1.000000  6.000000  -2.000000  3.000000
             std     9.219544  7.280110   4.123106  2.000000
H     one    mean    6.000000  3.000000   7.000000  1.000000
             std     4.000000  6.000000   0.000000  6.000000
      three  mean    3.000000  2.000000   5.000000  4.000000
             std     1.000000  1.000000   3.000000  0.000000
      two    mean    2.000000  2.000000   4.000000  2.000000
             std     9.000000  7.000000   1.000000  0.000000
V     one    mean    9.000000  5.000000   0.000000  5.000000
             std     7.000000  9.000000   1.000000  1.000000
      three  mean    3.000000  0.000000   3.000000  4.000000
             std     1.000000  4.000000   9.000000  2.000000
      two    mean    3.000000  6.000000   3.000000  2.000000
             std     1.000000  3.000000   1.000000  4.000000
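As an alternative sketch (not the answer's method), cross-sections with `DataFrame.xs` can replace the slicer-plus-rename step: selecting `('C', 'mean')` on the 'first' and 'third' levels drops those levels, so the C and H pieces align on 'second' alone. The new rows then get their 'first' and 'third' labels back through `pd.concat` with dict keys and `reorder_levels`. This assumes a reasonably recent pandas; `piece` is a hypothetical helper introduced only for brevity.

```python
import numpy as np
import pandas as pd

iterables = [['C', 'H', 'V'], ['one', 'two', 'three'], ['mean', 'std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])
chv = pd.DataFrame(np.random.randint(0, 10, size=(18, 4)), index=midx).sort_index()

# Hypothetical helper: xs drops the 'first' and 'third' levels, leaving rows
# indexed by 'second' only, so the C and H pieces align on 'second'.
def piece(first, third):
    return chv.xs((first, third), level=['first', 'third'])

ch_mean = piece('C', 'mean') - piece('H', 'mean')
ch_std = (piece('C', 'std')**2 + piece('H', 'std')**2)**.5

# Put the 'third' and 'first' labels back, then restore the original level order.
ch = pd.concat({'mean': ch_mean, 'std': ch_std}, names=['third'])
ch = pd.concat({'CH': ch}, names=['first'])
ch = ch.reorder_levels(['first', 'second', 'third'])

df = pd.concat([chv, ch]).sort_index()
print(df)
```

The result has the same layout as the output above; which version reads better is mostly a matter of taste.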
Source: https://stackoverflow.com/questions/49875793/multiindex-selecting-in-pandas