Question
I'm trying to create a bootstrapped sample from a multiindex dataframe in Pandas. Below is some code to generate the kind of data I need.
import pandas as pd
import numpy as np
df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]
                   })
df = df.set_index(['group1', 'group2'])
print(df)
The df dataframe looks like:
               value1  value2
group1 group2
1      13         1.1     7.1
       18         2.0     8.0
       20         3.0     9.0
2      77         4.0    10.0
       109        5.0    11.0
3      123        6.0    12.0
I want to draw a random sample, with replacement, from the first index level (group1). For example, let's say the random draw np.random.randint(1, 4, size=3) produces [3, 2, 2]. I'd like the resulting dataframe to look like:
               value1  value2
group1 group2
3      123        6.0    12.0
2      77         4.0    10.0
       109        5.0    11.0
2      77         4.0    10.0
       109        5.0    11.0
I've spent a lot of time researching this and I've been unable to find a similar example where the multiindex values are integers, the secondary index is of variable length, and the primary index samples are repeating. This is how I think an appropriate implementation for bootstrapping would work.
Answer 1:
Try:
df.unstack().sample(3, replace=True).stack()
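The one-liner relies on stack() dropping the NaN padding that unstack() introduces, and that behaviour may differ between pandas versions. As an alternative, here is a minimal sketch that draws group1 labels directly and concatenates the matching blocks of rows. The helper name take_groups and the seed value are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]
                   }).set_index(['group1', 'group2'])


def take_groups(frame, labels, level='group1'):
    """Hypothetical helper: stack the full block of rows for each label, in order."""
    # drop_level=False keeps both index levels on every selected block.
    blocks = [frame.xs(label, level=level, drop_level=False) for label in labels]
    return pd.concat(blocks)


# Draw group1 labels with replacement; the seed is only there for repeatability.
rng = np.random.default_rng(0)
labels = df.index.get_level_values('group1').unique().to_numpy()
drawn = rng.choice(labels, size=3, replace=True)
sample = take_groups(df, drawn)
print(sample)

# The fixed draw from the question, [3, 2, 2], gives the layout asked for.
fixed = take_groups(df, [3, 2, 2])
print(fixed)
```

Because each drawn label pulls in its whole block, groups of different lengths and repeated draws are handled without any reshaping. The cost is one concat over as many blocks as there are draws, which should be fine for a modest number of groups.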
Source: https://stackoverflow.com/questions/38731858/how-to-get-a-random-bootstrap-sample-from-pandas-multiindex