Question
I have a DataFrame that looks like this:
index name city
0 Yam Hadera
1 Meow Hadera
2 Don Hadera
3 Jazz Hadera
4 Bond Tel Aviv
5 James Tel Aviv
I want Pandas to randomly choose rows, using the number of appearances in the city column as the weight (something like df.city.value_counts()), so that the result of my magic function, say:
df.magic_sample(3, weight_column='city')
might look like:
0 Yam Hadera
1 Meow Hadera
2 Bond Tel Aviv
Thanks! :)
Answer 1:
You can group by city and then sample each group in proportion to its length relative to the length of the original data frame:
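# round() gives sample() the integer row count it expects; in Python 3, 3 * len(g) / len(df) is a float
df.groupby('city', group_keys=False).apply(lambda g: g.sample(round(3 * len(g) / len(df))))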
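As a self-contained sketch of the same idea, the proportional split can be written as an explicit loop over the groups. The helper name proportional_sample and the random_state argument are assumptions added here for illustration; they are not part of pandas or of the original answer.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame(
    [("Yam", "Hadera"), ("Meow", "Hadera"), ("Don", "Hadera"),
     ("Jazz", "Hadera"), ("Bond", "Tel Aviv"), ("James", "Tel Aviv")],
    columns=["name", "city"],
)

def proportional_sample(frame, n, column, random_state=None):
    """Hypothetical helper: sample about n rows, split across the groups
    of `column` in proportion to each group's share of the frame."""
    parts = []
    for _, group in frame.groupby(column):
        # Share of n owed to this group, rounded to a whole number of rows
        k = round(n * len(group) / len(frame))
        parts.append(group.sample(n=k, random_state=random_state))
    return pd.concat(parts).sort_index()

result = proportional_sample(df, 3, "city", random_state=0)
print(result)
```

With this data the split is exact (2 rows from Hadera, 1 from Tel Aviv). For other sizes the rounded per-group counts may not add up to exactly n, so treat the total as approximate.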
Answer 2:
If I understand the question correctly, maybe you are looking for random.sample:
>>> import pandas as pd
>>> from random import sample
>>> df = pd.DataFrame(data=[('Yam', 'Hadera'), ('Meow', 'Hadera'), ('Don', 'Hadera'), ('Jazz', 'Hadera'), ('Bond', 'Tel Aviv'), ('James', 'Tel Aviv')], columns=('name', 'city'))
>>> df
name city
0 Yam Hadera
1 Meow Hadera
2 Don Hadera
3 Jazz Hadera
4 Bond Tel Aviv
5 James Tel Aviv
>>> df.iloc[sample(range(len(df)), 3), :]
name city
4 Bond Tel Aviv
0 Yam Hadera
1 Meow Hadera
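A similar uniform draw over rows can also be done with pandas' own DataFrame.sample, without the random module. Because every row is equally likely, cities show up in proportion to their row counts on average, though any single draw may differ. The random_state value below is an assumption, used only to make the sketch reproducible.

```python
import pandas as pd

df = pd.DataFrame(
    data=[("Yam", "Hadera"), ("Meow", "Hadera"), ("Don", "Hadera"),
          ("Jazz", "Hadera"), ("Bond", "Tel Aviv"), ("James", "Tel Aviv")],
    columns=("name", "city"),
)

# Draw 3 distinct rows uniformly at random (sampling without replacement)
picked = df.sample(n=3, random_state=42)
print(picked)
```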
Source: https://stackoverflow.com/questions/41528513/using-pandas-to-sample-dataframe-using-a-specific-columns-weight