I have a DataFrame of answers for 100 questions_id values and 50 user_id values. Each row represents a single question from a specific user. The ta
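Use groupby with filter; it is succinct and intended for exactly this purpose. The lambda receives each user's sub-DataFrame, and only groups for which it returns True are kept.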
df1 = df.groupby('user_id').filter(lambda x: len(x) > 100)
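To see what the filter keeps, here is a minimal sketch on made-up data. The toy DataFrame and the `min_rows` threshold of 2 are assumptions chosen for illustration only; the line above uses 100.

```python
import pandas as pd

# Hypothetical toy data: user 1 has 3 rows, user 2 has 1 row, user 3 has 2 rows.
df = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 3, 3],
    "questions_id": [10, 11, 12, 10, 10, 11],
})

min_rows = 2  # illustrative threshold, not the 100 used in the answer

# filter() calls the lambda once per user and keeps every row of the
# groups for which it returns True.
kept = df.groupby("user_id").filter(lambda g: len(g) > min_rows)
print(kept)
```

Only user 1 has more than two rows here, so only that user's rows remain, in their original order and with their original index.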
For better performance, use np.unique and map:
# Map each user_id to its row count, then keep rows whose user has more than 100 rows
m = dict(zip(*np.unique(df.user_id, return_counts=True)))
df[df['user_id'].map(m) > 100]
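As a rough check that the np.unique and map approach selects the same rows as groupby and filter, the sketch below runs both on a small made-up DataFrame. The data and the `min_rows` threshold of 2 are assumptions for illustration; no timing is measured here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user 1 has 3 rows, user 2 has 1 row, user 3 has 2 rows.
df = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 3, 3],
    "questions_id": [10, 11, 12, 10, 10, 11],
})

min_rows = 2  # illustrative threshold, not the 100 used in the answer

# Count rows per user: np.unique returns the distinct ids and their counts.
values, counts = np.unique(df["user_id"], return_counts=True)
counts_by_user = dict(zip(values, counts))  # user 1 -> 3, user 2 -> 1, user 3 -> 2

# Look up each row's count through its user_id and build a boolean mask.
mask = df["user_id"].map(counts_by_user) > min_rows
via_map = df[mask]

# Same selection with groupby and filter, for comparison.
via_filter = df.groupby("user_id").filter(lambda g: len(g) > min_rows)
print(via_map.equals(via_filter))
```

The map version builds one boolean mask for the whole column instead of calling a Python lambda once per group, which is the likely source of the speedup on data with many users.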