问题
I have data that looks like
date player market
1-1 1 1
1-1 2 1
1-1 1 2
1-2 2 1
1-2 3 2
1-2 12 3
1-2 11 3
1-2 2 3
1-3 24 3
1-3 4 4
1-3 21 1
1-4 1 1
1-4 51 1
1-4 1 1
1-5 1 2
I want to delete all rows related to the column Market that have appear in fewer than in three unique dates.
So this would delete all rows except for those in which the Market column is 1 or 2. This is a simplified version of what I am doing, so looping is out of the question.
Is there a quick way to do this?
There are 60,000 markets.
回答1:
df
is your data.frame to begin with -
library(data.table)
dt <- data.table(df)
dt[,Freq := length(unique(date)), by = market]
dt2 <- dt[Freq > 2]
Output -
> dt2
date player market Freq
1: 1-1 1 1 4
2: 1-1 2 1 4
3: 1-1 1 2 3
4: 1-2 2 1 4
5: 1-2 3 2 3
6: 1-3 21 1 4
7: 1-4 1 1 4
8: 1-4 51 1 4
9: 1-4 1 1 4
10: 1-5 1 2 3
来源:https://stackoverflow.com/questions/22432769/deleting-rows-for-which-a-column-value-appears-fewer-than-a-given-number-of-time