Deleting rows for which a column value appears fewer than a given number of times conditional on uniqueness

问题

I have data that looks like

date player market 
1-1     1     1
1-1     2     1
1-1     1     2
1-2     2     1
1-2     3     2
1-2    12     3
1-2    11     3
1-2     2     3
1-3    24     3
1-3     4     4
1-3    21     1
1-4     1     1
1-4    51     1
1-4     1     1
1-5     1     2

I want to delete all rows related to the column Market that have appear in fewer than in three unique dates.

So this would delete all rows except for those in which the Market column is 1 or 2. This is a simplified version of what I am doing, so looping is out of the question.

Is there a quick way to do this?

There are 60,000 markets.

回答1:

df is your data.frame to begin with -

library(data.table)
dt <- data.table(df)
dt[,Freq := length(unique(date)), by = market]
dt2 <- dt[Freq > 2]

Output -

> dt2
    date player market Freq
 1:  1-1      1      1    4
 2:  1-1      2      1    4
 3:  1-1      1      2    3
 4:  1-2      2      1    4
 5:  1-2      3      2    3
 6:  1-3     21      1    4
 7:  1-4      1      1    4
 8:  1-4     51      1    4
 9:  1-4      1      1    4
10:  1-5      1      2    3

来源：https://stackoverflow.com/questions/22432769/deleting-rows-for-which-a-column-value-appears-fewer-than-a-given-number-of-time

标签

row

which

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!