Get most frequent string from a data frame column

前端未结


I need to return the n most frequent strings, using a multi-row data frame as the input. All the values are in the same column, called "MissingDates".


2 Answers

盖世英雄少女心

2021-01-20 03:20

Here's another possible solution.

Set up some data:

set.seed(5)
ss1 <- sample(seq(s <- Sys.Date(), s+10, "day"), 20, TRUE)
ss2 <- sample(seq(s <- Sys.Date(), s+10, "day"), 20, TRUE)
ls1 <- list(ss1 = ss1, ss2 = ss2)

Define the function:

f <- function(x, n) sort(table(x), decreasing = TRUE)[1:n]

Apply the function over the data:

lapply(ls1, f, n = 3)
# $ss1
# x
# 2014-09-08 2014-09-09 2014-09-07 
#          3          3          2 
# 
# $ss2
# x
# 2014-09-10 2014-09-06 2014-09-07 
#          4          3          2
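Since the question is about a data frame column, the same idea can be applied to a column directly. The following is only a minimal sketch: the data frame `df_demo` and its values are made up for illustration, and `head()` is swapped in for `[1:n]` so that asking for more values than exist does not pad the result with NA entries.

```r
# Hypothetical data: a single character column (not the asker's real data)
df_demo <- data.frame(
  MissingDates = c("a", "b", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Tabulate the values, sort by count (largest first), keep the first n
f_head <- function(x, n) head(sort(table(x), decreasing = TRUE), n)

top2 <- f_head(df_demo$MissingDates, 2)
names(top2)      # "a" "b"
as.vector(top2)  # 3 2

# Requesting more values than there are distinct strings returns all of them
all_vals <- f_head(df_demo$MissingDates, 10)
length(all_vals) # 3
```

`names()` gives the strings themselves and `as.vector()` the counts, which is convenient if only the values are needed rather than the printed table.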


渐次进展

2021-01-20 03:29

It seems like you need something like:

Function

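freqfunc <- function(x, n){
  # Split each row on ", ", flatten to one vector, count each value,
  # sort ascending by count, and keep the last n (the most frequent)
  tail(sort(table(unlist(strsplit(as.character(x), ", ")))), n)
}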

Testing on your data set

freqfunc(gaps$MissingDates, 5) # Five most frequent dates

## 1996-12-26 1997-12-26 1998-01-02 1999-12-31 2001-09-12 
##          4          4          4          4          4
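The `gaps` data frame is not shown in this answer, so here is a small self-contained sketch of how the function behaves. It assumes each row of `MissingDates` holds several dates separated by ", "; the data frame `gaps_demo` and its values are hypothetical.

```r
# Hypothetical stand-in for the asker's data: each row is a ", "-separated string
gaps_demo <- data.frame(
  MissingDates = c("1996-12-26, 1997-12-26",
                   "1996-12-26, 1998-01-02",
                   "1996-12-26, 1997-12-26, 1999-12-31"),
  stringsAsFactors = FALSE
)

freqfunc <- function(x, n) {
  tail(sort(table(unlist(strsplit(as.character(x), ", ")))), n)
}

res <- freqfunc(gaps_demo$MissingDates, 2)
names(res)      # "1997-12-26" "1996-12-26"
as.vector(res)  # 2 3
```

Because `sort()` is ascending here, the most frequent value comes last; wrapping the result in `rev()` would list it first if that order is preferred.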
