Counting the number of elements with the values of x in a vector

闹比i 2020-11-22 02:44

I have a vector of numbers:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
         453,435,324,34,456,56,567,65,34,435)

How can I hav

19 answers
  • 2020-11-22 03:26

    There are several ways to count a specific element:

    library(plyr)
    numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,7,65,34,435)
    
    # which() returns the matching indices; length() counts them
    print(length(which(numbers == 435)))
    
    # sum() counts the TRUEs in a logical vector
    print(sum(numbers == 435))
    print(sum(c(TRUE, FALSE, TRUE)))
    
    # count() comes from the plyr package
    # It returns a data frame; freq is one of its columns
    print(count(numbers[numbers == 435]))
    print(count(numbers[numbers == 435])[['freq']])
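
    The comparisons above assume the vector has no missing values. As a minimal sketch (the vector with_na is made up for illustration), this is how the sum() and which() variants behave when an NA is present:

    ```r
    # Hypothetical input containing a missing value
    with_na <- c(435, NA, 4, 435)

    # NA == 435 is NA, so the plain sum is NA as well
    plain <- sum(with_na == 435)

    # na.rm = TRUE drops the NA comparison before summing
    n_sum <- sum(with_na == 435, na.rm = TRUE)

    # which() only returns indices where the comparison is TRUE
    n_which <- length(which(with_na == 435))
    ```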
    
  • 2020-11-22 03:26

    This is a very fast solution for one-dimensional atomic vectors. It relies on match(), so it is compatible with NA:

    x <- c("a", NA, "a", "c", "a", "b", NA, "c")
    
    fn <- function(x) {
      u <- unique.default(x)
      out <- list(x = u, freq = .Internal(tabulate(match(x, u), length(u))))
      class(out) <- "data.frame"
      attr(out, "row.names") <- seq_along(u)
      out
    }
    
    fn(x)
    
    #>      x freq
    #> 1    a    3
    #> 2 <NA>    2
    #> 3    c    2
    #> 4    b    1
    

    You could also tweak the algorithm so that it doesn't run unique().

    fn2 <- function(x) {
      y <- match(x, x)
      out <- list(x = x, freq = .Internal(tabulate(y, length(x)))[y])
      class(out) <- "data.frame"
      attr(out, "row.names") <- seq_along(x)
      out
    }
    
    fn2(x)
    
    #>      x freq
    #> 1    a    3
    #> 2 <NA>    2
    #> 3    a    3
    #> 4    c    2
    #> 5    a    3
    #> 6    b    1
    #> 7 <NA>    2
    #> 8    c    2
    

    When that output is what you want, you probably don't need the original vector returned again; the second column alone is enough. You can get it in one line with the magrittr pipe:

    library(magrittr)  # provides %>%
    match(x, x) %>% `[`(tabulate(.), .)
    
    #> [1] 3 2 3 2 3 1 2 2
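
    fn() above reaches into .Internal() and builds the data frame by hand, presumably for speed. If that is not critical, the same match-and-tabulate idea can be sketched with the exported tabulate() and data.frame(). The helper name count_values is hypothetical, and this version has not been benchmarked against fn():

    ```r
    x <- c("a", NA, "a", "c", "a", "b", NA, "c")

    # Hypothetical helper: same idea as fn(), using only exported functions
    count_values <- function(x) {
      u <- unique(x)
      # match() maps each element (including NA) to its position in u,
      # and tabulate() counts how often each position occurs
      data.frame(x = u, freq = tabulate(match(x, u), nbins = length(u)))
    }

    res <- count_values(x)
    ```

    Here res$freq should match the freq column printed by fn(x).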
    
  • 2020-11-22 03:27

    A method that is relatively fast on long vectors and gives a convenient output is to use lengths(split(numbers, numbers)) (note the S at the end of lengths):

    # Make some integer vectors of different sizes
    set.seed(123)
    x <- sample.int(1e3, 1e4, replace = TRUE)
    xl <- sample.int(1e3, 1e6, replace = TRUE)
    xxl <- sample.int(1e3, 1e7, replace = TRUE)
    
    # Number of times each value appears in x:
    a <- lengths(split(x,x))
    
    # Number of times the value 64 appears:
    a["64"]
    #~ 64
    #~ 15
    
    # Occurrences of the first 10 values
    a[1:10]
    #~ 1  2  3  4  5  6  7  8  9 10 
    #~ 13 12  6 14 12  5 13 14 11 14 
    

    The output is simply a named vector.
    The speed appears comparable to rle proposed by JBecker and even a bit faster on very long vectors. Here is a microbenchmark in R 3.6.2 with some of the functions proposed:

    library(microbenchmark)
    
    f1 <- function(vec) lengths(split(vec,vec))
    f2 <- function(vec) table(vec)
    f3 <- function(vec) rle(sort(vec))
    f4 <- function(vec) plyr::count(vec)
    
    microbenchmark(split = f1(x),
                   table = f2(x),
                   rle = f3(x),
                   plyr = f4(x))
    #~ Unit: microseconds
    #~   expr      min        lq      mean    median        uq      max neval  cld
    #~  split  402.024  423.2445  492.3400  446.7695  484.3560 2970.107   100  b  
    #~  table 1234.888 1290.0150 1378.8902 1333.2445 1382.2005 3203.332   100    d
    #~    rle  227.685  238.3845  264.2269  245.7935  279.5435  378.514   100 a   
    #~   plyr  758.866  793.0020  866.9325  843.2290  894.5620 2346.407   100   c 
    
    microbenchmark(split = f1(xl),
                   table = f2(xl),
                   rle = f3(xl),
                   plyr = f4(xl))
    #~ Unit: milliseconds
    #~   expr       min        lq      mean    median        uq       max neval cld
    #~  split  21.96075  22.42355  26.39247  23.24847  24.60674  82.88853   100 ab 
    #~  table 100.30543 104.05397 111.62963 105.54308 110.28732 168.27695   100   c
    #~    rle  19.07365  20.64686  23.71367  21.30467  23.22815  78.67523   100 a  
    #~   plyr  24.33968  25.21049  29.71205  26.50363  27.75960  92.02273   100  b 
    
    microbenchmark(split = f1(xxl),
                   table = f2(xxl),
                   rle = f3(xxl),
                   plyr = f4(xxl))
    #~ Unit: milliseconds
    #~   expr       min        lq      mean    median        uq       max neval  cld
    #~  split  296.4496  310.9702  342.6766  332.5098  374.6485  421.1348   100 a   
    #~  table 1151.4551 1239.9688 1283.8998 1288.0994 1323.1833 1385.3040   100    d
    #~    rle  399.9442  430.8396  464.2605  471.4376  483.2439  555.9278   100   c 
    #~   plyr  350.0607  373.1603  414.3596  425.1436  437.8395  506.0169   100  b  
    

    Importantly, with default arguments the only function here that also counts missing values (NA) is plyr::count. The number of NAs can also be obtained separately with sum(is.na(vec)).
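
    A minimal sketch of the NA point, using a small made-up vector vec: table() can be asked to report missing values through its useNA argument, and sum(is.na()) gives the NA count on its own.

    ```r
    # Hypothetical input with two missing values
    vec <- c(1, 2, 2, NA, 3, NA)

    # table() drops NA by default; useNA = "ifany" adds an <NA> entry
    counts_default <- table(vec)
    counts_with_na <- table(vec, useNA = "ifany")

    # Number of missing values, computed separately
    n_missing <- sum(is.na(vec))
    ```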

  • 2020-11-22 03:29

    Here's one quick and dirty way:

    x <- 23
    length(subset(numbers, numbers==x))
    
  • 2020-11-22 03:31

    My preferred solution uses rle, which will return a value (the label, x in your example) and a length, which represents how many times that value appeared in sequence.

    By combining rle with sort, you have an extremely fast way to count the number of times any value appeared. This can be helpful with more complex problems.

    Example:

    > numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
    > a <- rle(sort(numbers))
    > a
      Run Length Encoding
        lengths: int [1:15] 2 1 2 2 1 1 2 1 2 1 ...
        values : num [1:15] 4 5 23 34 43 54 56 65 67 324 ...
    

    If the value you want doesn't show up, or you need to store the counts for later, turn a into a data.frame.

    > b <- data.frame(number=a$values, n=a$lengths)
    > b
        number n
     1       4 2
     2       5 1
     3      23 2
     4      34 2
     5      43 1
     6      54 1
     7      56 2
     8      65 1
     9      67 2
     10    324 1
     11    435 3
     12    453 1
     13    456 1
     14    567 1
     15    657 1
    

    I find it is rare that I want the frequency of one value and not all of them, and rle seems to be the quickest way to get the counts and store them all.
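
    When a single count is needed after all, it can be read straight off the rle result. A small sketch, where count_of is a hypothetical helper that returns 0 for a value that never occurs:

    ```r
    numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
    a <- rle(sort(numbers))

    # Hypothetical helper: look up one value's count in an rle result
    count_of <- function(runs, value) {
      hit <- runs$lengths[runs$values == value]
      # An absent value yields an empty vector, so report 0 instead
      if (length(hit) == 0L) 0L else hit
    }

    n_435 <- count_of(a, 435)
    n_999 <- count_of(a, 999)
    ```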

  • 2020-11-22 03:31

    Using table but without comparing with names:

    numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
    x <- 67
    numbertable <- table(numbers)
    numbertable[as.character(x)]
    #67 
    # 2 
    

    table is useful when you need the counts of different elements several times. If you need only one count, use sum(numbers == x).
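
    As an illustration of reusing the table, here is a sketch that looks up several values in one go. The vector wanted is made up for the example, and every value in it is assumed to occur in numbers:

    ```r
    numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
    numbertable <- table(numbers)

    # Build the table once, then index it by name for each value of interest
    wanted <- c(4, 67, 435)
    counts <- numbertable[as.character(wanted)]

    # Plain integer vector without the table labels
    counts_plain <- as.vector(counts)
    ```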
