How to count TRUE values in a logical vector

[愿得一人] 2020-11-28 01:17

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:

z <- sample(c         


        
8 Answers
  • 2020-11-28 01:55

    There's also a package called bit that is specifically designed for fast boolean operations. It's especially useful if you have large vectors or need to do many boolean operations.

    z <- sample(c(TRUE, FALSE), 1e8, rep = TRUE)
    
    system.time({
      sum(z) # 0.170s
    })
    
    system.time({
      bit::sum.bit(z) # 0.021s, ~10x improvement in speed
    })
    
  • 2020-11-28 02:02

    I recently had to count the number of TRUE values in a logical vector, and this worked best for me:

    length(grep(TRUE, (gene.rep.matrix[i,1:6] > 1))) > 5
    

    This takes a subset of the gene.rep.matrix object and applies a logical test, which returns a logical vector. That vector is passed to grep, which coerces it to character and returns the positions of the entries matching "TRUE". length then counts those positions, giving the number of TRUE entries.
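
For comparison, here is a small sketch showing that the grep approach gives the same count as sum() and which(). The matrix m is a made-up stand-in for gene.rep.matrix, whose real contents are not shown in this answer.

```r
# Made-up stand-in for gene.rep.matrix (values are arbitrary).
m <- matrix(c(2, 3, 0.5, 4, 5, 6,
              1, 1, 2,   0, 3, 0.2),
            nrow = 2, byrow = TRUE)

row_test <- m[1, 1:6] > 1            # logical vector for row 1

n_grep  <- length(grep(TRUE, row_test))  # grep matches the string "TRUE"
n_sum   <- sum(row_test)                 # TRUE counts as 1, FALSE as 0
n_which <- length(which(row_test))       # which() gives positions of TRUE

print(c(grep = n_grep, sum = n_sum, which = n_which))
print(n_sum > 5)                         # the test from the answer
```

All three counts agree here; sum() avoids the detour through character matching.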

  • 2020-11-28 02:06

    There are some problems when the logical vector contains NA values.
    See for example:

    z <- c(TRUE, FALSE, NA)
    sum(z) # gives you NA
    table(z)["TRUE"] # gives you 1
    length(z[z == TRUE]) # f3lix's answer, gives you 2 (because indexing by NA returns an NA element)
    

    So I think the safest is to use na.rm = TRUE:

    sum(z, na.rm = TRUE) # best way to count TRUE values
    

    (which gives 1). I think the table solution is less efficient (look at the source of the table function).

    Also, you should be careful with the "table" solution, in case there are no TRUE values in the logical vector. Suppose z <- c(NA, FALSE, NA) or simply z <- c(FALSE, FALSE), then table(z)["TRUE"] gives you NA for both cases.
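
The NA behaviour described above can be checked with a short sketch. The helper name count_true is hypothetical (it is not from any package) and simply wraps sum(..., na.rm = TRUE). The factor() call with explicit levels is an assumed workaround for the table pitfall, not something proposed in this thread.

```r
# Hypothetical helper: count TRUE values, ignoring NA.
count_true <- function(x) sum(x, na.rm = TRUE)

z <- c(TRUE, FALSE, NA)
print(count_true(z))          # NA is skipped
print(sum(z))                 # NA propagates through sum()
print(length(z[z == TRUE]))   # the NA element is kept by NA indexing

# table() has no "TRUE" cell when no TRUE is present,
# so looking it up by name yields NA rather than 0.
no_true <- c(FALSE, FALSE)
print(table(no_true)["TRUE"])
print(count_true(no_true))

# Assumed workaround: fixing the levels keeps a "TRUE" cell with count 0.
tab <- table(factor(no_true, levels = c(FALSE, TRUE)))
print(tab[["TRUE"]])
```

If table is needed for other reasons, fixing the levels this way makes the lookup safe; otherwise sum(z, na.rm = TRUE) stays the simpler choice.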

  • 2020-11-28 02:08

    Another way is

    > length(z[z==TRUE])
    [1] 498
    

    While sum(z) is nice and short, for me length(z[z==TRUE]) is more self-explanatory. For a simple task like this, though, I don't think it really makes a difference.

    If it is a large vector, you should probably go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower, and table(z)["TRUE"] is about 200x slower than sum(z).

    Summing up, sum(z) is the fastest to type and to execute.
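
A rough way to reproduce this comparison on your own machine is sketched below. The vector size (1e6) and the repetition count (5) are arbitrary assumptions, and timings depend on hardware and R version, so no numbers are quoted.

```r
set.seed(42)
z <- sample(c(TRUE, FALSE), 1e6, replace = TRUE)

# The three approaches give the same count when z has no NA values.
n_sum    <- sum(z)
n_length <- length(z[z == TRUE])
n_table  <- table(z)[["TRUE"]]

# Time each approach over a few repetitions (base R only).
reps <- 5
t_sum    <- system.time(for (i in seq_len(reps)) sum(z))
t_length <- system.time(for (i in seq_len(reps)) length(z[z == TRUE]))
t_table  <- system.time(for (i in seq_len(reps)) table(z)[["TRUE"]])

print(c(sum    = t_sum[["elapsed"]],
        length = t_length[["elapsed"]],
        table  = t_table[["elapsed"]]))
```

For finer-grained measurements a dedicated benchmarking package would be the usual choice; system.time is used here only to keep the sketch dependency-free.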

  • I did something similar a few weeks ago. Here's a possible solution. It's written from scratch, so consider it a beta release. I'll try to improve it by removing the loops from the code...

    The main idea is to write a function that takes 2 (or 3) arguments. The first is a data.frame holding the data gathered from the questionnaire, and the second is a numeric vector of correct answers (this only applies to single-choice questionnaires). The optional third argument selects the output: a numeric vector of final scores, or a data.frame with the score embedded.

    fscore <- function(x, sol, output = 'numeric') {
        if (ncol(x) != length(sol)) {
            stop('Number of items differs from length of correct answers!')
        } else {
            inc <- matrix(ncol=ncol(x), nrow=nrow(x))
            for (i in 1:ncol(x)) {
                inc[,i] <- x[,i] == sol[i]
            }
            if (output == 'numeric') {
                res <- rowSums(inc)
            } else if (output == 'data.frame') {
                res <- data.frame(x, result = rowSums(inc))
            } else {
                stop('Type not supported!')
            }
        }
        return(res)
    }
    

    I'll try to do this in a more elegant manner with some *ply function. Note that I haven't added an na.rm argument yet; I'll do that later. To try it out:

    # create dummy data frame - values from 1 to 5
    set.seed(100)
    d <- as.data.frame(matrix(round(runif(200,1,5)), 10))
    # create solution vector
    sol <- round(runif(20, 1, 5))
    

    Now apply a function:

    > fscore(d, sol)
     [1] 6 4 2 4 4 3 3 6 2 6
    

    If you pass output = 'data.frame', it returns the original data.frame with an added result column. I'll try to fix this one... Hope it helps!
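
One possible loop-free variant is sketched below. It uses sweep() rather than a *ply function, and the name fscore_vec and its na.rm argument are assumptions of this sketch, not part of the original function.

```r
# Hypothetical loop-free variant of fscore (numeric output only).
fscore_vec <- function(x, sol, na.rm = FALSE) {
  x <- as.matrix(x)
  if (ncol(x) != length(sol)) {
    stop('Number of items differs from length of correct answers!')
  }
  # Compare every column j of x with sol[j] in one step.
  hits <- sweep(x, 2, sol, FUN = "==")
  # Each row sum is the number of correct answers for that respondent.
  rowSums(hits, na.rm = na.rm)
}

# Same dummy data as above.
set.seed(100)
d <- as.data.frame(matrix(round(runif(200, 1, 5)), 10))
sol <- round(runif(20, 1, 5))

scores <- fscore_vec(d, sol)
print(scores)
```

Because the comparison yields a logical matrix, rowSums counts the TRUE values per row, which is the same idea as sum(z) applied row by row.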

  • 2020-11-28 02:10

    Another option is to use the summary function. It reports the counts of TRUE, FALSE and NA values.

    > summary(hival)
       Mode   FALSE    TRUE    NA's 
    logical    4367      53    2076 
    > 
    