How to implement Excel's COUNTIFS function in R

面向向阳花 2020-11-30 08:58

I have a dataset containing 100000 rows of data. I tried to do some countif operations in Excel, but it was prohibitively slow. So I am wondering if this kind o

5 answers
  • 2020-11-30 09:38

    Here is an example with 100000 rows (occupations here are the letters A to Z):

    > a = data.frame(sex = sample(c("M", "F"), 100000, replace = TRUE),
    +                occupation = sample(LETTERS, 100000, replace = TRUE))
    > sum(a$sex == "M" & a$occupation == "A")
    [1] 1882
    

    returns the number of males with occupation "A".
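    To reuse this for any number of criteria, the pattern can be wrapped in a small function. This is a minimal sketch: `countifs` is a hypothetical helper written for illustration (not part of base R or any package), it only supports equality criteria passed as `column = value`, and rows where a criterion column is NA are not counted.

    ```r
    # Hypothetical helper (not from any package): counts the rows of `df`
    # where every named column equals the given value, like Excel's COUNTIFS.
    countifs <- function(df, ...) {
      criteria <- list(...)
      keep <- rep(TRUE, nrow(df))          # start with every row matching
      for (col in names(criteria)) {
        # AND in one equality condition per criterion
        keep <- keep & (df[[col]] == criteria[[col]])
      }
      sum(keep, na.rm = TRUE)              # rows whose result is NA are not counted
    }

    # Small made-up data so the counts can be checked by hand
    a <- data.frame(sex = c("M", "F", "M", "M"),
                    occupation = c("A", "A", "B", "A"))
    countifs(a, sex = "M", occupation = "A")  # 2
    countifs(a, sex = "F")                    # 1
    ```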

    EDIT

    As I understand from your comment, you want the counts of all possible combinations of sex and occupation. So first create a data frame with all combinations:

    combns = expand.grid(c("M", "F"), LETTERS)
    

    then loop over its rows with apply, counting the rows of a that match each combination, and append the counts to combns:

    combns = cbind (combns, apply(combns, 1, function(x)sum(a$sex==x[1] & a$occupation==x[2])))
    colnames(combns) = c("sex", "occupation", "count")
    

    The first rows of your result look as follows:

      sex occupation count
    1   M          A  1882
    2   F          A  1869
    3   M          B  1866
    4   F          B  1904
    5   M          C  1979
    6   F          C  1910
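    The same all-combinations count can be checked on a tiny made-up dataset. This sketch swaps `apply` for `mapply`, which walks the two columns of `combns` in parallel and so avoids `apply` first converting the whole data frame to a character matrix. The column names are given to `expand.grid` directly, so no renaming step is needed.

    ```r
    a <- data.frame(sex = c("M", "F", "M", "M"),
                    occupation = c("A", "A", "B", "A"),
                    stringsAsFactors = FALSE)

    # Every sex/occupation combination, kept as character columns
    combns <- expand.grid(sex = c("M", "F"), occupation = c("A", "B"),
                          stringsAsFactors = FALSE)

    # One COUNTIFS per combination
    combns$count <- mapply(function(s, o) sum(a$sex == s & a$occupation == o),
                           combns$sex, combns$occupation, USE.NAMES = FALSE)
    combns
    ```

    Combinations that never occur (here F with B) still get a row, with a count of 0.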
    

    Does this solve your problem?

    OR:

    A much easier solution, suggested by thelatemai:

    table(a$sex, a$occupation)
    
    
           A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
      F 1869 1904 1910 1907 1894 1940 1964 1907 1918 1892 1962 1933 1886 1960 1972
      M 1882 1866 1979 1904 1895 1845 1946 1905 1999 1994 1933 1950 1876 1856 1911
    
           P    Q    R    S    T    U    V    W    X    Y    Z
      F 1908 1907 1883 1888 1943 1922 2016 1962 1885 1898 1889
      M 1928 1938 1916 1927 1972 1965 1946 1903 1965 1974 1906
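    Because `table()` counts every combination at once, a single COUNTIFS value can be read off by indexing with the level names, and `as.data.frame()` turns the table into one row per combination with the counts in a `Freq` column. A sketch on small made-up data (naming the arguments to `table()` carries those names through to the columns):

    ```r
    a <- data.frame(sex = c("M", "F", "M", "M"),
                    occupation = c("A", "A", "B", "A"))

    tab <- table(sex = a$sex, occupation = a$occupation)
    tab["M", "A"]          # 2: males with occupation "A"

    # Long format: columns sex, occupation and Freq
    counts <- as.data.frame(tab)
    counts
    ```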
    
  • 2020-11-30 09:46

    Given a dataset

    df <- data.frame( sex = c('M', 'M', 'F', 'F', 'M'), 
                      occupation = c('analyst', 'dentist', 'dentist', 'analyst', 'cook') )
    

    you can subset rows

    df[df$sex == 'M',] # To get all males
    df[df$occupation == 'analyst',] # All analysts
    

    etc.

    If you want the number of rows, just call nrow on the subset, for example

    nrow(df[df$sex == 'M',])
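    Combining conditions with `&` gives the COUNTIFS form. One caveat worth checking: if the column contains NA, indexing with `df$sex == 'M'` keeps an all-NA row for each missing value, so `nrow()` overcounts, while `subset()` drops those rows. A sketch, appending one made-up row with a missing sex:

    ```r
    df <- data.frame(sex = c('M', 'M', 'F', 'F', 'M'),
                     occupation = c('analyst', 'dentist', 'dentist', 'analyst', 'cook'))

    # Two criteria, like COUNTIFS: male analysts
    nrow(df[df$sex == 'M' & df$occupation == 'analyst', ])   # 1

    # Append a row whose sex is missing
    df2 <- rbind(df, data.frame(sex = NA, occupation = 'cook'))
    nrow(df2[df2$sex == 'M', ])     # 4: the NA comparison yields an all-NA row
    nrow(subset(df2, sex == 'M'))   # 3: subset() treats NA as FALSE
    ```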
    
  • 2020-11-30 09:56

    Easy peasy. Your data frame will look like this:

    df <- data.frame(sex=c('M','F','M'),
                     occupation=c('Student','Analyst','Analyst'))
    

    You can then do the equivalent of a COUNTIF by first specifying the IF part, like so:

    df$sex == 'M'
    

    This will give you a logical vector, i.e. a vector of TRUE and FALSE. What you want is to count the observations for which the condition is TRUE. Since in R TRUE and FALSE double as 1 and 0, you can simply sum() over the logical vector. The equivalent of COUNTIF(sex='M') is therefore

    sum(df$sex == 'M')
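    The coercion is easy to see directly, and other COUNTIF-style criteria map onto ordinary R comparisons. A sketch using the three-row data frame above:

    ```r
    df <- data.frame(sex = c('M', 'F', 'M'),
                     occupation = c('Student', 'Analyst', 'Analyst'))

    cond <- df$sex == 'M'    # TRUE FALSE TRUE
    as.integer(cond)         # 1 0 1: TRUE and FALSE become 1 and 0
    sum(cond)                # 2, the COUNTIF result
    mean(cond)               # share of rows that match (2/3)

    # Criteria other than "equals"
    sum(df$occupation != 'Student')                  # like "<>Student": 2
    sum(df$occupation %in% c('Student', 'Analyst'))  # either value: 3
    ```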
    

    Should there be rows in which the sex is missing (NA), the above will give back NA. In that case, if you just want to ignore the missing observations, use

    sum(df$sex == 'M', na.rm=TRUE)
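    The NA behaviour can be reproduced on a short vector. `%in%` is an alternative worth knowing here: it returns FALSE rather than NA for missing values, so no `na.rm` is needed, and `is.na()` counts the missing entries themselves.

    ```r
    sex <- c('M', NA, 'F', 'M')

    sum(sex == 'M')                 # NA, because NA == 'M' is NA
    sum(sex == 'M', na.rm = TRUE)   # 2
    sum(sex %in% 'M')               # 2: %in% gives FALSE for NA
    sum(is.na(sex))                 # 1 missing value
    ```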
    
  • 2020-11-30 09:58
    > library(matrixStats)
    > data <- rbind(c("M", "F", "M"), c("Student", "Analyst", "Analyst"))
    > rowCounts(data, value = 'M') # output = 2 0
    > rowCounts(data, value = 'F') # output = 1 0
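    Without the matrixStats dependency, the same per-row counts come from comparing the matrix to a value and summing the resulting logical matrix with base R's `rowSums()`. A sketch on the same two-row matrix:

    ```r
    data <- rbind(c("M", "F", "M"), c("Student", "Analyst", "Analyst"))

    rowSums(data == "M")   # 2 0
    rowSums(data == "F")   # 1 0
    ```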
    
  • 2020-11-30 10:04

    Table is the obvious choice, but it returns an object of class table, which takes a few annoying steps to transform back into a data.frame. So, if you're OK using dplyr, you can use the command tally:

    library(dplyr)
    df = data.frame(sex = sample(c("M", "F"), 100000, replace = TRUE),
                    occupation = sample(c('Analyst', 'Student'), 100000, replace = TRUE))
    df %>% group_by_all() %>% tally()
    
    
    # A tibble: 4 x 3
    # Groups:   sex [2]
      sex   occupation `n()`
      <fct> <fct>      <int>
    1 F     Analyst    25105
    2 F     Student    24933
    3 M     Analyst    24769
    4 M     Student    25193
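    dplyr's `count()` does the grouping itself, so `df %>% count(sex, occupation)` gives the same counts as `group_by()` plus `tally()` in one call. A sketch on small made-up data; the counts land in a column named `n`:

    ```r
    library(dplyr)

    df <- data.frame(sex = c('M', 'F', 'M', 'M'),
                     occupation = c('Analyst', 'Analyst', 'Student', 'Analyst'))

    # One row per observed sex/occupation combination, counts in column n
    counts <- df %>% count(sex, occupation)
    counts

    # Read off a single COUNTIFS value from the long result
    counts %>% filter(sex == 'M', occupation == 'Analyst') %>% pull(n)   # 2
    ```

    Unlike `table()`, this lists only combinations that actually occur (here there is no row for F with Student).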
    