How do I get a contingency table?

后端 未结 6 1514
感动是毒
感动是毒 2020-11-21 23:27

I am trying to create a contingency table from a particular type of data. This would be doable with loops etc... but because my final table would contain more than 10E5 cell

相关标签:
6条回答
  • 2020-11-22 00:00

    the reshape package should do the trick.

    > library(reshape)
    
    > df <- data.frame(PLANT = c("Tragopogon_pratensis","Anthriscus_sylvestris","Anthriscus_sylvestris","Heracleum_sphondylium","Anthriscus_sylvestris","Anthriscus_sylvestris","Cerastium_holosteoides"),
                       ANIMAL= c("Propylea_quatuordecimpunctata","Rhagonycha_nigriventris","Sarcophaga_carnaria","Sarcophaga_carnaria","Sarcophaga_variegata","Sphaerophoria_interrupta_Gruppe","Sphaerophoria_interrupta_Gruppe"),
                       INTERACTIONS = c(1,3,2,1,4,3,1),
                       stringsAsFactors=FALSE)
    
    > df <- melt(df,id.vars=c("PLANT","ANIMAL"))    
    > df <- cast(df,formula=PLANT~ANIMAL)
    > df <- replace(df,is.na(df),0)
    
    > df
                       PLANT Propylea_quatuordecimpunctata Rhagonycha_nigriventris
    1  Anthriscus_sylvestris                             0                       3
    2 Cerastium_holosteoides                             0                       0
    3  Heracleum_sphondylium                             0                       0
    4   Tragopogon_pratensis                             1                       0
      Sarcophaga_carnaria Sarcophaga_variegata Sphaerophoria_interrupta_Gruppe
    1                   2                    4                               3
    2                   0                    0                               1
    3                   1                    0                               0
    4                   0                    0                               0
    

    I'm still figuring out how to fix the order issue, any suggestion?

    0 讨论(0)
  • 2020-11-22 00:00

    Simply use dcast() function of "reshape2" package:

    ans = dcast( df, PLANT~ ANIMAL,value.var = "INTERACTIONS", fill = 0 ) 
    

    Here "PLANT" will be on the left column, "ANIMALS" on the top row, filling of the table will happen using "INTERACTIONS" and "NULL" values will be filled using 0's.

    0 讨论(0)
  • 2020-11-22 00:15

    In base R, use table or xtabs:

    with(warpbreaks, table(wool, tension))
    
        tension
    wool L M H
       A 9 9 9
       B 9 9 9
    
    xtabs(~wool+tension, data=warpbreaks)
    
        tension
    wool L M H
       A 9 9 9
       B 9 9 9
    

    The gmodels packages has a function CrossTable that gives output similar to what users of SPSS or SAS expects:

    library(gmodels)
    with(warpbreaks, CrossTable(wool, tension))
    
    
       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|
    
    
    Total Observations in Table:  54 
    
    
                 | tension 
            wool |         L |         M |         H | Row Total | 
    -------------|-----------|-----------|-----------|-----------|
               A |         9 |         9 |         9 |        27 | 
                 |     0.000 |     0.000 |     0.000 |           | 
                 |     0.333 |     0.333 |     0.333 |     0.500 | 
                 |     0.500 |     0.500 |     0.500 |           | 
                 |     0.167 |     0.167 |     0.167 |           | 
    -------------|-----------|-----------|-----------|-----------|
               B |         9 |         9 |         9 |        27 | 
                 |     0.000 |     0.000 |     0.000 |           | 
                 |     0.333 |     0.333 |     0.333 |     0.500 | 
                 |     0.500 |     0.500 |     0.500 |           | 
                 |     0.167 |     0.167 |     0.167 |           | 
    -------------|-----------|-----------|-----------|-----------|
    Column Total |        18 |        18 |        18 |        54 | 
                 |     0.333 |     0.333 |     0.333 |           | 
    -------------|-----------|-----------|-----------|-----------|
    
    0 讨论(0)
  • 2020-11-22 00:15

    xtabs in base R should work, for example:

    dat <- data.frame(PLANT = c("p1", "p2", "p2", "p4", "p5", "p5", "p6"),
                      ANIMAL = c("a1", "a2", "a3", "a3", "a4", "a5", "a5"),
                      INTERACTIONS = c(1,3,2,1,4,3,1),
                      stringsAsFactors = FALSE)
    
    (x2.table <- xtabs(dat$INTERACTIONS ~ dat$PLANT + dat$ANIMAL))
    
         dat$ANIMAL
    dat$PLANT a1 a2 a3 a4 a5
           p1  1  0  0  0  0
           p2  0  3  2  0  0
           p4  0  0  1  0  0
           p5  0  0  0  4  3
           p6  0  0  0  0  1
    
    chisq.test(x2.table, simulate.p.value = TRUE)
    

    I think that should do what you're looking for fairly easily. I'm not sure how it scales up in terms of efficiency to a 10E5 contingency table, but that might be a separate issue statistically.

    0 讨论(0)
  • 2020-11-22 00:19

    With dplyr / tidyr:

    df <- read.table(text='PLANT                  ANIMAL                          INTERACTIONS
                     Tragopogon_pratensis   Propylea_quatuordecimpunctata         1
                     Anthriscus_sylvestris  Rhagonycha_nigriventris               3
                     Anthriscus_sylvestris  Sarcophaga_carnaria                   2
                     Heracleum_sphondylium  Sarcophaga_carnaria                   1
                     Anthriscus_sylvestris  Sarcophaga_variegata                  4
                     Anthriscus_sylvestris  Sphaerophoria_interrupta_Gruppe       3
                     Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe       1', header=TRUE)
    library(dplyr)
    library(tidyr)
    df %>% spread(ANIMAL, INTERACTIONS, fill=0)
    
    #                    PLANT Propylea_quatuordecimpunctata Rhagonycha_nigriventris Sarcophaga_carnaria Sarcophaga_variegata Sphaerophoria_interrupta_Gruppe
    # 1  Anthriscus_sylvestris                             0                       3                   2                    4                               3
    # 2 Cerastium_holosteoides                             0                       0                   0                    0                               1
    # 3  Heracleum_sphondylium                             0                       0                   1                    0                               0
    # 4   Tragopogon_pratensis                             1                       0                   0                    0                               0
    
    0 讨论(0)
  • 2020-11-22 00:22

    I'd like to point out that we can get the same results Andrie posted without using the function with:

    R Base Package

    # 3 options
    table(warpbreaks[, 2:3])
    table(warpbreaks[, c("wool", "tension")])
    table(warpbreaks$wool, warpbreaks$tension, dnn = c("wool", "tension"))
    
        tension
    wool L M H
       A 9 9 9
       B 9 9 9
    

    Package gmodels:

    library(gmodels)
    # 2 options    
    CrossTable(warpbreaks$wool, warpbreaks$tension)
    CrossTable(warpbreaks$wool, warpbreaks$tension, dnn = c("Wool", "Tension"))
    
    
       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|
    
    
    Total Observations in Table:  54 
    
    
                    | warpbreaks$tension 
    warpbreaks$wool |         L |         M |         H | Row Total | 
    ----------------|-----------|-----------|-----------|-----------|
                  A |         9 |         9 |         9 |        27 | 
                    |     0.000 |     0.000 |     0.000 |           | 
                    |     0.333 |     0.333 |     0.333 |     0.500 | 
                    |     0.500 |     0.500 |     0.500 |           | 
                    |     0.167 |     0.167 |     0.167 |           | 
    ----------------|-----------|-----------|-----------|-----------|
                  B |         9 |         9 |         9 |        27 | 
                    |     0.000 |     0.000 |     0.000 |           | 
                    |     0.333 |     0.333 |     0.333 |     0.500 | 
                    |     0.500 |     0.500 |     0.500 |           | 
                    |     0.167 |     0.167 |     0.167 |           | 
    ----------------|-----------|-----------|-----------|-----------|
       Column Total |        18 |        18 |        18 |        54 | 
                    |     0.333 |     0.333 |     0.333 |           | 
    ----------------|-----------|-----------|-----------|-----------|
    
    0 讨论(0)
提交回复
热议问题