Chi Square Analysis using for loop in R

夕颜 2020-11-27 19:41

I'm trying to run a chi-square test on all combinations of variables in my data. My code is:

Data <- esoph[ , 1:3]
OldStatistic <- NA
for(i in 1         


        
2 Answers
  • 2020-11-27 20:19

    I wrote my own function. It builds a table in which every nominal variable is tested against every other one, and it can optionally save the result as an Excel file. Only p-values of 0.05 or less are shown; larger ones are blanked out.

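    funMassChi <- function (x, delFirst = 0, xlsxpath = FALSE) {
      options(scipen = 999)  # show p-values in fixed notation, not scientific

      # Skip the first delFirst columns (e.g. an ID or count index)
      start <- (delFirst + 1)
      ds <- x[, start:ncol(x)]

      cATeND <- ncol(ds)
      catID  <- 1:cATeND

      # Empty result table: one row per variable, one column per variable but the last
      # (built from ds, so ds needs at least as many rows as columns)
      resMat <- ds[1:cATeND, 1:(cATeND - 1)]
      resMat[,] <- NA

      # Test every pair once and store the p-value in the lower triangle
      for (nCc in 1:(length(catID) - 1)) {
        for (nDc in (nCc + 1):length(catID)) {
          tryCatch({
            chiRes <- chisq.test(ds[, catID[nCc]], ds[, catID[nDc]])
            resMat[nDc, nCc] <- chiRes$p.value
          }, error = function(e) {
            cat(paste("ERROR :", "at", nCc, nDc, sep = " "), conditionMessage(e), "\n")
          })
        }
      }
      resMat[resMat > 0.05] <- ""  # blank out non-significant p-values
      Ergebnis <- cbind(CatNames = names(ds), resMat)
      Ergebnis <<- Ergebnis[-1, ]  # first row is always empty; result goes to the global env

      if (!isFALSE(xlsxpath)) {
        # write.xlsx() is assumed here to come from the xlsx package
        xlsx::write.xlsx(x = Ergebnis, file = paste(xlsxpath, "ALLChi-", Sys.Date(), ".xlsx", sep = ""),
                         sheetName = "Tabelle1", row.names = FALSE)
      }
    }

    funMassChi(categorialDATA, delFirst = 3, xlsxpath = "C:/folder1/folder2/")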
    

    delFirst drops the first n columns, which is useful if the data starts with a count index or other columns you don't want to test.

    I hope this helps someone else.
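
    For anyone who wants to try the idea without the Excel export, here is a minimal sketch of the same lower-triangular p-value table on the built-in esoph data used in the question. The helper name pairwise_chisq_p is hypothetical, and using a plain numeric matrix is my assumption rather than part of the function above.

    ```r
    # Hypothetical helper: chi-square p-values for every pair of columns in df.
    pairwise_chisq_p <- function(df) {
      k <- ncol(df)
      # k x k numeric matrix, labelled with the variable names on both sides
      p <- matrix(NA_real_, nrow = k, ncol = k,
                  dimnames = list(names(df), names(df)))
      for (i in seq_len(k - 1)) {
        for (j in (i + 1):k) {
          # suppressWarnings() hides chisq.test()'s warning about small
          # expected counts; drop it if you want to see those warnings
          p[j, i] <- suppressWarnings(chisq.test(df[[i]], df[[j]])$p.value)
        }
      }
      p  # returned to the caller instead of assigned globally
    }

    pvals <- pairwise_chisq_p(esoph[, 1:3])
    print(round(pvals, 3))
    ```

    Only the cells below the diagonal are filled, since each pair is tested once. The matrix stays numeric, so it can be filtered afterwards, for example with which(pvals < 0.05, arr.ind = TRUE).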

  • 2020-11-27 20:20

    A sample of your data would be appreciated, but I think this will work for you. First, use combn to generate every pair of column indices. Then write a function that runs the test for one pair and apply it over all pairs. I like to use plyr for this, since it makes it easy to specify the data structure you want back. Also note that you only need to compute the chi-square test once for each pair of columns, which should speed things up quite a bit.

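    library(plyr)

    # Dat is the data frame of categorical columns to test
    # All pairs of column indices, one pair per column of the matrix
    combos <- combn(ncol(Dat), 2)

    # Run one chi-square test per pair and stack the results into a data frame
    adply(combos, 2, function(x) {
      test <- chisq.test(Dat[, x[1]], Dat[, x[2]])

      out <- data.frame("Row" = colnames(Dat)[x[1]]
                        , "Column" = colnames(Dat)[x[2]]
                        , "Chi.Square" = round(test$statistic, 3)
                        , "df" = test$parameter
                        , "p.value" = round(test$p.value, 3)
                        )
      return(out)

    })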
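
    If you would rather not depend on plyr, the same pairwise loop can be sketched in base R. This is only an illustration: it assumes Dat is the first three columns of the built-in esoph data, as in the question, and it passes the test function directly to combn through its FUN argument.

    ```r
    # Assumption: Dat mirrors the question's data (first three esoph columns).
    Dat <- esoph[, 1:3]

    # combn() calls FUN once per pair of column indices;
    # simplify = FALSE keeps the one-row data frames in a list.
    rows <- combn(ncol(Dat), 2, FUN = function(idx) {
      # suppressWarnings() hides the small-expected-count warning
      test <- suppressWarnings(chisq.test(Dat[[idx[1]]], Dat[[idx[2]]]))
      data.frame(Row        = names(Dat)[idx[1]],
                 Column     = names(Dat)[idx[2]],
                 Chi.Square = unname(test$statistic),
                 df         = unname(test$parameter),
                 p.value    = test$p.value)
    }, simplify = FALSE)

    # Stack the per-pair rows into a single data frame
    result <- do.call(rbind, rows)
    print(result)
    ```

    The result has one row per pair of columns, in the same order that combn generates the pairs.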
    