Chi Square Analysis using for loop in R

强颜欢笑 提交于 2019-11-26 07:43:06

问题


I\'m trying to do chi square analysis for all combinations of variables in the data and my code is:

Data <- esoph[ , 1:3]
OldStatistic <- NA
for(i in 1:(ncol(Data)-1)){
for(j in (i+1):ncol(Data)){
Statistic <- data.frame(\"Row\"=colnames(Data)[i], \"Column\"=colnames(Data)[j],
                     \"Chi.Square\"=round(chisq.test(Data[ ,i], Data[ ,j])$statistic, 3),
                     \"df\"=chisq.test(Data[ ,i], Data[ ,j])$parameter,
                     \"p.value\"=round(chisq.test(Data[ ,i], Data[ ,j])$p.value, 3),
                      row.names=NULL)
temp <- rbind(OldStatistic, Statistic)
OldStatistic <- Statistic
Statistic <- temp
}
}

str(Data)
\'data.frame\':   88 obs. of  3 variables:
 $ agegp: Ord.factor w/ 6 levels \"25-34\"<\"35-44\"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alcgp: Ord.factor w/ 4 levels \"0-39g/day\"<\"40-79\"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ tobgp: Ord.factor w/ 4 levels \"0-9g/day\"<\"10-19\"<..: 1 2 3 4 1 2 3 4 1 2 ...


Statistic
    Row Column Chi.Square df p.value
1 agegp  tobgp      2.400 15       1
2 alcgp  tobgp      0.619  9       1

My code gives my the chi square analysis output for variable 1 vs variable 3, and variable 2 vs variable 3 and is missing for variable 1 vs variable 2. I tried hard but could not fixed the code. Any comment and suggestion will be highly appreciated. I\'d like like to do cross tabulation for all possible combinations. Thanks in advance.

EDIT

I used to do this kind of analysis in SPSS but now I want to switch to R.


回答1:


A sample of your data would be appreciated, but I think this will work for you. First, create a combination of all columns with combn. Then write a function to use with an apply function to iterate through the combos. I like to use plyr since it is easy to specify what you want for a data structure on the back end. Also note you only need to compute the chi square test once for each combination of columns, which should speed things up quite a bit as well.

library(plyr)

combos <- combn(ncol(Dat),2)

adply(combos, 2, function(x) {
  test <- chisq.test(Dat[, x[1]], Dat[, x[2]])

  out <- data.frame("Row" = colnames(Dat)[x[1]]
                    , "Column" = colnames(Dat[x[2]])
                    , "Chi.Square" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
                    )
  return(out)

})  



回答2:


I wrote my own function. It creates a matrix where all nominal variables are tested against each other. It can also save the results as excel file. It displays all the pvalues that are smaller than 5%.

funMassChi <- function (x,delFirst=0,xlsxpath=FALSE) {
  options(scipen = 999)

  start <- (delFirst+1)
  ds <- x[,start:ncol(x)]

  cATeND <- ncol(ds)
  catID  <- 1:cATeND

  resMat <- ds[1:cATeND,1:(cATeND-1)]
  resMat[,] <- NA

    for(nCc in 1:(length(catID)-1)){
      for(nDc in (nCc+1):length(catID)){
        tryCatch({
          chiRes <- chisq.test(ds[,catID[nCc]],ds[,catID[nDc]])
          resMat[nDc,nCc]<- chiRes[[3]]
        }, error=function(e){cat(paste("ERROR :","at",nCc,nDc, sep=" "),conditionMessage(e), "\n")})
      }
    }
  resMat[resMat > 0.05] <- "" 
  Ergebnis <- cbind(CatNames=names(ds),resMat)
  Ergebnis <<- Ergebnis[-1,] 

  if (!(xlsxpath==FALSE)) {
     write.xlsx(x = Ergebnis, file = paste(xlsxpath,"ALLChi-",Sys.Date(),".xlsx",sep=""),
             sheetName = "Tabelle1", row.names = FALSE)
  }
}

funMassChi(categorialDATA,delFirst=3,xlsxpath="C:/folder1/folder2/")

delFirst can delete the first n columns. So if you have an count index or something you dont want to test.

I hope this can help anyone else.



来源:https://stackoverflow.com/questions/7382039/chi-square-analysis-using-for-loop-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!