chi-squared | 易学教程

Can we generate contingency table for chisquare test using python?

Question: I am using the scipy.stats.chi2_contingency method to get chi-square statistics. It takes a frequency table (i.e., a contingency table) as its parameter, but I have a feature vector and want to generate the frequency table automatically. Is there a function available for this? I am currently doing it like this: def contigency_matrix_categorical(data_series,target_series,target_val,indicator_val): observed_freq={} for targets in target_val: observed_freq[targets]={} for indicators in indicator_val:
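One possible way to build the frequency table without hand-written loops is pandas.crosstab, which counts how often each pair of categories occurs together. The sketch below assumes pandas and SciPy are installed; the series `feature` and `target` and their values are made up for illustration and are not taken from the question.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one categorical feature and one categorical target.
feature = pd.Series(["a", "b", "a", "c", "b", "a", "c", "b", "a", "c"], name="feature")
target = pd.Series([1, 0, 1, 0, 0, 1, 1, 0, 1, 0], name="target")

# crosstab counts co-occurrences of the two series: this is the contingency table.
observed = pd.crosstab(feature, target)

# chi2_contingency accepts the table directly and returns the statistic,
# the p-value, the degrees of freedom and the expected frequencies.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(observed)
print(f"chi2={chi2_stat:.3f}, p={p_value:.3f}, dof={dof}")
```

With counts this small the chi-squared approximation is not reliable; the data only serve to show the mechanics.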

Can someone tell me why R is not using the whole data.frame for this chisq.test?

Question: I can't come up with a solution to a problem I've had when trying to create my own data.frame and run a quantitative analysis (such as a chisq.test) on it. The backdrop is as follows: I've summarized data I received relating to two hospitals. Both measured the same categorical variable n times. In this case it's how frequently health-care-associated bacteria were found during a specific observation period. In a table, the summarized data looks as follows, where % indicates the

Get `chisq.test()$p.value` for several groups using `dplyr::group_by()`

Question: I'm trying to conduct a chi-square test on several groups within the dplyr framework. The problem is that group_by() %>% summarise() doesn't seem to do the trick. Simulated data (same structure as the problematic data, but random, so the p-values should be high): set.seed(1) data.frame(partido=sample(c("PRI", "PAN"), 100, 0.6), genero=sample(c("H", "M"), 100, 0.7), GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)) -> foo I want to compare several groups defined by GM to see if there are changes in the p.values for the
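The question is about dplyr, but the same "one test per group" idea can be sketched in Python with pandas.groupby. The column names partido, genero and GM follow the question; the random data, the sample size, and the choice to cross-tabulate partido against genero within each GM level are assumptions, since the excerpt is cut off before it says which variables are compared.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Simulated data with the same column names as in the question (values are random).
rng = np.random.default_rng(1)
n = 200
foo = pd.DataFrame({
    "partido": rng.choice(["PRI", "PAN"], size=n),
    "genero": rng.choice(["H", "M"], size=n),
    "GM": rng.choice(["Bajo", "Muy bajo"], size=n),
})

def chisq_p(group):
    """Return the chi-squared p-value for partido vs. genero within one group."""
    table = pd.crosstab(group["partido"], group["genero"])
    return chi2_contingency(table)[1]

# One p-value per level of GM.
p_by_gm = {gm: chisq_p(group) for gm, group in foo.groupby("GM")}
print(p_by_gm)
```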

R Chi-squared stratified by multiple groups

Question: I have the following df with three factor variables and one percentage variable: df <- data.frame( group = rep(c("Case", "Control"), each=16), timing = rep(c("T0", "T1", "T2", "T3"), each=4, times=2), food.type = rep (c("Very healthy", "Healthy", "Unhealthy", "Very bad"), times = 8), intake.percentage = runif(32, min=1, max=25) ) How do I perform a chi-squared test to evaluate the statistical difference between groups (cases and controls) at each time point (T0-T3) for each kind of food? For your

How do I summarize this data table with dplyr, then run a chisq.test (or similar) on the results and loop it all into one neat function?

Question: This question was embedded in another question I asked here, but as it goes beyond the scope of what I wanted to know in the initial inquiry, I thought it might deserve a separate thread. I've been trying to come up with a solution for this problem based on the answers I have received here and here, using dplyr and the functions written by Khashaa and Jaap. Using the solutions provided to me (especially from Jaap), I have been able to summarize the raw data I received into a matrix-looking

How to check goodness of fit in python from Chisquare function?

Question: I want to understand how the chisquare function works. I also read about the chi-square test, in which we have to check a chi-square table to compare the chi-square value or p-value with critical values. In the following code my chi-square value is 3.5. Does it correspond to the right tail or the left tail? Secondly, the p-value is 0.62. I have read that 0.05 is the standard for the right tail of the chi-square distribution curve. If the p-value is greater than 0.05 we accept our proposed hypothesis, i.e., f_exp. But what about the
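The question's code is not shown in the excerpt, so the counts below are an assumption: they are chosen because they give a statistic of 3.5 and a p-value of about 0.62, the numbers the question mentions. The sketch checks that the p-value returned by scipy.stats.chisquare is the right-tail area of the chi-squared distribution, and compares the statistic with the 5% critical value.

```python
from scipy.stats import chi2, chisquare

# Assumed example data; observed and expected counts have the same total.
f_obs = [16, 18, 16, 14, 12, 12]
f_exp = [16, 16, 16, 16, 16, 8]

stat, p_value = chisquare(f_obs, f_exp=f_exp)

# Degrees of freedom for a goodness-of-fit test: number of categories minus 1.
dof = len(f_obs) - 1

# The p-value is the area to the RIGHT of the statistic (survival function).
right_tail = chi2.sf(stat, df=dof)

# Critical value at the 5% level: reject the null only if stat exceeds it.
critical = chi2.ppf(0.95, df=dof)

print(f"stat={stat:.2f}, p={p_value:.4f}, right tail={right_tail:.4f}")
print(f"critical value at 5%: {critical:.2f}, reject: {stat > critical}")
```

A p-value above 0.05 means the data give no evidence against the expected frequencies at that level; strictly speaking, the hypothesis is not rejected rather than accepted.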

R chi squared test (3x2 contingency table) for each row in a table

Question: I have a dataframe and want to perform a chi-squared test for each row (a 3x2 contingency table). row 1 102 4998 105 3264 105 3636 row 2 210 4890 22 3347 20 3721 row 3 ... So for the first row, a chi-squared test should be performed on the following contingency table: group A 102 4998 group B 105 3264 group C 105 3636 I use the following code, but it does not calculate the correct p-value (all p-values are equal to zero, which is not the case when I calculate the chi-square test myself)
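The question concerns R, and its code is cut off, so the following is only a Python sketch of the row-wise idea: each row of six counts is reshaped into a 3x2 table (groups A, B, C by two outcomes) and tested separately. The two rows are the ones shown in the question; the array name `rows` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# The two rows given in the question; each holds three (count, count) pairs.
rows = np.array([
    [102, 4998, 105, 3264, 105, 3636],
    [210, 4890, 22, 3347, 20, 3721],
])

p_values = []
for row in rows:
    # Reshape the flat row into a 3x2 table: one line per group (A, B, C).
    table = row.reshape(3, 2)
    _, p, dof, _ = chi2_contingency(table)
    p_values.append(p)

for i, p in enumerate(p_values, start=1):
    print(f"row {i}: p = {p:.3g}")
```

Printing with a general format such as `.3g` helps distinguish a very small p-value from an exact zero.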

Run chi-squared test on a data.frame

Question: I have this data.frame: df <- data.frame(xy = c("x", "y"), V1 = c(3, 0), V2 = c(0, 0), V3 = c(5, 0), V4 = c(5, 2)) df xy V1 V2 V3 V4 1 x 3 0 5 5 2 y 0 0 0 2 I want to know if x or y is more associated with any of V1, V2, V3 or V4. To test this, I can use a chi-squared test. This is what I've tried, none of which works: chisq.test(df) chisq.test(as.matrix(df)) chisq.test(as.table(df)) How can I run a chi-squared test on df? Answer 1: Use this: df <- as.table(rbind(c(3,0,5,5),c(0,0,0,2))) > df A B C D
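As a rough Python analogue (not the R answer quoted above), the counts can be kept in a pandas DataFrame with the labels as the index, so that only numeric columns reach the test. One assumption of this sketch is that dropping the all-zero column V2 is acceptable: scipy.stats.chi2_contingency raises an error when an expected frequency is zero, which an empty column would cause.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts from the question; the xy labels become the index, not a data column.
df = pd.DataFrame(
    {"V1": [3, 0], "V2": [0, 0], "V3": [5, 0], "V4": [5, 2]},
    index=["x", "y"],
)

# Drop columns whose total is zero (here V2), since they make expected counts zero.
counts = df.loc[:, df.sum(axis=0) > 0]

chi2_stat, p_value, dof, expected = chi2_contingency(counts)
print(counts)
print(f"chi2={chi2_stat:.3f}, p={p_value:.3f}, dof={dof}")
```

The expected counts in this table are very small, so the chi-squared approximation should be read with caution.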

R chisq.test() on dataframe using binary comparison

Question: I want to do a chisq.test on a dataframe of dimension (50x752). I want to get the p-values (adjusted for multiple testing) for all possible pairwise comparisons of all columns. At the end I want to get back a matrix (50x50) to generate a heatmap of the adjusted chisq p-values. Here is what I do at the moment, but it is far from being ideal. Step 1: do the pairwise comparison function(data,p.adjust.method="holm") { cor.mat <- cor(data) x<-ncol(data)#nb of column in matrix here 50 y<-nrow(data)#nb
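The R function in the question is truncated, so the following is only a Python sketch of the general approach: test every pair of columns, adjust the p-values with Holm's method, and store them in a symmetric matrix suitable for a heatmap. The helper `holm_adjust`, the column names and the random binary data are all hypothetical, and five columns are used instead of fifty to keep the example small.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def holm_adjust(p_values):
    """Holm step-down adjustment of a list of p-values (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), k = 0..m-1.
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity and cap at 1.
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical binary data: 200 observations of 5 variables.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 2, size=(200, 5)),
                    columns=[f"c{i}" for i in range(5)])

# Raw p-value for every unordered pair of columns.
pairs = list(combinations(data.columns, 2))
raw = [chi2_contingency(pd.crosstab(data[a], data[b]))[1] for a, b in pairs]
adj = holm_adjust(raw)

# Symmetric matrix of adjusted p-values; the diagonal stays NaN.
p_matrix = pd.DataFrame(np.nan, index=data.columns, columns=data.columns)
for (a, b), p in zip(pairs, adj):
    p_matrix.loc[a, b] = p_matrix.loc[b, a] = p

print(p_matrix.round(3))
```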

Automate Chi-square across columns

Question: I would like to use a chi-square test on a set of data. How can I do it using a for loop or sapply? This is a set of sample data: n<-40 set.seed(1) data <- data.frame(v1.1=sample(c('0','1'),n,replace=T),v1.2=sample(c('0','1'),n,replace=T),v1.3=sample(c('0','1'),n,replace=T),v1.4=sample(c('0','1'),n,replace=T),v1.5=sample(c('0','1'),n,replace=T),m1=sample(c('1','2'),n,replace=T)) I would like to test all variables named v1.x against variable m1. That's all. I want to avoid such a situation: chisq
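The question asks for an R loop or sapply; as a rough Python equivalent, a dictionary comprehension can run the test for every column whose name starts with "v1." against m1. The column names follow the question, while the random values are generated here and will not match the output of R's set.seed(1).

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Sample data with the same layout as in the question (values are random).
rng = np.random.default_rng(1)
n = 40
data = pd.DataFrame(
    {f"v1.{i}": rng.choice(["0", "1"], size=n) for i in range(1, 6)}
)
data["m1"] = rng.choice(["1", "2"], size=n)

# One chi-squared test per v1.x column, each cross-tabulated against m1.
p_values = {
    col: chi2_contingency(pd.crosstab(data[col], data["m1"]))[1]
    for col in data.columns
    if col.startswith("v1.")
}

for col, p in p_values.items():
    print(f"{col} vs m1: p = {p:.3f}")
```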