chi-squared

Can we generate a contingency table for a chi-square test using Python?

最后都变了- submitted on 2019-12-12 11:51:48
Question: I am using the scipy.stats.chi2_contingency method to get the chi-square statistic. It needs a frequency table, i.e. a contingency table, as its parameter. But I have a feature vector and want to generate the frequency table automatically. Is there any such function available? I am currently doing it like this:

    def contigency_matrix_categorical(data_series,target_series,target_val,indicator_val):
        observed_freq={}
        for targets in target_val:
            observed_freq[targets]={}
            for indicators in indicator_val:
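
A minimal, dependency-free sketch of what the hand-written function is building, assuming the feature vector and the target are two equal-length sequences of labels: count each (feature value, target value) pair with `collections.Counter` and lay the counts out as a nested list, a table shape that `scipy.stats.chi2_contingency` accepts. The name `crosstab` is hypothetical; if pandas is available, `pandas.crosstab(data_series, target_series)` builds the same table as a DataFrame.

```python
from collections import Counter
from typing import Hashable, Sequence


def crosstab(rows: Sequence[Hashable], cols: Sequence[Hashable]):
    """Hypothetical helper: build a contingency table from two paired label sequences.

    Returns (row_labels, col_labels, table) where table[i][j] counts how often
    row_labels[i] and col_labels[j] occur together. Labels are sorted, so the
    labels on each axis must be mutually comparable.
    """
    if len(rows) != len(cols):
        raise ValueError("both sequences must have the same length")
    pair_counts = Counter(zip(rows, cols))  # (row label, col label) -> count
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    # Counter returns 0 for pairs that never occur, so empty cells become 0.
    table = [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Made-up example: a categorical feature against a binary target.
feature = ["a", "b", "a", "c", "b", "a"]
target = [1, 0, 1, 0, 0, 0]
row_labels, col_labels, table = crosstab(feature, target)
print(row_labels, col_labels, table)
```

The nested list `table` can then be passed to `scipy.stats.chi2_contingency(table)`, which returns the statistic, p-value, degrees of freedom and expected counts.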

Can someone tell me why R is not using the whole data.frame for this chisq.test?

无人久伴 submitted on 2019-12-12 11:07:42
Question: I can't come up with a solution to a problem I've had when trying to create my own data.frame and run a quantitative analysis (such as a chisq.test) on it. The background is as follows: I've summarized data I received relating to two hospitals. Both measured the same categorical variable n times; in this case, how frequently healthcare-associated bacteria were found during a specific observation period. In a table, the summarized data look as follows, where % indicates the

Get `chisq.test()$p.value` for several groups using `dplyr::group_by()`

半世苍凉 submitted on 2019-12-12 07:23:01
Question: I'm trying to conduct a chi-square test on several groups within the dplyr framework. The problem is that group_by() %>% summarise() doesn't seem to do the trick. Simulated data (same structure as the problematic data, but random, so the p-values should be high):

    set.seed(1)
    data.frame(partido=sample(c("PRI", "PAN"), 100, 0.6),
               genero=sample(c("H", "M"), 100, 0.7),
               GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)) -> foo

I want to compare several groups defined by GM to see if there are changes in the p.values for the
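
The split-apply-combine idea behind `group_by()` can be sketched without any libraries: partition the rows by the grouping variable (here GM), build a 2x2 table of partido by genero inside each group, and compute Pearson's statistic and its p-value. For one degree of freedom the upper tail of the chi-squared distribution is exactly `erfc(sqrt(x / 2))`. The helper names and the toy records are hypothetical, and unlike R's `chisq.test()` default for 2x2 tables, no Yates continuity correction is applied (it corresponds to `chisq.test(..., correct = FALSE)`).

```python
import math
from collections import defaultdict


def pearson_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (df = 1, no Yates correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        return math.nan, math.nan  # an empty margin leaves the test undefined
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df


def p_values_by_group(records, group_key, x_key, y_key, x_levels, y_levels):
    """Hypothetical helper: split records by group_key and run one 2x2 test per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    results = {}
    for level, recs in groups.items():
        table = [
            [sum(1 for r in recs if r[x_key] == xv and r[y_key] == yv) for yv in y_levels]
            for xv in x_levels
        ]
        results[level] = pearson_2x2(table)
    return results


def make_rows(partido, genero, gm, k):
    """Build k identical toy records (hypothetical data, not the question's sample)."""
    return [{"partido": partido, "genero": genero, "GM": gm} for _ in range(k)]


foo = (
    make_rows("PRI", "H", "Bajo", 10) + make_rows("PRI", "M", "Bajo", 10)
    + make_rows("PAN", "H", "Bajo", 10) + make_rows("PAN", "M", "Bajo", 10)
    + make_rows("PRI", "H", "Muy bajo", 15) + make_rows("PRI", "M", "Muy bajo", 5)
    + make_rows("PAN", "H", "Muy bajo", 5) + make_rows("PAN", "M", "Muy bajo", 15)
)
results = p_values_by_group(foo, "GM", "partido", "genero", ["PRI", "PAN"], ["H", "M"])
for level, (stat, p) in results.items():
    print(level, round(stat, 3), round(p, 4))
```

Returning one (statistic, p-value) pair per group keeps the result the same shape however many GM levels there are.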

R Chi-squared stratified by multiple groups

霸气de小男生 submitted on 2019-12-11 23:57:40
Question: I have the following df with three factor variables and one percentage variable:

    df <- data.frame(
      group = rep(c("Case", "Control"), each=16),
      timing = rep(c("T0", "T1", "T2", "T3"), each=4, times=2),
      food.type = rep(c("Very healthy", "Healthy", "Unhealthy", "Very bad"), times = 8),
      intake.percentage = runif(32, min=1, max=25)
    )

How do I perform a chi-squared test to evaluate the statistical difference between groups (case vs. control) at each time point (T0-T3) for each kind of food? For your

How do I summarize this data table with dplyr, then run a chisq.test (or similar) on the results and wrap it all into one neat function?

久未见 submitted on 2019-12-11 17:04:11
Question: This question was embedded in another question I asked here, but as it goes beyond the scope of my initial inquiry, I thought it might deserve a separate thread. I've been trying to come up with a solution for this problem based on the answers I received here and here, using dplyr and the functions written by Khashaa and Jaap. Using the solutions provided to me (especially Jaap's), I have been able to summarize the raw data I received into a matrix-looking

How do I check goodness of fit in Python with the chisquare function?

≯℡__Kan透↙ submitted on 2019-12-11 15:13:30
Question: I want to understand how the chisquare function works. I also read about the chi-square test, in which we check a chi-square table to compare the chi-square value or p-value with critical values. In the following code my chi-square value is 3.5. Does it correspond to the right tail or the left tail? Secondly, the p-value is 0.62. I have read that 0.05 is the standard for the right tail of the chi-square distribution curve. If the p-value is greater than 0.05, we accept our proposed hypothesis, i.e., f_exp. But what about the
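
The chi-squared statistic grows as observed and expected counts disagree, so the usual p-value is the right-tail probability P(X >= statistic), with k - 1 degrees of freedom for k categories when no parameters were estimated from the data; this is what `scipy.stats.chisquare(f_obs, f_exp)` reports. A dependency-free sketch of the same computation follows. `chi2_sf` and `goodness_of_fit` are hypothetical helpers; `chi2_sf` evaluates the upper tail for integer degrees of freedom using the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1) of the regularized upper incomplete gamma function, with y = x / 2 and a stepping up to df / 2.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for the smallest shape: Q(1/2, y) = erfc(sqrt(y)), Q(1, y) = exp(-y).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Step the shape parameter up by 1 until it reaches df / 2.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def goodness_of_fit(f_obs: Sequence[float], f_exp: Sequence[float]):
    """Pearson goodness-of-fit statistic and right-tail p-value (df = k - 1)."""
    if len(f_obs) != len(f_exp):
        raise ValueError("f_obs and f_exp must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))
    return stat, chi2_sf(stat, len(f_obs) - 1)


# Made-up example: 60 rolls of a die tested against a fair die (10 per face).
stat, p = goodness_of_fit([8, 12, 9, 11, 6, 14], [10] * 6)
print(stat, p)
```

A large p-value only means the data give no evidence against `f_exp`; it does not show that `f_exp` is correct.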

R chi-squared test (3x2 contingency table) for each row in a table

心不动则不痛 submitted on 2019-12-11 10:27:12
Question: I have a data frame and want to perform a chi-squared test for each row, reading the row as a 3x2 contingency table:

    row 1   102 4998   105 3264   105 3636
    row 2   210 4890    22 3347    20 3721
    row 3   ...

So for the first row, a chi-squared test should be performed on the following contingency table:

    group A   102   4998
    group B   105   3264
    group C   105   3636

I use the following code, but it does not calculate the correct p-value (all p-values are equal to zero, while this is not the case when I calculate the chi-square test myself)
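
Each row holds six counts that, paired left to right, form the 3x2 table for groups A, B and C by two outcomes. A minimal sketch of the per-row test, assuming that row-wise pairing: compute expected counts from the margins, sum Pearson's terms, and take the upper tail of a chi-squared distribution with (3 - 1)(2 - 1) = 2 degrees of freedom, which for 2 degrees of freedom is exactly exp(-x / 2). The helper name `row_test_3x2` is hypothetical. In R the same pairing of a numeric vector `x` is `matrix(x, ncol = 2, byrow = TRUE)`.

```python
import math
from typing import Sequence


def row_test_3x2(row: Sequence[float]):
    """Hypothetical helper: Pearson chi-squared test for one row read as a 3x2 table (df = 2)."""
    if len(row) != 6:
        raise ValueError("expected six counts per row")
    # Pair the values row-wise: (A1, A2), (B1, B2), (C1, C2).
    table = [row[0:2], row[2:4], row[4:6]]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(r[j] for r in table) for j in range(2)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(3):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)  # chi-squared upper tail with 2 df


rows = [
    [102, 4998, 105, 3264, 105, 3636],  # row 1 from the question
    [210, 4890, 22, 3347, 20, 3721],    # row 2 from the question
]
for r in rows:
    stat, p = row_test_3x2(r)
    print(f"chi2 = {stat:.3f}, p = {p:.3g}")
```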

Run chi-squared test on a data.frame

匆匆过客 submitted on 2019-12-11 10:03:57
Question: I have this data.frame:

    df <- data.frame(xy = c("x", "y"), V1 = c(3, 0), V2 = c(0, 0), V3 = c(5, 0), V4 = c(5, 2))
    df
      xy V1 V2 V3 V4
    1  x  3  0  5  5
    2  y  0  0  0  2

I want to know whether x or y is more associated with any of V1, V2, V3 or V4. To test this, I can use a chi-squared test. This is what I've tried, none of which works:

    chisq.test(df)
    chisq.test(as.matrix(df))
    chisq.test(as.table(df))

How can I run a chi-squared test on df?

Answer 1: use this:

    df <- as.table(rbind(c(3,0,5,5),c(0,0,0,2)))
    > df
      A B C D
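
A sketch of two things the test needs here, written in plain Python with a hypothetical helper name. First, only the numeric counts: the `xy` column holds labels rather than counts, so the table has to be built from the numbers alone, as the answer does. Second, no all-zero columns: V2 is 0 in both rows, so its expected counts are 0 and its Pearson terms (0 - 0)^2 / 0 are undefined. The helper drops such columns, then returns Pearson's statistic and the degrees of freedom; an upper-tail function such as `scipy.stats.chi2.sf(stat, dof)` turns those into a p-value.

```python
def chi2_independence(table, col_names):
    """Hypothetical helper: Pearson statistic and df after dropping all-zero columns.

    Assumes every row has at least one non-zero count.
    """
    keep = [j for j in range(len(col_names)) if any(row[j] for row in table)]
    kept_names = [col_names[j] for j in keep]
    counts = [[row[j] for j in keep] for row in table]
    row_tot = [sum(r) for r in counts]
    col_tot = [sum(r[j] for r in counts) for j in range(len(keep))]
    n = sum(row_tot)
    # Expected count for cell (i, j) is row total * column total / grand total.
    stat = sum(
        (counts[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(counts))
        for j in range(len(keep))
    )
    dof = (len(counts) - 1) * (len(keep) - 1)
    return kept_names, stat, dof


# Counts from the question's data frame, without the "xy" label column.
col_names = ["V1", "V2", "V3", "V4"]
counts = [
    [3, 0, 5, 5],  # x
    [0, 0, 0, 2],  # y
]
kept, stat, dof = chi2_independence(counts, col_names)
print(kept, round(stat, 4), dof)
```

Even after dropping V2, most expected counts are below 5, so the chi-squared approximation is rough here; Fisher's exact test (`fisher.test()` in R) is the usual alternative for tables this small.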

R chisq.test() on a data frame using binary comparison

自古美人都是妖i submitted on 2019-12-11 09:37:26
Question: I want to do a chisq.test on a data frame of dimension (50x752). I want to get the p-values (adjusted for multiple testing) for all possible pairwise comparisons of all columns. At the end I want to get back a matrix (50x50) to generate a heatmap of the adjusted chisq p-values. Here is what I do at the moment, but it is far from ideal. Step 1: do the pairwise comparison

    function(data,p.adjust.method="holm")
    {
      cor.mat <- cor(data)
      x<-ncol(data)#nb of column in matrix here 50
      y<-nrow(data)#nb
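
The pairwise step and the adjustment can be kept separate: compute one p-value per unordered pair of columns, adjust the whole vector of p-values at once with Holm's step-down method (what `p.adjust(p, method = "holm")` does in R), and write each adjusted value into both (i, j) and (j, i) of a square matrix ready for a heatmap. This sketch assumes binary 0/1 columns that each contain both values, so every pair gives a 2x2 table with a 1-degree-of-freedom test (no continuity correction); the helper names are hypothetical, and the diagonal is filled with 1.0 only as a placeholder.

```python
import math
from itertools import combinations


def p_2x2(x, y):
    """Pearson chi-squared p-value (df = 1, no continuity correction) for two 0/1 columns."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    n = len(x)
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in rows or 0 in cols:
        raise ValueError("each column must contain both 0 and 1")
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df


def holm(pvals):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, (m - rank) * pvals[k])
        adjusted[k] = min(1.0, running_max)
    return adjusted


def pairwise_holm_matrix(columns):
    """Hypothetical helper: symmetric matrix of Holm-adjusted pairwise p-values."""
    k = len(columns)
    pairs = list(combinations(range(k), 2))
    adjusted = holm([p_2x2(columns[i], columns[j]) for i, j in pairs])
    matrix = [[1.0] * k for _ in range(k)]  # diagonal: placeholder 1.0
    for (i, j), p in zip(pairs, adjusted):
        matrix[i][j] = matrix[j][i] = p
    return matrix


# Made-up columns: c1 copies c0, c2 is balanced against both.
c0 = [0, 0, 0, 0, 1, 1, 1, 1] * 5
c1 = list(c0)
c2 = [0, 1] * 20
m = pairwise_holm_matrix([c0, c1, c2])
for row in m:
    print([f"{p:.3g}" for p in row])
```

All p-values are adjusted together (k(k - 1) / 2 of them for k columns) before being placed in the matrix, so the heatmap shows the same values the vector-wise adjustment produces.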

Automate Chi-square across columns

给你一囗甜甜゛ submitted on 2019-12-11 05:41:28
Question: I would like to use a chi-square test on a set of data. How can I do it with a for loop or sapply? This is a set of sample data:

    n<-40
    set.seed(1)
    data <- data.frame(v1.1=sample(c('0','1'),n,replace=T),
                       v1.2=sample(c('0','1'),n,replace=T),
                       v1.3=sample(c('0','1'),n,replace=T),
                       v1.4=sample(c('0','1'),n,replace=T),
                       v1.5=sample(c('0','1'),n,replace=T),
                       m1=sample(c('1','2'),n,replace=T))

I would like to test every variable named v1.x against the variable m1. That's all. I want to avoid a situation like this: chisq
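
Instead of writing one `chisq.test()` call per column, loop over every column whose name starts with `v1.` and test it against `m1`, collecting the results in a dictionary keyed by column name (the loop that `sapply()` would express in R). This sketch assumes column-oriented data (a dict of equal-length lists of string labels) and two-level variables, as in the sample data, so each test has 1 degree of freedom; no continuity correction is applied. The helper names are hypothetical, and the random toy data come from Python's `random` module, not from the question's R seed.

```python
import math
import random
from collections import Counter


def chisq_2x2(x, y):
    """Pearson statistic and p-value (df = 1, no continuity correction) for two 2-level variables."""
    x_levels, y_levels = sorted(set(x)), sorted(set(y))
    if len(x_levels) != 2 or len(y_levels) != 2:
        raise ValueError("both variables need exactly two levels")
    counts = Counter(zip(x, y))
    n = len(x)
    stat = 0.0
    for a in x_levels:
        for b in y_levels:
            expected = x.count(a) * y.count(b) / n
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df


def chisq_against(data, target, prefix):
    """Hypothetical helper: test every column whose name starts with `prefix` against `target`."""
    return {
        name: chisq_2x2(column, data[target])
        for name, column in data.items()
        if name.startswith(prefix)
    }


# Toy data shaped like the question's data frame (random, so p-values should be high).
random.seed(1)
n = 40
data = {f"v1.{i}": [random.choice("01") for _ in range(n)] for i in range(1, 6)}
data["m1"] = [random.choice("12") for _ in range(n)]

results = chisq_against(data, "m1", "v1.")
for name, (stat, p) in results.items():
    print(name, round(stat, 3), round(p, 3))
```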