chi-squared | 易学教程

Feature selection using scikit-learn

Source: https://stackoverflow.com/questions/59667340/valueerror-input-x-must-be-non-negative-in-python

SparkException: Chi-square test expect factors

Question: I have a dataset containing 42 features and 1 label. I want to apply the chi-square selector from the Spark ML library before running a decision tree for anomaly detection, but I hit this error when applying the selector: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 17.0 failed 1 times, most recent failure: Lost task 0.0 in stage 17.0 (TID 45, localhost, executor driver): org.apache.spark.SparkException: Chi-square

Plot the equivalent of correlation matrix for factors (categorical data)? And mixed types?

Question: There are actually two questions, one more advanced than the other. Q1: I am looking for a method similar to corrplot() that can handle factors. I originally tried using chisq.test() and then calculating the p-value and Cramér's V as a correlation measure, but there are too many columns to work through by hand. Could anyone tell me whether there is a quick way to create a "corrplot" in which each cell contains the value of Cramér's V and the colour is determined by the p-value? Any other similar plot would also help.
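
The quantity asked about here can be sketched in a few lines. The example below is a minimal pure-Python illustration (the question itself is about R) of computing Cramér's V for every pair of categorical columns, which is the matrix a corrplot-style figure would display. The helper names `crosstab`, `cramers_v`, and `cramers_v_matrix` are hypothetical, and the sketch assumes every column has at least two levels and no empty row or column in any cross-tabulation.

```python
import math
from collections import Counter


def crosstab(xs, ys):
    """Build a contingency table (list of rows) from two categorical columns."""
    counts = Counter(zip(xs, ys))
    x_levels = sorted(set(xs))
    y_levels = sorted(set(ys))
    return [[counts[(x, y)] for y in y_levels] for x in x_levels]


def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, total count) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), without bias correction."""
    stat, n = chi_square_statistic(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))


def cramers_v_matrix(columns):
    """Map each pair of column names to Cramér's V (columns: name -> list of values)."""
    names = list(columns)
    return {
        (p, q): cramers_v(crosstab(columns[p], columns[q]))
        for p in names
        for q in names
    }


# Illustrative data only, not taken from the question.
data = {
    "colour": ["red", "red", "blue", "blue"],
    "shape": ["round", "round", "square", "square"],
}
matrix = cramers_v_matrix(data)
```

The resulting dictionary can be handed to any heatmap routine; colouring cells by p-value would additionally need the chi-square tail probability for each table.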

Feature selection using scikit-learn

Question: I'm new to machine learning. I'm preparing my data for classification with a scikit-learn SVM. To select the best features I used the following method: SelectKBest(chi2, k=10).fit_transform(A1, A2) Since my dataset contains negative values, I get the following error: ValueError Traceback (most recent call last) /media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>() ----> 1 2 3 4 5 /usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit
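
The chi-squared scorer in scikit-learn treats feature values as counts or frequencies, which is why negative inputs are rejected. One common workaround, which may or may not suit a given dataset, is to rescale each feature to the range [0, 1] before selection. The sketch below shows that rescaling in plain Python on made-up data; `min_max_scale` is a hypothetical helper, and scikit-learn's `MinMaxScaler` performs the same kind of transformation.

```python
def min_max_scale(rows):
    """Rescale every column of a row-major table to [0, 1].

    Constant columns are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Illustrative feature matrix with negative values (not from the question).
features = [[-2.0, 5.0], [0.0, 5.0], [2.0, 5.0]]
scaled = min_max_scale(features)
```

After rescaling, all values are non-negative, so a chi-squared based selector no longer raises on sign. Whether chi-squared scoring is meaningful for continuous features rescaled this way is a separate modelling question.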

how to run chisq.test in loops using apply

Question: I am new to R. For my project I need to run a chi-squared test on a hundred thousand entries. I taught myself for a few days and wrote some code that runs chisq.test in a loop. Code: the.data = read.table ("test_chisq_allelefrq.txt", header=T, sep="\t",row.names=1) p=c() ID=c() for (i in 1:nrow(the.data)) { data.row = the.data [i,] data.matrix = matrix ( c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB, data.row$cohort_2_AA, data.row$cohort_2_AB, data.row

Error using dynamic variable specification in R survey function svychisq()

Question: I am using functions from the R survey library and, following this example on Stack Overflow, I use bquote() and as.name() to dynamically construct the formula that specifies the variables. This works fine for svytable(), but not for svychisq(). For example: library(survey) data(api) dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) colvar <- 'sch.wide' rowvar <- 'awards' svytable(bquote(~.(as.name(rowvar)) + .(as.name(colvar)) ), dstrat) sch.wide awards No Yes No

Call R from JAVA to get Chi-squared statistic and p-value

Question: I have two 4x4 matrices in Java, one holding observed counts and the other expected counts. I need an automated way to calculate the p-value from the chi-square statistic between these two matrices; however, as far as I am aware, Java has no such function. I can calculate the chi-square statistic and its p-value by reading the two matrices into R from .csv files and then using the chisq.test function as follows: obs<-read.csv("obs.csv") exp<-read.csv("exp.csv") chisq.test(obs,exp) where
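
The arithmetic behind this request is small enough to write directly, without calling out to R. The sketch below is in Python rather than Java, but uses only basic math functions so it should port readily. It computes the Pearson statistic from matching observed and expected matrices and then the upper-tail chi-square probability for an integer number of degrees of freedom. The function names are hypothetical, and the degrees of freedom are left as a caller-supplied assumption because the correct value depends on how the expected counts were obtained.

```python
import math


def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all cells of two equally shaped matrices."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exp) and
    applies the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Illustrative 2x2 counts only; the question's 4x4 matrices are not shown.
obs = [[12.0, 8.0], [9.0, 11.0]]
exp = [[10.0, 10.0], [10.0, 10.0]]
stat = chi_square_stat(obs, exp)
p_value = chi_square_sf(stat, df=1)  # df=1 is an assumption for this toy table
```

For the 4x4 case only the `df` argument changes; everything else works cell by cell.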

Calculate Fisher's exact test p-value in dataframe rows

Question: I have a list of 1700 samples in a data frame, where every row holds the number of colored items that each assistant counted in a random number of specimens from different boxes. There are two available colors and two individuals counting the items, so each row maps naturally onto a 2x2 contingency table.

df
Box-ID  1_Red  1_Blue  2_Red  2_Blue
1       1075   918     29     26
2       903    1076    135    144

I would like to know how I can treat every row as a contingency table (either vector or matrix) in order to
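
A row-wise version of this idea can be sketched as follows, in Python rather than R and using only the standard library. Each row is read as the 2x2 table [[1_Red, 1_Blue], [2_Red, 2_Blue]], which is an assumption about how the columns pair up. The two-sided p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one; `fisher_exact_two_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(
        p
        for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# The two rows shown in the question: (Box-ID, 1_Red, 1_Blue, 2_Red, 2_Blue).
rows = [(1, 1075, 918, 29, 26), (2, 903, 1076, 135, 144)]
p_values = {box: fisher_exact_two_sided(*counts) for box, *counts in rows}
```

Python integers do not overflow, so the large binomial coefficients here are computed exactly before the final division.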

SSAS (Sexual Segregation and Aggregation Statistic) in R - calling C

Question: I am running the following code, found in this appendix of a paper https://wiley.figshare.com/articles/Supplement_1_R_code_used_to_format_the_data_and_compute_the_SSAS_/3528698/1 to calculate the Sexual Segregation and Aggregation Statistic in R, but I keep getting the following error. Presumably there is an issue with calling a function from C, but I cannot resolve it. # Main function, computes both the SSAS (Sexual Segregation and # Aggregation Statistic) and the 95% limits of SSAS # under