chi-squared

Using loops to do Chi-Square Test in R

烂漫一生 submitted on 2019-12-11 03:51:56
Question: I am new to R. I found the following code for doing univariate logistic regression on a set of variables. What I would like to do is run a chi-square test for a list of variables against the dependent variable, similar to the logistic regression code below. I found a couple of examples which involve creating all possible combinations of the variables, but I can't get them to work. Ideally, I want one of the variables (X) to stay the same. Chi Square Analysis using for loop in R lapply(c("age","sex",
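A minimal sketch of the general idea in Python (pandas and scipy assumed; the file name and column names such as "outcome", "age_group", "sex" are hypothetical), looping a chi-square test of each candidate variable against one fixed dependent variable:

import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("data.csv")                      # hypothetical table of categorical columns
dependent = "outcome"                             # the variable (X) that stays the same
candidates = ["age_group", "sex", "smoker"]       # hypothetical predictors to test

results = {}
for var in candidates:
    # cross-tabulate the candidate against the dependent variable, then test
    table = pd.crosstab(df[var], df[dependent])
    stat, p, dof, expected = chi2_contingency(table)
    results[var] = (stat, p)

for var, (stat, p) in results.items():
    print(f"{var}: chi2 = {stat:.3f}, p = {p:.4g}")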

How can I use scipy optimization to find the minimum chi-squared for 3 parameters and a list of data points?

穿精又带淫゛_ submitted on 2019-12-11 00:34:44
Question: I have a histogram of sorted random numbers and a Gaussian overlay. The histogram represents observed values per bin (I am applying this base case to a much larger dataset) and the Gaussian is an attempt to fit the data. Clearly, this Gaussian does not represent the best fit to the histogram. The code below is the formula for a Gaussian: normc, mu, sigma = 30.845, 50.5, 7 # normalization constant, avg, stdev; gauss = lambda x: normc * exp((-1) * (x - mu)**2 / (2 * sigma**2)) I calculated the
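One way to do this (a sketch, not the poster's code; the histogram data here is synthetic) is to wrap the chi-squared sum in a function of the three parameters and hand it to scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

def gauss(x, normc, mu, sigma):
    return normc * np.exp(-(x - mu)**2 / (2 * sigma**2))

# synthetic histogram standing in for the real observed counts per bin
rng = np.random.default_rng(1)
bin_centers = np.linspace(30, 70, 40)
counts = rng.poisson(gauss(bin_centers, 30.0, 50.0, 7.0) + 1.0)

def chi2(params):
    normc, mu, sigma = params
    expected = gauss(bin_centers, normc, mu, sigma)
    # Pearson chi-squared: sum of (observed - expected)^2 / expected
    return np.sum((counts - expected)**2 / np.clip(expected, 1e-9, None))

result = minimize(chi2, x0=[30.845, 50.5, 7.0], method="Nelder-Mead")
print(result.x)   # best-fit normc, mu, sigma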

How SelectKBest (chi2) calculates score?

跟風遠走 submitted on 2019-12-08 17:31:34
I am trying to find the most valuable features by applying feature selection methods to my dataset. I'm using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don't understand exactly how this score value is calculated. I know that, in theory, a higher score is more valuable, but I need a mathematical formula or an example that calculates the score so I can learn this deeply. bestfeatures = SelectKBest(score_func=chi2, k=10) fit = bestfeatures.fit(dataValues, dataTargetEncoded) feat_importances = pd.Series(fit.scores_, index=dataValues.columns)
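A small sketch of the calculation as I understand it (toy data, non-negative features assumed): the observed counts are the per-class sums of each feature, the expected counts follow from the class frequencies, and the score is sum((observed - expected)^2 / expected) over the classes. On this toy data it should match sklearn's output.

import numpy as np
from sklearn.feature_selection import chi2

X = np.array([[1, 0, 3],
              [2, 1, 0],
              [0, 2, 1],
              [1, 1, 2]])          # toy non-negative feature matrix
y = np.array([0, 1, 1, 0])         # toy class labels

# observed: per-class sums of each feature value
classes = np.unique(y)
observed = np.array([X[y == c].sum(axis=0) for c in classes])

# expected: class frequency times the total sum of each feature
class_prob = np.array([(y == c).mean() for c in classes])
expected = np.outer(class_prob, X.sum(axis=0))

manual_scores = ((observed - expected)**2 / expected).sum(axis=0)
sklearn_scores, p_values = chi2(X, y)
print(manual_scores)     # compare with sklearn_scores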

Chi-square testing for constraining a parameter

拜拜、爱过 submitted on 2019-12-06 06:41:51
Question: I have an important question about using a chi^2 test to constrain a parameter in cosmology. I appreciate your help. Please do not downvote this question (it is important to me). Assume we have a data file (data.txt) containing 600 data points in 3 columns: the first column is the redshift (z), the second is the observed dL (m_obs), and the third is the error (err). As we know, the chi^2 function is chi^2 = sum over i from 1 to N=600 of (m_obs - m_theo)**2 / err**2. All we must calculate is putting z from the given data file into our
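A rough sketch of that calculation in Python (the form of m_theo and the parameter grid are placeholders, since they depend on the cosmological model being constrained; only the column layout and file name come from the question):

import numpy as np

# columns: redshift z, observed value m_obs, error err
z, m_obs, err = np.loadtxt("data.txt", unpack=True)

def m_theo(z, param):
    # placeholder prediction; replace with the model's actual formula for dL
    return 5 * np.log10((1 + z) * z / param) + 25

# scan the parameter over a grid and compute chi^2 at each value
grid = np.linspace(0.5, 1.0, 200)
chi2 = np.array([np.sum((m_obs - m_theo(z, p))**2 / err**2) for p in grid])

best = grid[np.argmin(chi2)]
print("best-fit parameter:", best, "with chi^2 =", chi2.min())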

Python - Minimizing Chi-squared

老子叫甜甜 submitted on 2019-12-06 06:37:02
Question: I have been trying to fit a linear model to a set of stress/strain data by minimizing chi-squared. Unfortunately, the code below does not correctly minimize the chisqfunc function: it finds the minimum at the initial conditions, x0, which is not correct. I have looked through the scipy.optimize documentation and tested minimizing other functions, which worked correctly. Could you please suggest how to fix the code below, or suggest another method I can use to fit a linear model
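For comparison, a working sketch (with made-up stress/strain arrays, not the poster's data) of minimizing a chi-squared objective for a linear model a*x + b with scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

# made-up stress/strain data with uncertainties
strain = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
stress = np.array([0.2, 1.1, 2.3, 2.9, 4.1, 5.2])
sigma = np.full_like(stress, 0.2)

def chisqfunc(params):
    a, b = params
    model = a * strain + b
    return np.sum(((stress - model) / sigma)**2)

x0 = np.array([0.0, 0.0])
result = minimize(chisqfunc, x0)
print(result.x)      # best-fit slope and intercept
print(result.fun)    # minimum chi-squared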

Sklearn Chi2 For Feature Selection

岁酱吖の submitted on 2019-12-06 04:02:59
Question: I'm learning about chi2 for feature selection and came across code like this. However, my understanding of chi2 was that higher scores mean that the feature is more independent (and therefore less useful to the model), so we would be interested in the features with the lowest scores. However, using scikit-learn's SelectKBest, the selector returns the values with the highest chi2 scores. Is my understanding of the chi2 test incorrect? Or does the chi2 score in sklearn produce something
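A tiny illustration of why higher is better here (toy data): the chi2 null hypothesis is independence between feature and target, so a large score is evidence against independence, i.e. the feature carries information about the target.

import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)

informative = y * 5 + rng.integers(0, 2, size=1000)   # tracks the target
noise = rng.integers(0, 6, size=1000)                 # unrelated to the target

X = np.column_stack([informative, noise])
scores, p_values = chi2(X, y)
print(scores)     # the informative column gets a much larger score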

SQL Query for Chi-SQUARE TEST [duplicate]

妖精的绣舞 submitted on 2019-12-05 08:14:46
Question: This question already has an answer here: SQL Server Query to find CHI-SQUARE Values (Not Working) (1 answer). Closed 6 years ago. I am trying to find the CHI-SQUARE TEST on the following set of data in the table. I am trying this query to find the CHI-SQUARE TEST: SELECT sessionnumber, sessioncount, timespent, (dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected FROM (SELECT sessionnumber, SUM(cast(cnt as bigint)) as cnt FROM d3 GROUP BY sessionnumber) dim1 CROSS JOIN
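The query appears to compute expected counts for a three-way table; in the simpler two-way case the same idea is expected = row_total * column_total / grand_total, which scipy returns directly. A small Python sketch with a hypothetical observed table, useful for checking the SQL output:

import numpy as np
from scipy.stats import chi2_contingency

# hypothetical two-way contingency table of observed counts
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

stat, p, dof, expected = chi2_contingency(observed)
# expected[i, j] == row_total[i] * col_total[j] / grand_total
print(expected)
print(stat, p, dof)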

Chi-squared test of independence on all combinations of columns in a dataframe in R

廉价感情. submitted on 2019-12-04 23:56:59
Question: This is my first time posting here and I hope this is all in the right place. I have been using R for basic statistical analysis for some time, but I haven't really used it for anything computationally challenging, and I'm very much a beginner on the programming/data manipulation side of R. I have presence/absence (binary) data on 72 plant species in 323 plots in a single catchment. The dataframe is 323 rows, each representing a plot, with 72 columns, each representing a species. This is a
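A compact sketch of the same idea in Python (the file name is hypothetical; every pair of species columns is cross-tabulated and tested for independence):

import pandas as pd
from itertools import combinations
from scipy.stats import chi2_contingency

df = pd.read_csv("plots.csv")    # hypothetical 323 x 72 presence/absence table

results = []
for sp1, sp2 in combinations(df.columns, 2):
    table = pd.crosstab(df[sp1], df[sp2])
    if table.shape != (2, 2):
        continue                  # skip species with no presence/absence variation
    stat, p, dof, expected = chi2_contingency(table)
    results.append((sp1, sp2, stat, p))

results_df = pd.DataFrame(results, columns=["species_1", "species_2", "chi2", "p"])
print(results_df.sort_values("p").head())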

Pearson's Chi Square Test Python

六月ゝ 毕业季﹏ submitted on 2019-12-04 20:24:32
I have two arrays on which I would like to do a Pearson's chi-square test (goodness of fit). I want to test whether or not there is a significant difference between the expected and observed results. observed = [11294, 11830, 10820, 12875] expected = [10749, 10940, 10271, 11937] I want to compare 11294 with 10749, 11830 with 10940, 10820 with 10271, etc. Here's what I have: >>> from scipy.stats import chisquare >>> chisquare(f_obs=[11294, 11830, 10820, 12875], f_exp=[10749, 10940, 10271, 11937]) (203.08897607453906, 9.0718379533890424e-44) where 203 is the chi-square test statistic and 9.07e-44 is the p-value.
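As a cross-check, the statistic can be reproduced by hand from sum((observed - expected)^2 / expected), and the p-value from the chi-squared survival function with k - 1 degrees of freedom (the default used by chisquare):

import numpy as np
from scipy.stats import chi2

observed = np.array([11294, 11830, 10820, 12875])
expected = np.array([10749, 10940, 10271, 11937])

stat = np.sum((observed - expected)**2 / expected)
p = chi2.sf(stat, df=len(observed) - 1)
print(stat, p)    # matches the chisquare() output above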