statistics

How to count number of Numeric values in a column

橙三吉。 Submitted on 2021-02-06 09:08:21
Question: I have a dataframe, and I want to produce a table of summary statistics (number of valid numeric values, mean, and SD by group) for each of three columns. I can't seem to find any function to count the number of numeric values in R. I can use length(), which tells me how many values there are, and I can use colSums(is.na(x)) to count the number of NA values, but colSums(is.numeric(x)) doesn't work the same way. I could use tapply with { length - number of NA values - number of blank
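The question is about R, but the same "count only the valid values" summary can be sketched in pandas, where count() skips NA by design. The column names here are hypothetical, not from the question:

```python
import pandas as pd
import numpy as np

# Hypothetical data: one grouping column and one measurement column with NAs.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# count() counts only non-NA values, i.e. the "number of valid numeric values";
# mean and std likewise ignore NA.
summary = df.groupby("group")["x"].agg(["count", "mean", "std"])
```

In R the analogous trick is sum(!is.na(x)) inside tapply or aggregate.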

Mean, Median, and Mode of a list of values (SCORE) given a certain ZIP code for every year

我们两清 Submitted on 2021-02-05 11:29:28
Question: I want to find the mean, median, and mode value for each year given a specific ZIP code. How can I achieve this? I have already read the data from a CSV file, converted it to JSON, and defined it as a DataFrame. My data sample is not limited to the following table; it is larger. Answer 1: Use scipy.stats.mstats: In [2295]: df.DATE = pd.to_datetime(df.DATE).dt.year In [2291]: import scipy.stats.mstats as mstats In [2313]: def mode(x): ...: return mstats.mode(x, axis=None)[0] ...: In [2314]: df.groupby(['DATE',
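The answer's snippet is cut off; a self-contained version of the same approach might look like the following. The sample data and column names (DATE, ZIP, SCORE) are assumed from the question, not taken from its table:

```python
import pandas as pd
import scipy.stats.mstats as mstats

# Hypothetical sample: DATE already reduced to a year, one ZIP code.
df = pd.DataFrame({
    "DATE": [2019, 2019, 2019, 2020, 2020],
    "ZIP": [10001, 10001, 10001, 10001, 10001],
    "SCORE": [5, 5, 7, 3, 9],
})

def mode(x):
    # mstats.mode returns (modes, counts); take the first mode as a scalar.
    return float(mstats.mode(x, axis=None)[0])

stats = df.groupby(["ZIP", "DATE"])["SCORE"].agg(["mean", "median", mode])
```

Grouping on both ZIP and DATE gives one row of mean/median/mode per ZIP code per year.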

Obtaining the Uni-variate 95% confidence interval between two variables (MATLAB)

这一生的挚爱 Submitted on 2021-02-05 09:25:08
Question: I have a matrix A [1000,1] of variable A readings at 1000 locations, and a matrix B [1000,1] of variable B readings at the same 1000 locations. I obtained the regression coefficient using the regress function, and now I want to know whether my regression coefficient passes the univariate 95% confidence interval using Student's t distribution, but I am finding it difficult. Source: https://stackoverflow.com/questions/62182588/obtaining-the-uni-variate-95-confidence-interval-between-two
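The question is about MATLAB, but the underlying computation (slope standard error plus a Student's t critical value) is language-independent. A sketch in Python with simulated data standing in for the 1000 readings:

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for the two 1000x1 matrices of readings.
rng = np.random.default_rng(0)
A = rng.normal(size=1000)
B = 2.0 * A + rng.normal(scale=0.5, size=1000)  # true slope 2.0

# OLS fit with an intercept column.
X = np.column_stack([np.ones_like(A), A])
beta, _, _, _ = np.linalg.lstsq(X, B, rcond=None)

# Standard error of the slope from the residual variance.
n, k = X.shape
resid = B - X @ beta
sigma2 = resid @ resid / (n - k)
cov = sigma2 * np.linalg.inv(X.T @ X)
se_slope = np.sqrt(cov[1, 1])

# 95% confidence interval using Student's t with n - k degrees of freedom.
tcrit = stats.t.ppf(0.975, df=n - k)
ci = (beta[1] - tcrit * se_slope, beta[1] + tcrit * se_slope)
```

In MATLAB, regress itself returns these bounds directly as its second output: [b, bint] = regress(B, [ones(1000,1) A]).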

Formula for Google Charts histogram

最后都变了- Submitted on 2021-02-05 08:26:51
Question: What formula does Google Charts use to construct its histogram? For example, does it use Sturges' rule? Doane's rule? Scott's rule? Is there any documentation on how it constructs its default bin size, min, and max? Here is a link to the Histogram page for Google Charts. Google Charts automatically chooses the number of bins for you. All bins are equal width and have a height proportional to the number of data points in the bin. In other respects, histograms are similar to column charts.
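Google's documentation does not state which binning rule it uses, so the rules named in the question are only candidates. For reference, Sturges' rule, a common default in other tools, is simple to state and compute:

```python
import math

def sturges_bins(n):
    """Sturges' rule: number of bins k = ceil(log2(n)) + 1 for n data points."""
    return math.ceil(math.log2(n)) + 1
```

Scott's rule instead scales the bin width with the sample standard deviation, h = 3.49 * s / n**(1/3), so it reacts to spread as well as count; which (if either) Google Charts uses is undocumented.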

How can I compute a histogram in Haskell?

回眸只為那壹抹淺笑 Submitted on 2021-02-05 08:15:28
Question: I found Statistics.Sample.Histogram, but I can't seem to use it. If I want to bin a list into four categories, I expect to be able to do something like this: import Statistics.Sample.Histogram histogram 4 [1, 2, 9, 9, 9, 9, 10, 11, 20] But it gives me the error "non type-variable argument in the constraint," which I don't understand at all. What am I doing wrong? Answer 1: histogram takes a Vector of values, not a list. You can use Data.Vector 's fromList function to convert your list
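The fix itself is Haskell-specific (wrap the list with Data.Vector's fromList), but to show what four-bin equal-width binning produces for the question's data, here is the NumPy equivalent; Statistics.Sample.Histogram may choose its bin edges slightly differently:

```python
import numpy as np

# Same data as the question, binned into four equal-width bins
# spanning the data range [1, 20] (bin width 4.75).
data = [1, 2, 9, 9, 9, 9, 10, 11, 20]
counts, edges = np.histogram(data, bins=4)
```

The edges come out as [1, 5.75, 10.5, 15.25, 20], with the last bin closed on the right.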

How to manually compute the p-value of t-statistic in linear regression

让人想犯罪 __ Submitted on 2021-02-04 17:36:09
Question: I did a linear regression for a two-tailed t-test with 178 degrees of freedom. The summary function gives me two p-values for my two t-values. t value Pr(>|t|) 5.06 1.04e-06 *** 10.09 < 2e-16 *** ... ... F-statistic: 101.8 on 1 and 178 DF, p-value: < 2.2e-16 I want to calculate the p-values of the t-values manually with this formula: p = 1 - 2*F(|t|) p_value_1 <- 1 - 2 * pt(abs(t_1), 178) p_value_2 <- 1 - 2 * pt(abs(t_2), 178) I don't get the same p-values as in the model summary. Therefore, I
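The mismatch comes from the formula itself: for a two-tailed test, p = 2 * (1 - F(|t|)), not 1 - 2 * F(|t|) (the latter is negative whenever |t| > 0). A quick check in Python against the summary's first t-value:

```python
from scipy import stats

df = 178
t_1 = 5.06

# Two-tailed p-value: p = 2 * P(T > |t|) = 2 * (1 - F(|t|)).
# stats.t.sf is the survival function 1 - CDF, which is also
# numerically safer in the far tail than 1 - cdf.
p_value = 2 * stats.t.sf(abs(t_1), df)
```

The R equivalent is 2 * pt(abs(t_1), 178, lower.tail = FALSE), which reproduces the summary's 1.04e-06.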

Find running minimum and Max in R

假装没事ソ Submitted on 2021-02-04 08:28:04
Question: I have a vector of stock prices throughout the day: > head(bidStock) [,1] [1,] 1179.754 [2,] 1178.000 [3,] 1178.438 [4,] 1178.367 [5,] 1178.830 [6,] 1178.830 I want to find two things as the algorithm goes through the day: how far the current point is from the historical minimum and maximum so far that day. There is a function called 'mdd' in the 'stocks' package which finds the maximum drawdown throughout the day (i.e. the lowest value which corresponds to a point
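The question is about R (where cummin and cummax do this directly); the same running extremes can be sketched in NumPy with accumulating ufuncs, using the six prices shown:

```python
import numpy as np

bid = np.array([1179.754, 1178.000, 1178.438, 1178.367, 1178.830, 1178.830])

# Running minimum and maximum seen so far at each point in the day.
run_min = np.minimum.accumulate(bid)
run_max = np.maximum.accumulate(bid)

# Distance of the current price from the running extremes.
dist_from_min = bid - run_min
dist_from_max = run_max - bid
```

With this data the running minimum drops to 1178.000 at the second tick and stays there, while the running maximum remains the opening 1179.754 throughout.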

SAS Proc Freq with PySpark (Frequency, percent, cumulative frequency, and cumulative percent)

纵然是瞬间 Submitted on 2021-01-29 15:29:09
Question: I'm looking for a way to reproduce the SAS Proc Freq code in PySpark. I found code that does exactly what I need; however, it is given in Pandas. I want to make sure it uses the best of what Spark can offer, as the code will run on massive datasets. In this other post (which was also adapted for this StackOverflow answer), I also found instructions to compute distributed groupwise cumulative sums in PySpark, but I am not sure how to adapt them to my needs. Here's an input and output example
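For reference, the Pandas version of a Proc Freq-style one-way table (frequency, percent, cumulative frequency, cumulative percent) is short; this is a generic sketch on a made-up column, not the question's own code:

```python
import pandas as pd

# Hypothetical input: a single categorical column.
df = pd.DataFrame({"cat": ["a", "b", "a", "c", "a", "b"]})

freq = df["cat"].value_counts().sort_index().to_frame("Frequency")
freq["Percent"] = 100 * freq["Frequency"] / freq["Frequency"].sum()
freq["Cumulative Frequency"] = freq["Frequency"].cumsum()
freq["Cumulative Percent"] = freq["Percent"].cumsum()
```

The PySpark difficulty is exactly the two cumsum columns: they impose an ordering over the whole table, so a distributed version needs an ordered window (e.g. Window.orderBy over the category) rather than a plain groupBy aggregation.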

How to convert percentage to z-score of normal distribution in C/C++?

强颜欢笑 Submitted on 2021-01-29 12:51:43
Question: The goal is to say: "These values lie within a band of 95% of values around the mean in a normal distribution." Now I am trying to convert a percentage to a z-score, so that I can get the precise range of values. Something like <lower bound, upper bound> would be enough. So I need something like double z_score(double percentage) { // ... } // ... // according to https://en.wikipedia.org/wiki/68–95–99.7_rule z_score(68.27) == 1 z_score(95.45) == 2 z_score(99.73) == 3 I found an article
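The question asks for C/C++, but the math is just the inverse normal CDF: a central band covering p% leaves (100 - p)/2 % in each tail, so the upper z bound is the quantile at (1 + p/100)/2. A Python sketch of that mapping:

```python
from scipy.stats import norm

def z_score(percentage):
    # A central band of `percentage` % leaves (100 - p)/2 % in each tail,
    # so the upper bound is the standard normal quantile at (1 + p/100) / 2.
    return norm.ppf((1 + percentage / 100) / 2)
```

In C++ the same quantile is available as boost::math::quantile(boost::math::normal(), q); the standard library only provides the forward direction via std::erf, so the inverse must come from a library or a numeric inversion.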

Trying to use tidy for a power analysis and using clmm2

自闭症网瘾萝莉.ら Submitted on 2021-01-29 12:35:49
Question: I'm trying to do a power analysis on a clmm2 analysis that I'm doing. This is the code for the particular statistical model: test <- clmm2(risk_sensitivity ~ treat + sex + dispersal + sex*dispersal + treat*dispersal + treat*sex, random = id, data = datasocial, Hess=TRUE) Now, I have the following function: sim_experiment_power <- function(rep) { s <- sim_experiment(n_sample = 1000, prop_disp = 0.10, prop_fem = 0.35, disp_probability = 0.75, nondisp_probability = 0.90, fem_probability = 0.75,