statistics

Marginalize a surface plot and use kernel density estimation (KDE) on it

不问归期 submitted on 2021-02-07 20:00:24
Question: As a minimal reproducible example, suppose I have the following multivariate normal distribution:

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
    from scipy.stats import multivariate_normal, gaussian_kde

    # Choose mean vector and variance-covariance matrix
    mu = np.array([0, 0])
    sigma = np.array([[2, 0], [0, 3]])

    # Create surface plot data
    x = np.linspace(-5, 5, 100)
    y = np.linspace(-5, 5, 100)
    X, Y = np.meshgrid(x, y)
    rv = multivariate_normal(mean=mu,
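
The excerpt breaks off at the rv = multivariate_normal(mean=mu, call, so everything past that point is a guess. A minimal sketch of one way the question could be finished, assuming the completion cov=sigma and reading "marginalize" as integrating the gridded joint density over one axis, with a gaussian_kde fitted to samples for comparison:

    import numpy as np
    from scipy.integrate import trapezoid
    from scipy.stats import multivariate_normal, gaussian_kde

    mu = np.array([0, 0])
    sigma = np.array([[2, 0], [0, 3]])
    rv = multivariate_normal(mean=mu, cov=sigma)   # cov=sigma is an assumed completion

    x = np.linspace(-5, 5, 100)
    y = np.linspace(-5, 5, 100)
    X, Y = np.meshgrid(x, y)
    Z = rv.pdf(np.dstack((X, Y)))        # joint density evaluated on the grid

    # With meshgrid's default 'xy' indexing, axis 0 of Z varies with y,
    # so integrating over axis 0 marginalizes y out and leaves p(x).
    marginal_x = trapezoid(Z, y, axis=0)

    # For comparison: a KDE fitted to samples from the same distribution
    samples = rv.rvs(size=5000, random_state=0)
    kde = gaussian_kde(samples[:, 0])
    kde_x = kde(x)

The marginal of a multivariate normal is itself normal (here N(0, 2)), which gives an easy sanity check on both curves.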

Error in fitdist with gamma distribution

会有一股神秘感。 submitted on 2021-02-07 18:47:53
Question: Below is my code:

    library(fitdistrplus)
    s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8, 3, 15, 3, 9, 22, 0, 4,
           10, 1, 9, 10, 11, 2, 8, 2, 6, 0, 15, 0, 2, 11, 0, 6, 3, 5, 0, 7,
           6, 0, 7, 1, 0, 6, 4, 1, 3, 5, 2, 6, 0, 10, 6, 4, 1, 17, 0, 1,
           0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1, 3, 2, 9, 4, 4,
           1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)
    o <- fitdist(s, "gamma", method = "mle")
    summary(o)
    plot(o)

and the error says:

    Error in fitdist(s, "gamma", method = "mle") : the
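
The error text is cut off, but one plausible cause stands out in the data itself: the gamma distribution is supported on (0, ∞), and s contains zeros, which drives the log-likelihood to -Inf and makes the MLE fail. A hedged sketch of that diagnosis, written in Python with scipy rather than the original fitdistrplus session; dropping the zeros (or switching to a discrete count distribution) is a workaround, not necessarily the right model:

    import numpy as np
    from scipy import stats

    s = np.array([11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8, 3, 15, 3, 9, 22, 0, 4,
                  10, 1, 9, 10, 11, 2, 8, 2, 6, 0, 15, 0, 2, 11, 0, 6, 3, 5, 0, 7,
                  6, 0, 7, 1, 0, 6, 4, 1, 3, 5, 2, 6, 0, 10, 6, 4, 1, 17, 0, 1,
                  0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1, 3, 2, 9, 4, 4,
                  1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7])

    print((s == 0).sum(), "zeros in the sample")   # gamma support excludes 0

    # One workaround: fit only the positive values, with location fixed at 0
    shape, loc, scale = stats.gamma.fit(s[s > 0], floc=0)
    print(shape, scale)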

ggplot2 density of circular data

橙三吉。 submitted on 2021-02-07 14:13:14
Question: I have a data set where x represents day of year (say birthdays) and I want to create a density graph of it. Further, since I have some grouping information (say boys or girls), I want to use the capabilities of ggplot2 to make a density plot. Easy enough at first:

    require(ggplot2); require(dplyr)
    bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = T),
                        bday = sample(1:365, 100, replace = T))
    bdays %>% ggplot(aes(x = bday)) +
      geom_density(aes(color = factor(gender)))

However,
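
The likely follow-up problem (the excerpt stops at "However,") is that day-of-year is circular: a plain KDE treats day 1 and day 365 as far apart and loses density at both ends of the year. A common trick is to tile the data once in each direction before estimating, then rescale. Sketched here in Python with scipy purely to illustrate the wrapping idea; in ggplot2 itself, replicating rows at bday - 365 and bday + 365 and limiting the x-axis achieves the same effect:

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    bdays = rng.integers(1, 366, size=100)   # stand-in for the sampled birthdays

    # Tile the sample so density can flow across the Dec 31 / Jan 1 seam
    wrapped = np.concatenate([bdays - 365, bdays, bdays + 365])
    kde = gaussian_kde(wrapped)

    grid = np.linspace(1, 365, 365)
    density = kde(grid) * 3   # rescale: each observation was counted three times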

Scipy: Pearson's correlation always returning 1

爱⌒轻易说出口 submitted on 2021-02-07 11:52:26
Question: I am using the Python library scipy to calculate Pearson's correlation for two float arrays. The returned coefficient is always 1.0, even when the arrays are different. For example:

    [-0.65499887  2.34644428]
    [-1.46049758  3.86537321]

I am calling the routine this way:

    r_row, p_value = scipy.stats.pearsonr(array1, array2)

The value of r_row is always 1.0. What am I doing wrong?

Answer 1: Pearson's correlation coefficient is a measure of how well your data would be fitted by a linear regression
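
The truncated answer is heading toward the actual cause: each array holds only two values, and any two points lie exactly on some straight line, so |r| is necessarily 1 regardless of the data. A minimal sketch:

    import numpy as np
    from scipy import stats

    a = np.array([-0.65499887, 2.34644428])
    b = np.array([-1.46049758, 3.86537321])
    print(stats.pearsonr(a, b)[0])   # 1.0: two points always fit a line perfectly

    # With more observations the coefficient becomes informative
    rng = np.random.default_rng(1)
    x = rng.normal(size=50)
    y = rng.normal(size=50)
    print(stats.pearsonr(x, y)[0])   # near 0 for independent samples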

Linear regression: ValueError: all the input array dimensions except for the concatenation axis must match exactly

主宰稳场 submitted on 2021-02-07 10:52:43
Question: I am looking for a solution to the following problem, and it just won't work the way I want it to. My goal is to run a regression analysis and get the slope, intercept, rvalue, pvalue, and stderr for multiple rows (this could go up to 10000). In this example, I have a file with 15 rows. Here are the first two rows:

    array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
             13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24],
           [100,  10,  61,  55,  29,  77,  61,  42,  70,  73,  98,  62,
             25,  86,  49,  68,  68,  26,  35
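
The excerpt cuts off before the failing call, but the task it describes (one regression per row against a shared x vector) maps naturally onto calling scipy.stats.linregress once per row; a length mismatch between x and a row is one common way to trigger the ValueError in the title. A hedged sketch with stand-in data, assuming 15 rows of 24 values as in the question:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = np.arange(1, 25)                          # shared x axis: 24 points
    data = rng.integers(0, 101, size=(15, 24))    # stand-in for the 15 rows

    # linregress requires x and each row to have the same length (24 here)
    fits = [stats.linregress(x, row) for row in data]
    slopes  = np.array([f.slope  for f in fits])
    pvalues = np.array([f.pvalue for f in fits])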

How to compute p-values from z-scores in R when the z-score is large (the p-value underflows to zero)?

丶灬走出姿态 submitted on 2021-02-07 08:17:51
Question: In genetics, very small p-values are common (for example 10^-400), and I am looking for a way to compute very small two-tailed p-values in R when the z-score is large. For example:

    z = 40
    pvalue = 2 * pnorm(abs(z), lower.tail = F)

This gives me zero instead of a very small, highly significant value.

Answer 1: The inability to handle p-values less than about 10^(-308) (.Machine$double.xmin) is not really R's fault, but is rather a generic limitation of any computational system that uses double-precision floating point
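
The cut-off answer's point is that roughly 10^-308 is the floor for normalized double-precision numbers, so the standard fix is to never leave log space: in R, pnorm(z, lower.tail = FALSE, log.p = TRUE) returns log(p) directly. The same idea sketched in Python with scipy:

    import numpy as np
    from scipy import stats

    z = 40
    print(stats.norm.sf(z))   # 0.0 -- the tail probability underflows

    # Stay in log space: log10 of the two-tailed p-value
    log10_p = (np.log(2) + stats.norm.logsf(abs(z))) / np.log(10)
    print(log10_p)            # about -349, i.e. p is roughly 10^-349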

Calculating the mean and standard deviation of data that does not fit in memory using Python [duplicate]

空扰寡人 submitted on 2021-02-07 03:39:10
Question: This question already has answers here: How to efficiently calculate a running standard deviation? (15 answers). Closed 7 years ago.

I have a lot of data stored on disk in large arrays. I can't load everything into memory at once. How can one calculate the mean and the standard deviation?

Answer 1: There is a simple online algorithm that computes both the mean and the variance by looking at each data point once and using O(1) memory. Wikipedia offers the following code:

    def online_variance(data)
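
The answer's code is cut off right after the def line. The algorithm it refers to is Welford's one-pass method, which updates a running mean and a running sum of squared deviations per value. A sketch of the standard formulation (sample standard deviation; not necessarily verbatim what the original answer quoted from Wikipedia):

    import math

    def online_mean_std(stream):
        """Welford's online algorithm: one pass over the data, O(1) memory."""
        n = 0
        mean = 0.0
        m2 = 0.0   # running sum of squared deviations from the current mean
        for x in stream:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
        return mean, std

    # Works on any iterable, so data can be streamed from disk chunk by chunk
    mean, std = online_mean_std(x / 1000 for x in range(1_000_000))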