statistics

Anderson-Darling Test in C++

Submitted by 拥有回忆 on 2021-02-10 07:33:50
Question: I am trying to compute the Anderson-Darling test found here. I followed the steps on Wikipedia, and I used MATLAB to verify the mean and standard deviation of the data I am testing (denoted X). I also wrote a function called phi to compute the standard normal CDF, and I have tested it to confirm it is correct. The problem appears when I actually compute the A-squared statistic (written A² on Wikipedia; I denote it A in my C++ code). Here is my…
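For reference, the A² formula the question implements can be written compactly. The sketch below is not the asker's C++ code; it is a minimal Python version of the Wikipedia formula (the helper name a_squared is made up here), cross-checked against scipy.stats.anderson:

```python
import numpy as np
from scipy.stats import norm, anderson

def a_squared(x):
    """A-squared statistic for normality, following the Wikipedia formula.
    The sample mean and the ddof=1 standard deviation estimate mu and sigma."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Y_i = Phi((x_i - mean) / s): standard normal CDF of the standardized, sorted data
    y = norm.cdf((x - x.mean()) / x.std(ddof=1))
    i = np.arange(1, n + 1)
    s = np.sum((2 * i - 1) * (np.log(y) + np.log(1 - y[::-1])))
    return -n - s / n

rng = np.random.default_rng(0)
data = rng.normal(size=200)
print(a_squared(data))
print(anderson(data, 'norm').statistic)   # should agree with the line above
```

A common source of mismatch in hand-rolled versions is forgetting to sort the data first, or standardizing with a ddof=0 standard deviation while the reference implementation uses ddof=1.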

Difference between numpy var() and pandas var()

Submitted by 旧城冷巷雨未停 on 2021-02-10 07:32:51
Question: I recently noticed that numpy.var() and pandas.DataFrame.var() (or pandas.Series.var()) give different values, and I would like to know whether there is a difference between them. Here is my dataset:

   Country    GDP   Area   Continent
0  India     2.79  3.287   Asia
1  USA      20.54  9.840   North America
2  China    13.61  9.590   Asia

Here is my code:

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
catDf.iloc[:,1:-1] = ss.fit_transform(catDf.iloc[:,1:-1])

Now checking…
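The discrepancy here is just the default degrees of freedom: numpy.var() uses ddof=0 (population variance, divide by n), while pandas defaults to ddof=1 (sample variance, divide by n - 1). A quick check using the GDP column from the question's dataset:

```python
import numpy as np
import pandas as pd

gdp = pd.Series([2.79, 20.54, 13.61])

pop_var = np.var(gdp)     # numpy default: divide by n      (ddof=0)
samp_var = gdp.var()      # pandas default: divide by n - 1 (ddof=1)
print(pop_var, samp_var)  # differ by the factor n / (n - 1)

# Request the same ddof and the two libraries agree
print(np.isclose(np.var(gdp, ddof=1), gdp.var()))
```

This also explains why StandardScaler output looks "off" when checked with pandas: scikit-learn standardizes with the ddof=0 variance, so the scaled columns have pandas .var() equal to n / (n - 1) rather than exactly 1.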

I am using rational regression to fit my data; how do I know which polynomials to divide? (What function should I use?)

Submitted by 别等时光非礼了梦想. on 2021-02-08 10:17:35
Question: I have a set of data:

10.28;3.615758755 60.12;3.409846973 87.24;2.360958276 92.37;2.288513587 130.87;1.940551693 164.01;1.770745686 215.87;1.60957984 245.42;1.548268275 251.26;1.53780944 252.14;1.536289363 261.74;1.520210896 384.91;1.385778494 458.68;1.339844772 492.59;1.323331777 600.94;1.281642094 6480.17;1.116976869 849.37;1.229511285 941.5;1.216845459 1280.98;1.185881122 1395.94;1.178804247 1470.04;1.180831814 1500.85;1.179158477 1861.04;1.15910996 2882.22;1.138164332 2997.18;1.136701833…
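One practical answer to "what function to use" is to pick low polynomial degrees for numerator and denominator, fit with scipy.optimize.curve_fit, and increase the degrees only if the residuals stay large. A sketch with a degree-(1,1) rational function, using the first few points of the question's data (the functional form and starting values are assumptions, not the asker's choice):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical degree-(1,1) rational function: f(x) = (a*x + b) / (x + c).
# As x grows, f(x) -> a, which matches the data's apparent asymptote near 1.1.
def rational11(x, a, b, c):
    return (a * x + b) / (x + c)

# First few points from the question's data set
x = np.array([10.28, 60.12, 87.24, 92.37, 130.87, 164.01, 215.87])
y = np.array([3.615758755, 3.409846973, 2.360958276, 2.288513587,
              1.940551693, 1.770745686, 1.60957984])

# Rough starting guesses read off the data (asymptote ~1.3 on this subset)
params, _ = curve_fit(rational11, x, y, p0=[1.3, 40.0, 5.0])
residuals = y - rational11(x, *params)
print(params, np.sum(residuals ** 2))
```

If a (1,1) form underfits, try (2,1) or (2,2); comparing residual sums of squares (or an information criterion such as AIC) across candidate degrees is the usual way to choose.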

Calculating loglikelihood of distributions in Python

Submitted by ≯℡__Kan透↙ on 2021-02-08 07:44:27
Question: What is an easy way to calculate the log-likelihood of any distribution fitted to data?

Answer 1 (solution by OP): SciPy has 82 standard distributions, which can be found here and in scipy.stats.distributions. Suppose you find the parameters such that the probability density function (pdf) fits the data, as follows:

dist = getattr(scipy.stats, 'distribution name')
params = dist.fit(data)

Then, since it is a standard distribution included in the SciPy library, the pdf and logpdf can be found and used…
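The recipe ends with summing logpdf over the data at the fitted parameters. A minimal self-contained sketch with a normal distribution fitted to synthetic data (note the lookup target is scipy.stats itself, not stats.stats):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Look the distribution up by name, fit it, then sum the log-densities
dist = getattr(stats, 'norm')
params = dist.fit(data)                    # (loc, scale) for the normal
loglik = np.sum(dist.logpdf(data, *params))
print(params, loglik)
```

The same three lines work for any name in scipy.stats ('gamma', 'lognorm', 'expon', ...), because every continuous distribution there exposes fit() and logpdf() with this signature.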

How to determine how much CPU load is produced from processes running under some user in Linux? [closed]

Submitted by 雨燕双飞 on 2021-02-08 05:15:42
Question (closed as off-topic 9 years ago): I would like to make a simple monitoring script that records the CPU load produced by user "abc" in a text file. vmstat, iostat, mpstat, and free do not seem to be able to filter by user name. Is it possible at all? EDIT: By the way, I'm running Red Hat EL 6.0.

Answer 1: A simple way…
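One simple approach is to sum the %CPU column that ps reports per process for that user. Sketched below in Python rather than as a shell one-liner ("abc" is the hypothetical user from the question); note this is an instantaneous snapshot, not an average over time:

```python
import subprocess

def total_cpu(ps_output: str) -> float:
    """Sum the %CPU values produced by `ps -u <user> -o %cpu=`."""
    return sum(float(line) for line in ps_output.split())

def cpu_for_user(user: str) -> float:
    # `-o %cpu=` prints one bare %CPU value per process, with no header line
    out = subprocess.run(['ps', '-u', user, '-o', '%cpu='],
                         capture_output=True, text=True)
    return total_cpu(out.stdout)

# The parsing step on some sample ps output:
print(total_cpu("1.5\n0.0\n12.3\n"))
```

A monitoring script would call cpu_for_user("abc") on a timer and append each reading to the text file; for accumulated CPU time rather than an instantaneous percentage, `ps -u abc -o cputime` is the usual alternative.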

Faceted qqplots with ggplot2

Submitted by 久未见 on 2021-02-08 03:44:53
Question: Say I have the following data:

datapoints1 = data.frame(categ=c(rep(1, n), rep(2, n)),
                         vals1=c(rt(n, 1, 2), rnorm(n, 3, 4)))
datapoints2 = data.frame(categ=c(rep(1, n), rep(2, n)),
                         vals2=c(rt(n, 5, 6), rnorm(n, 7, 8)))

Using ggplot2, how can I use the facet functionality to create, in a single command, two QQ plots: one with the two t samples, the other with the two Gaussian samples?

Answer 1: First, combine both data frames:

dat <- cbind(datapoints1, vals2 = datapoints2[ , 2])

Then, sort the…
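The question is about ggplot2, but it helps to see what a QQ plot actually draws: sorted sample values against theoretical quantiles at matching plotting positions. Those point pairs can be computed directly, sketched below in Python for concreteness and independently of any plotting layer:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=200)

# Plotting positions (i - 0.5) / n, one per sorted observation
n = len(sample)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = stats.norm.ppf(probs)   # reference-distribution quantiles
observed = np.sort(sample)            # sample quantiles

# For a sample drawn from the reference distribution the points hug y = x,
# so a line fitted through them has slope ~1 and intercept ~0
slope, intercept = np.polyfit(theoretical, observed, 1)
print(slope, intercept)
```

Faceting then amounts to computing (theoretical, observed) pairs per group and drawing each group in its own panel, which is exactly what ggplot2's stat_qq plus facet_wrap does in one command.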

Marginalize a surface plot and use kernel density estimation (kde) on it

Submitted by 爷，独闯天下 on 2021-02-07 20:01:23
Question: As a minimal reproducible example, suppose I have the following multivariate normal distribution:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import multivariate_normal, gaussian_kde

# Choose mean vector and variance-covariance matrix
mu = np.array([0, 0])
sigma = np.array([[2, 0], [0, 3]])

# Create surface plot data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
rv = multivariate_normal(mean=mu,…
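Continuing the question's setup: the surface can be marginalized numerically by integrating the density grid over one axis, and a KDE over sampled x-coordinates should approximate the same curve. A sketch, using the fact that the x-marginal of this particular Gaussian is analytically N(0, variance 2):

```python
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde, norm
from scipy.integrate import trapezoid

mu = np.array([0, 0])
sigma = np.array([[2, 0], [0, 3]])

x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
rv = multivariate_normal(mean=mu, cov=sigma)
Z = rv.pdf(np.dstack((X, Y)))            # joint density on the grid

# Integrate out y: with meshgrid's default indexing, y varies along axis 0
marginal_x = trapezoid(Z, y, axis=0)

# Check against the analytic x-marginal, N(0, var=2)
print(np.max(np.abs(marginal_x - norm.pdf(x, scale=np.sqrt(2)))))

# A KDE over sampled x-coordinates approximates the same marginal
samples = rv.rvs(size=5000, random_state=0)
kde_x = gaussian_kde(samples[:, 0])
print(np.max(np.abs(kde_x(x) - norm.pdf(x, scale=np.sqrt(2)))))
```

The grid integral and the KDE answer two slightly different needs: the former marginalizes the known surface exactly (up to quadrature and grid-truncation error), while the latter only needs samples and so also works when the joint density is unavailable.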
