kernel-density

`plot.density` extends `xlim` beyond the range of my data. Why, and how do I fix it?

落花浮王杯 submitted on 2019-12-02 05:28:33
Using the code below, I am trying to get density plots for several distributions:

```r
dens <- apply(df[, c(7, 9, 12, 14, 16, 18)], 2, density)
plot(NA, xlim = range(sapply(dens, "[", "x")), ylim = range(sapply(dens, "[", "y")))
mapply(lines, dens, col = 1:length(dens))
legend("topright", legend = names(dens), fill = 1:length(dens), bty = "n", lwd = 1, cex = 0.7)
```

The maximum possible value for all variables is 5, but some of the lines extend past 5. What do I need to change in my code to fix the plot?

By default, `density` extends the range so that the density curve approaches 0 at the extremes. Do you want to restrict the
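For what it's worth, base R's `density` accepts `cut`, `from`, and `to` arguments that stop the curve at the data range. The analogous clip in Python, sketched here with `scipy.stats.gaussian_kde` and made-up bounded data, is simply to evaluate the KDE only inside the observed range:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical variable bounded to [0, 5], like the variables in the question.
x = np.clip(rng.normal(3, 1, 500), 0, 5)

kde = stats.gaussian_kde(x)
# Evaluate only inside the observed data range instead of letting
# the smoothed curve run past the bounds.
grid = np.linspace(x.min(), x.max(), 200)
density = kde(grid)
```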

Using scipy.stats.gaussian_kde with 2 dimensional data

本小妞迷上赌 submitted on 2019-11-30 09:30:36
I'm trying to use the `scipy.stats.gaussian_kde` class to smooth out some discrete data collected with latitude and longitude information, so that in the end it looks somewhat like a contour map, where the high densities are the peaks and the low densities are the valleys. I'm having a hard time putting a two-dimensional dataset into the `gaussian_kde` class. I've played around to figure out how it works with one-dimensional data, so I thought two-dimensional would be something along the lines of:

```python
from scipy import stats
from numpy import array

data = array([[1.1, 1.1],
              [1.2, 1.2],
              [1.3, 1.3]])
kde =
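The usual stumbling block here is shape: `gaussian_kde` expects an array of shape `(n_dims, n_points)`, so a list of (lat, lon) pairs has to be transposed before fitting. A minimal sketch (the sample points are made up, and a couple of extra points are added because perfectly collinear data makes the covariance singular):

```python
import numpy as np
from scipy import stats

# Points stored row-wise as (x, y) pairs, as in the question.
data = np.array([[1.1, 1.1], [1.2, 1.2], [1.3, 1.3], [1.4, 1.3], [1.1, 1.2]])

# gaussian_kde wants shape (n_dims, n_points): note the transpose.
kde = stats.gaussian_kde(data.T)

# Evaluate the density on a grid, ready for a contour plot.
xs, ys = np.mgrid[1.0:1.5:20j, 1.0:1.5:20j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)
```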

How would one use Kernel Density Estimation as a 1D clustering method in scikit learn?

余生长醉 submitted on 2019-11-30 06:20:25
Question: I need to cluster a simple univariate data set into a preset number of clusters. Technically it would be closer to binning or sorting the data, since it is only 1D, but my boss is calling it clustering, so I'm going to stick with that name. The current method used by the system I'm on is K-means, but that seems like overkill. Is there a better way of performing this task? Answers to some other posts mention KDE (Kernel Density Estimation), but that is a density estimation method; how

Weighted Gaussian kernel density estimation in `python`

余生颓废 submitted on 2019-11-30 05:15:44
It is currently not possible to use `scipy.stats.gaussian_kde` to estimate the density of a random variable from weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples? Neither `sklearn.neighbors.KernelDensity` nor `statsmodels.nonparametric` seems to support weighted samples. I modified `scipy.stats.gaussian_kde` to allow for heterogeneous sampling weights and thought the results might be useful to others. An example is shown below. An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann
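Since this question was written, `scipy.stats.gaussian_kde` itself has grown a `weights` argument (SciPy 1.2+), so weighted estimation no longer needs a patched copy; a minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
# Give samples above zero twice the weight of samples below zero.
weights = np.where(samples > 0, 2.0, 1.0)

kde = stats.gaussian_kde(samples, weights=weights)

# The up-weighted side should now carry more density.
d_pos = kde(0.5)[0]
d_neg = kde(-0.5)[0]
```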

The difference between `geom_density` in ggplot2 and `density` in base R

房东的猫 submitted on 2019-11-29 15:20:56
I have data in R like the following:

```
   bag_id location_type            event_ts
2     155        sorter 2012-01-02 17:06:05
3     305       arrival 2012-01-01 07:20:16
1     155      transfer 2012-01-02 15:57:54
4     692       arrival 2012-03-29 09:47:52
10    748      transfer 2012-01-08 17:26:02
11    748        sorter 2012-01-08 17:30:02
12    993       arrival 2012-01-23 08:58:54
13   1019       arrival 2012-01-09 07:17:02
14   1019        sorter 2012-01-09 07:33:15
15   1154      transfer 2012-01-12 21:07:50
```

where `class(event_ts)` is `POSIXct`. I wanted to find the density of bags at each location at different times. I used `geom_density` (ggplot2) and could plot it very nicely. I
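For readers doing the same thing in Python: convert the timestamps to numbers and fit one KDE per location group (the event times below are invented, in seconds):

```python
import numpy as np
from scipy import stats

# Hypothetical event times (e.g. seconds since midnight) per location type,
# mirroring the location_type column in the R data.
events = {
    "sorter":   np.array([100.0, 160.0, 220.0, 400.0, 460.0]),
    "arrival":  np.array([50.0, 300.0, 310.0, 700.0, 720.0]),
    "transfer": np.array([90.0, 95.0, 500.0, 505.0, 510.0]),
}

# One density estimate per location, evaluated on a shared time grid,
# which is roughly what geom_density draws per group.
grid = np.linspace(0.0, 800.0, 200)
densities = {loc: stats.gaussian_kde(times)(grid) for loc, times in events.items()}
```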

Tools to use for conditional density estimation in Python [closed]

江枫思渺然 submitted on 2019-11-29 13:03:54
I have a large data set that contains 3 attributes per row: A, B, C. Column A can take the values 1, 2, and 0. Columns B and C can take any values. I'd like to perform density estimation using histograms for P(A = 2 | B, C) and plot the results using Python. I do not need the code to do it, I can try to figure that out on my own. I just need to know the procedure and the tools I should use. To answer your overall question, we should go through several steps and answer several questions: How do I read a CSV file (or text data)? How do I filter data? How do I plot data? At each stage, you need
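A histogram-based estimate of P(A = 2 | B, C) can be sketched as the ratio of two 2-D histograms: counts of rows with A == 2 divided by counts of all rows, bin by bin. The data below is synthetic, constructed so that A = 2 becomes more likely as B grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
B = rng.uniform(0, 1, n)
C = rng.uniform(0, 1, n)
# P(A = 2 | B, C) = B by construction; otherwise A is 0 or 1.
A = np.where(rng.uniform(0, 1, n) < B, 2, rng.integers(0, 2, n))

bins = [np.linspace(0, 1, 11), np.linspace(0, 1, 11)]
total, _, _ = np.histogram2d(B, C, bins=bins)          # all rows
hits, _, _ = np.histogram2d(B[A == 2], C[A == 2], bins=bins)  # rows with A == 2

# Conditional probability per (B, C) bin; empty bins are left at zero.
p_a2 = np.divide(hits, total, out=np.zeros_like(hits), where=total > 0)
```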

Peak of the kernel density estimation

笑着哭i submitted on 2019-11-29 02:55:55
Question: I need to find, as precisely as possible, the peak of a kernel density estimate (the modal value of the continuous random variable). I can find an approximate value:

```r
x <- rlnorm(100)
d <- density(x)
plot(d)
i <- which.max(d$y)
d$y[i]
d$x[i]
```

But `d$y` is only computed on a grid of points, even though the underlying density function is known precisely. How can I locate the exact value of the mode?

Answer 1: Here are two functions for dealing with modes. The `dmode` function finds the mode with the highest peak (the dominant mode), and `n.modes` identifies the number of modes.
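For the same problem in Python, one can refine the grid argmax by numerically maximizing the fitted KDE (a sketch, not the `dmode` function from the answer):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = rng.lognormal(size=100)
kde = stats.gaussian_kde(x)

# Coarse grid search for a good starting point...
grid = np.linspace(x.min(), x.max(), 512)
x0 = grid[np.argmax(kde(grid))]

# ...then refine the mode by minimizing the negative density.
res = optimize.minimize(lambda t: -kde(t)[0], x0)
mode = res.x[0]
```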

Multivariate kernel density estimation in Python

人盡茶涼 submitted on 2019-11-28 23:40:54
I am trying to use SciPy's `gaussian_kde` function to estimate the density of multivariate data. In the code below I sample from a 3D multivariate normal and fit the kernel density, but I'm not sure how to evaluate my fit.

```python
import numpy as np
from scipy import stats

mu = np.array([1, 10, 20])
sigma = np.matrix([[4, 10, 0], [10, 25, 0], [0, 0, 100]])
data = np.random.multivariate_normal(mu, sigma, 1000)
values = data.T
kernel = stats.gaussian_kde(values)
```

I saw this but am not sure how to extend it to 3D. I'm also not sure how I would even begin to evaluate the fitted density, or how to visualize it. There are
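One sanity check, sketched here rather than prescribed, is to evaluate the fitted kernel at arbitrary 3-D points and compare against the true `multivariate_normal` pdf the data came from. The covariance below is altered to be non-singular (the question's `[[4,10,0],[10,25,0],[0,0,100]]` has a singular top-left block, so its true pdf does not exist):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu = np.array([1.0, 10.0, 20.0])
sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 25.0, 0.0],
                  [0.0, 0.0, 100.0]])
data = rng.multivariate_normal(mu, sigma, 2000)

kernel = stats.gaussian_kde(data.T)  # gaussian_kde expects shape (n_dims, n_points)

# Evaluate the fitted density at the true mean and at a far-away point;
# the mean should be far denser.
at_mean = kernel(mu.reshape(3, 1))[0]
at_far = kernel((mu + np.array([10.0, 20.0, 40.0])).reshape(3, 1))[0]

# The reference value from the distribution the data was drawn from.
true_at_mean = stats.multivariate_normal(mu, sigma).pdf(mu)
```

For visualization, a common trick is to fix one coordinate and contour-plot the density over the remaining two.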

Add density lines to histogram and cumulative histogram

无人久伴 submitted on 2019-11-28 17:37:52
I want to add a density curve to a histogram and to a cumulative histogram, like this: Here is as far as I can get:

```r
hist.cum <- function(x, plot = TRUE, ...) {
  h <- hist(x, plot = FALSE, ...)
  h$counts <- cumsum(h$counts)
  h$density <- cumsum(h$density)
  if (plot) plot(h)
  h
}

x <- rnorm(100, 15, 5)
hist.cum(x)
hist(x, add = TRUE, col = "lightseagreen")
# lines(density(x), col = "red")
```

Offered without explanation:

```r
## Make some sample data
x <- sample(0:30, 200, replace = TRUE, prob = 15 - abs(15 - 0:30))

## Calculate and plot the two histograms
hcum <- h <- hist(x, plot = FALSE)
```
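As a non-graphical Python sketch of the same idea: the cumulative curve is just the running sum of density times bin width, which must end at 1 (matplotlib's `hist(..., density=True, cumulative=True)` plots this same quantity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(15, 5, 1000)

# Normalized histogram, as in hist(..., density=True).
density, edges = np.histogram(x, bins=30, density=True)
widths = np.diff(edges)

# Cumulative histogram: running integral of the density over the bins.
cumulative = np.cumsum(density * widths)
```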

How would one use Kernel Density Estimation as a 1D clustering method in scikit learn?

对着背影说爱祢 submitted on 2019-11-28 17:16:49
I need to cluster a simple univariate data set into a preset number of clusters. Technically it would be closer to binning or sorting the data, since it is only 1D, but my boss is calling it clustering, so I'm going to stick with that name. The current method used by the system I'm on is K-means, but that seems like overkill. Is there a better way of performing this task? Answers to some other posts mention KDE (Kernel Density Estimation), but that is a density estimation method; how would that work? I see how KDE returns a density, but how do I tell it to split the data into bins? How do
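The usual sketch of KDE-as-1D-clustering (an illustration, not the original answer's code): fit the KDE, evaluate it on a grid, and treat the local minima of the density as the boundaries between clusters:

```python
import numpy as np
from scipy import stats
from scipy.signal import argrelextrema

rng = np.random.default_rng(0)
# Two well-separated 1-D blobs.
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)])

kde = stats.gaussian_kde(x)
grid = np.linspace(x.min(), x.max(), 500)
density = kde(grid)

# Local minima of the density are natural split points between clusters.
minima = grid[argrelextrema(density, np.less)[0]]

# Assign each point the index of the interval it falls into.
labels = np.digitize(x, minima)
```

Note this yields as many clusters as the density has modes; to hit a *preset* number of clusters one would tune the bandwidth until the mode count matches.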