binning

R - faster alternative to hist(XX, plot=FALSE)$count

我怕爱的太早我们不能终老 posted on 2019-12-22 11:10:14
Question: I am looking for a faster alternative to R's hist(x, breaks=XXX, plot=FALSE)$count, as I don't need any of the other output it produces. I want to use it inside an sapply call that runs 1 million iterations, e.g.

    x = runif(100000000, 2.5, 2.6)
    bincounts = hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count

Any thoughts?

Answer 1: A first attempt using table and cut:

    table(cut(x, breaks=seq(0,3,length.out=100)))

It avoids the extra

Applying my custom function to a data frame python

最后都变了- posted on 2019-12-22 10:26:14
Question: I have a dataframe with a column called Signal. I want to add a new column to that dataframe and apply a custom function I've built. I'm very new at this, and I seem to be having trouble passing the values I get out of a data frame column into a function, so any help with my syntax errors or reasoning would be greatly appreciated!

    Signal
    3.98
    3.78
    -6.67
    -17.6
    -18.05
    -14.48
    -12.25
    -13.9
    -16.89
    -13.3
    -13.19
    -18.63
    -26.36
    -26.23
    -22.94
    -23.23
    -15.7

This is my simple

Hexbin: apply function for every bin

你说的曾经没有我的故事 posted on 2019-12-21 17:29:00
Question: I would like to build a hexbin plot where, for every bin, the "ratio between class 1 and class 2 points falling into this bin" is plotted (either log or not).

    x <- rnorm(10000)
    y <- rnorm(10000)
    h <- hexbin(x,y)
    plot(h)
    l <- as.factor(c( rep(1,2000), rep(2,8000) ))

Any suggestions on how to implement this? Is there a way to apply a function to every bin based on bin statistics?

Answer 1: @cryo111's answer has the most important ingredient: IDs = TRUE. After that it's just a matter of

Binning data into a hexagonal grid in Google Maps

China☆狼群 posted on 2019-12-21 02:53:20
Question: I'm trying to display geospatial data in a hexagonal grid on a Google Map. To do so, given a hexagon tile grid size X, I need to be able to convert {lat, lng} coordinates into the {lat, lng} centers of the hexagon grid tiles that contain them. In the end, I would like to be able to display data on a Google Map like this: Does anybody have any insight into how this is done? I've tried porting this Python hexagon binning script, binner.py, to JavaScript but it doesn't seem to be

Binning of data along one axis in numpy

走远了吗. posted on 2019-12-20 19:37:19
Question: I have a large two-dimensional array arr which I would like to bin over the second axis using numpy. Because np.histogram flattens the array, I'm currently using a for loop:

    import numpy as np
    arr = np.random.randn(100, 100)
    nbins = 10
    binned = np.empty((arr.shape[0], nbins))
    for i in range(arr.shape[0]):
        binned[i,:] = np.histogram(arr[i,:], bins=nbins)[0]

I feel like there should be a more direct and more efficient way to do that within numpy, but I failed to find one.

Answer 1: You could use np
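For the loop in the question above, here is a minimal sketch of one loop-free variant. It is not necessarily what the truncated answer goes on to suggest. Note the assumption: all rows share a single set of bin edges spanning the global minimum and maximum, whereas the question's loop lets np.histogram pick separate edges for each row, so the counts will generally differ from the loop's. The function name histogram_rows is made up for illustration.

```python
import numpy as np

def histogram_rows(arr, nbins):
    """Histogram every row of a 2D array using one shared set of bin edges.

    Assumption: the edges span the global min/max of arr, unlike a per-row
    np.histogram call, which derives edges from each row separately.
    """
    edges = np.linspace(arr.min(), arr.max(), nbins + 1)
    # Bin index of each element: edges[i] <= value < edges[i + 1].
    # Clipping puts the global maximum into the last bin (closed right edge).
    idx = np.clip(np.searchsorted(edges, arr, side="right") - 1, 0, nbins - 1)
    # Shift each row's indices into its own block so one bincount covers all rows.
    flat = idx + nbins * np.arange(arr.shape[0])[:, None]
    counts = np.bincount(flat.ravel(), minlength=arr.shape[0] * nbins)
    return counts.reshape(arr.shape[0], nbins), edges

arr = np.random.randn(100, 100)
binned, edges = histogram_rows(arr, nbins=10)
print(binned.shape)  # one row of counts per input row
```

If per-row edges are required, np.apply_along_axis(lambda r: np.histogram(r, bins=nbins)[0], 1, arr) reproduces the loop's result, though it still calls np.histogram once per row internally.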

Numpy rebinning a 2D array

社会主义新天地 posted on 2019-12-20 19:07:10
Question: I am looking for a fast way to do numerical binning of a 2D numpy array. By binning I mean calculating submatrix averages or cumulative values. For example, x = numpy.arange(16).reshape(4, 4) would be split into 4 submatrices of 2x2 each, giving numpy.array([[2.5,4.5],[10.5,12.5]]), where 2.5 = numpy.average([0,1,4,5]), etc. How can I perform such an operation efficiently? I don't really have any idea how to do this. Many thanks.

Answer 1: You can use a higher
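One common way to get the block averages described in the question is to reshape the array so that each block gets its own pair of axes, then reduce over those axes. The sketch below assumes the array dimensions are exact multiples of the block size; the function name rebin_mean is made up for illustration.

```python
import numpy as np

def rebin_mean(x, block_rows, block_cols):
    """Average non-overlapping block_rows x block_cols blocks of a 2D array."""
    rows, cols = x.shape
    if rows % block_rows or cols % block_cols:
        raise ValueError("array shape must be a multiple of the block shape")
    # Axes after reshape: (block row, row within block, block col, col within block).
    blocks = x.reshape(rows // block_rows, block_rows,
                       cols // block_cols, block_cols)
    # Reduce over the two "within block" axes.
    return blocks.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(rebin_mean(x, 2, 2))  # matches the array given in the question
```

Replacing mean with sum in the last step gives cumulative block values instead of averages.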

Create binned variable from results of class interval determination

廉价感情. posted on 2019-12-19 08:04:32
Question: I want to create a binned variable out of a continuous variable. I want 10 bins, with break points set from whatever results from a Jenks classification. How do I assign each value to one of these 10 bins?

    # dataframe w/ values (AllwdAmt)
    df <- structure(list(X = c(2078L, 2079L, 2080L, 2084L, 2085L, 2086L, 2087L, 2092L, 2093L, 2094L, 2095L, 4084L, 4085L, 4086L, 4087L, 4088L, 4089L, 4091L, 4092L, 4093L, 4094L, 4095L, 4096L, 4097L, 4098L, 4099L, 4727L, 4728L, 4733L, 4734L, 4739L, 4740L, 4741L,

How to Plot a Pre-Binned Histogram In R

本秂侑毒 posted on 2019-12-19 05:54:34
Question: I have a pre-binned frequency table for a rather large dataset. That is, a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by doing further binning and summing the existing counts. For example, if in the pre-binned data I have something like [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [