binning

Matplotlib: How to make a histogram with bins of equal area?

半世苍凉 submitted on 2019-12-06 01:49:42
Question: Given a list of numbers following some arbitrary distribution, how can I define bin positions for matplotlib.pyplot.hist() so that the area of each bin is equal to (or close to) some constant area, A? The area should be calculated by multiplying the number of items in the bin by the width of the bin, and its value should be no greater than A. Here is an MWE that displays a histogram with normally distributed sample data:

    import matplotlib.pyplot as plt
    import numpy as np
    x = np.random.randn(100

fit a function to a histogram created with frequency in gnuplot

末鹿安然 submitted on 2019-12-05 17:48:12
Intro: In gnuplot there is a solution for creating a histogram from a file named hist.dat that looks like 1 2 2 2 3, using the commands

    binwidth=1
    set boxwidth binwidth
    bin(x,width)=width*floor(x/width) + binwidth/2.0
    plot [0:5][0:*] "hist.dat" u (bin($1,binwidth)):(1.0) smooth freq with boxes

which generates a histogram like this one from another SO page.

Question: How can I fit my function to this histogram? I defined a Gaussian function and initialized its parameters with

    f(x) = a*exp(-((x-m)/s)**2)
    a=3; m=2.5; s=1

and in the output the function follows the histogram well. Unfortunately I cannot fit to this

Binning longitude/latitude labeled data by census block ID

人走茶凉 submitted on 2019-12-04 21:24:20
I have two data sets: one for crime in Chicago, labeled with longitude and latitude coordinates, and a shapefile of census blocks, also in Chicago. Is it possible in R to aggregate crimes within census blocks, given these two files? The purpose is to be able to map out the crimes by census block.

Location for download of Chicago census tract data: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Census-Blocks-2000/uktd-fzhd
Location for download of crime data: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Some code that I have pruned down from

Binning time series in R?

試著忘記壹切 submitted on 2019-12-04 19:08:39
I'm new to R. My data has 600k objects defined by three attributes: Id, Date and TimeOfCall. TimeOfCall has the format 00:00:00 and ranges from 00:00:00 to 23:59:59. I want to bin the TimeOfCall attribute into 24 bins, each one representing an hourly slot (first bin 00:00:00 to 00:59:59, and so on). Can someone talk me through how to do this? I tried using cut(), but apparently my format is not numeric. Thanks in advance!

While you could convert to a formal time representation, in this case it might be easier to just use substr:

    test <- c("00:00:01","02:07:01","22:30:15")
    as.numeric(substr(test,1

Python: how to make an histogram with equally *sized* bins

这一生的挚爱 submitted on 2019-12-04 18:51:43
Question: I have a set of data and want to make a histogram of it. I need the bins to have the same size, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) problem of having equally spaced bins. This will naturally come at the expense of the bin widths, which can, and in general will, be different. I will specify the number of desired bins and the data set, obtaining the bin edges in return. Example:

    data = numpy.array([1., 1.2, 1.3

Is cut() style binning available in dplyr?

走远了吗. submitted on 2019-12-04 08:49:51
Question: Is there a way to do something like a cut() function for binning numeric values in a dplyr table? I'm working on a large Postgres table and can currently either write a CASE statement in the SQL at the outset, or output unaggregated data and apply cut(). Both have pretty obvious downsides: CASE statements are not particularly elegant, and pulling a large number of records via collect() is not at all efficient.

Answer 1: Just so there's an immediate answer for others arriving here via search engine,

Pandas - Group/bins of data per longitude/latitude

删除回忆录丶 submitted on 2019-12-03 13:37:52
I have a bunch of geographical data as below. I would like to group the data by bins of .2 degrees in longitude AND .2 degrees in latitude. While it is trivial to do for either latitude or longitude, what is the most appropriate way of doing this for both variables?

|User_ID  |Latitude  |Longitude|Datetime           |u    |v    |
|---------|----------|---------|-------------------|-----|-----|
|222583401|41.4020375|2.1478710|2014-07-06 20:49:20|0.3  |0.2  |
|287280509|41.3671346|2.0793115|2013-01-30 09:25:47|0.2  |0.7  |
|329757763|41.5453577|2.1175164|2012-09-25 08:40:59|0.5  |0.8  |
|189757330|41.5844998|2.5621569
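One possible way to bin on both coordinates at once is to turn each coordinate into an integer cell index and group on the pair. The sketch below is only an illustration: the sample rows are made up (shaped like the table above), the column names lat_bin and lon_bin are hypothetical, and averaging u and v is an assumed aggregation, since the excerpt does not say which summary is wanted.

```python
import numpy as np
import pandas as pd

# Made-up sample rows shaped like the table above (not the asker's data).
df = pd.DataFrame({
    "Latitude": [41.45, 41.37, 41.55, 41.58],
    "Longitude": [2.15, 2.08, 2.12, 2.56],
    "u": [0.3, 0.2, 0.5, 0.1],
    "v": [0.2, 0.7, 0.8, 0.4],
})

step = 0.2  # bin size in degrees, for both latitude and longitude

# Integer index of the 0.2-degree cell each point falls into.
cells = df.assign(
    lat_bin=np.floor(df["Latitude"] / step).astype(int),
    lon_bin=np.floor(df["Longitude"] / step).astype(int),
)

# Group on the (lat_bin, lon_bin) pair; the mean is an assumed aggregation.
binned = cells.groupby(["lat_bin", "lon_bin"])[["u", "v"]].mean()
print(binned)
```

Multiplying a cell index by step gives the lower edge of that cell in degrees. Values that sit exactly on a cell boundary can land on either side because of floating-point division, so rounding the coordinates first may be worth considering.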

Python: how to make an histogram with equally *sized* bins

情到浓时终转凉″ submitted on 2019-12-03 13:03:47
I have a set of data and want to make a histogram of it. I need the bins to have the same size, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) problem of having equally spaced bins. This will naturally come at the expense of the bin widths, which can, and in general will, be different. I will specify the number of desired bins and the data set, obtaining the bin edges in return. Example:

    data = numpy.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
    bins_edges = somefunc(data, nbins=3)
    print(bins_edges)
    >> [1.,1.3,2.1,2.12]

So the
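A minimal sketch of one way to get such edges, assuming linear interpolation over the sorted data is acceptable. The function name equal_count_edges is hypothetical and stands in for the somefunc placeholder in the question; it is not a NumPy function.

```python
import numpy as np

def equal_count_edges(data, nbins):
    """Hypothetical helper: bin edges that split the data into equal-count bins."""
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    # Evenly spaced positions 0..n in the sorted order, mapped back to data values.
    # Positions past the last index are clamped to the maximum by np.interp.
    return np.interp(np.linspace(0, n, nbins + 1), np.arange(n), data)

data = np.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
edges = equal_count_edges(data, nbins=3)
counts, _ = np.histogram(data, bins=edges)
print(edges)   # edges for the example data
print(counts)  # number of points in each bin
```

When the number of points is not a multiple of nbins, or the data contains many repeated values, the counts can only be approximately equal.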

Binning of data along one axis in numpy

五迷三道 submitted on 2019-12-03 06:55:45
I have a large two-dimensional array arr which I would like to bin over the second axis using numpy. Because np.histogram flattens the array, I'm currently using a for loop:

    import numpy as np
    arr = np.random.randn(100, 100)
    nbins = 10
    binned = np.empty((arr.shape[0], nbins))
    for i in range(arr.shape[0]):
        binned[i,:] = np.histogram(arr[i,:], bins=nbins)[0]

I feel like there should be a more direct and more efficient way to do that within numpy, but I failed to find one.

You could use np.apply_along_axis:

    x = np.array([range(20), range(1, 21), range(2, 22)])
    nbins = 2
    >>> np.apply_along_axis
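Since the answer above is cut off, here is a hedged sketch of how np.apply_along_axis could replace the loop from the question. The helper name row_counts and the fixed random seed are assumptions added for illustration; this is not necessarily the code the original answer went on to show.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
arr = rng.standard_normal((100, 100))
nbins = 10

def row_counts(row):
    """Hypothetical helper: histogram counts for a single 1-D row."""
    return np.histogram(row, bins=nbins)[0]

# Apply the helper to every row (axis=1 means each call receives one row).
binned = np.apply_along_axis(row_counts, 1, arr)
print(binned.shape)
```

As in the loop version, each row gets its own bin edges, spanning that row's minimum and maximum. Note that np.apply_along_axis still calls the helper once per row in Python, so it is mainly a tidier way to write the loop rather than a vectorized one.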

Numpy rebinning a 2D array

☆樱花仙子☆ submitted on 2019-12-03 06:52:15
I am looking for a fast formulation to do a numerical binning of a 2D numpy array. By binning I mean calculating submatrix averages or cumulative values. For example, x = numpy.arange(16).reshape(4, 4) would be split into 4 submatrices of 2x2 each, giving numpy.array([[2.5,4.5],[10.5,12.5]]), where 2.5=numpy.average([0,1,4,5]) etc. How can such an operation be performed efficiently? I don't really have any idea how to do this. Many thanks.

You can use a higher-dimensional view of your array and take the average along the extra dimensions:

    In [12]: a = np.arange(36).reshape(6,
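The reshape idea can be sketched on the 4x4 example from the question. The function name rebin_mean is hypothetical, and the sketch assumes both array dimensions are exact multiples of the block size.

```python
import numpy as np

def rebin_mean(a, block_rows, block_cols):
    """Hypothetical helper: average over non-overlapping blocks of a 2D array."""
    rows, cols = a.shape
    # Assumes rows and cols are exact multiples of the block size.
    view = a.reshape(rows // block_rows, block_rows, cols // block_cols, block_cols)
    # Axes 1 and 3 index positions inside each block; average them away.
    return view.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
result = rebin_mean(x, 2, 2)
print(result)
```

Replacing mean with sum in the last step gives cumulative values per block instead of averages.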