binning

How to collect additional row data on binned data in R

偶尔善良 posted on 2019-12-02 23:24:12
Question: I want to sort the values of one data.frame column into predetermined bins, and then sum the values that are in the same rows but in a different column. In other words, I want to sort the items into bins based on one value and then, for each bin, get the sum of a second value attached to the items in that bin. Can someone help me? My data looks like this:

df =
Item  valueX     valueY
A     169849631   0.9086560
B     27612064    0.9298379
C     196651878   1.6516654
D     33007984    1.3397873
E     23019448   -0.2954385
F
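A minimal sketch of the bin-then-sum idea, written in Python rather than R so that it matches the other sketches on this page. It uses only the five complete rows shown in the excerpt. The bin edges in `EDGES` and the helper name `sum_by_bin` are hypothetical, since the question's predetermined bins are not shown, and intervals are assumed to be left-open and right-closed.

```python
from bisect import bisect_left

# The five complete rows from the question: (Item, valueX, valueY).
ROWS = [
    ("A", 169849631, 0.9086560),
    ("B", 27612064, 0.9298379),
    ("C", 196651878, 1.6516654),
    ("D", 33007984, 1.3397873),
    ("E", 23019448, -0.2954385),
]

# Hypothetical bin edges (assumption: the real bins are not given in the excerpt).
EDGES = [0, 50_000_000, 100_000_000, 200_000_000]


def sum_by_bin(rows, edges):
    """Sum valueY per valueX bin, using (lo, hi] intervals."""
    totals = [0.0] * (len(edges) - 1)
    for _item, x, y in rows:
        idx = bisect_left(edges, x) - 1
        if 0 <= idx < len(totals):  # skip values that fall outside the edges
            totals[idx] += y
    return totals


totals = sum_by_bin(ROWS, EDGES)
for lo, hi, total in zip(EDGES, EDGES[1:], totals):
    print(f"({lo}, {hi}]: {total:.7f}")
```

With these assumed edges, items B, D and E land in the first bin and items A and C in the last one, so the middle bin stays at zero.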

How does cut with breaks work in R

蹲街弑〆低调 posted on 2019-12-02 01:11:55
I am trying to understand how cut divides data and creates intervals. I tried ?cut but could not figure out how cut works in R. Here is my problem:

set.seed(111)
data1 <- seq(1, 10, by = 1)
data1
[1] 1 2 3 4 5 6 7 8 9 10
data1cut <- cut(data1, breaks = c(0, 1, 2, 3, 5, 7, 8, 10), labels = FALSE)
data1cut
[1] 1 2 3 4 4 5 5 6 7 7

1. Why are 8, 9 and 10 not included in the data1cut result?
2. Why do summary(data1) and summary(data1cut) produce different results?

summary(data1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.50 5.50 7.75 10.00
summary(data1cut)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 4.50
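The behaviour in the excerpt can be reproduced outside R. The following is a rough Python emulation, assuming R's defaults for cut (right = TRUE, include.lowest = FALSE), so every interval is left-open and right-closed. The function name `cut_codes` is hypothetical, and `None` stands in for R's NA.

```python
from bisect import bisect_left


def cut_codes(values, breaks):
    """Return 1-based interval codes for (breaks[i-1], breaks[i]] intervals.

    Values outside (breaks[0], breaks[-1]] get None, standing in for R's NA.
    """
    codes = []
    for x in values:
        code = bisect_left(breaks, x)
        codes.append(code if 0 < code < len(breaks) else None)
    return codes


data1 = list(range(1, 11))
breaks = [0, 1, 2, 3, 5, 7, 8, 10]
data1cut = cut_codes(data1, breaks)
print(data1cut)  # [1, 2, 3, 4, 4, 5, 5, 6, 7, 7]
```

The result holds interval codes, not the original values: 8 falls into interval 6, and 9 and 10 fall into interval 7, so they are included. Summarising the codes therefore gives different statistics from summarising the raw data.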

Create binned variable from results of class interval determination

最后都变了- posted on 2019-12-01 05:53:23
I want to create a binned variable out of a continuous variable. I want 10 bins, with the break points set from the results of a Jenks classification. How do I assign each value to one of these 10 bins?

# dataframe w/ values (AllwdAmt)
df <- structure(list(X = c(2078L, 2079L, 2080L, 2084L, 2085L, 2086L, 2087L, 2092L, 2093L, 2094L, 2095L, 4084L, 4085L, 4086L, 4087L, 4088L, 4089L, 4091L, 4092L, 4093L, 4094L, 4095L, 4096L, 4097L, 4098L, 4099L, 4727L, 4728L, 4733L, 4734L, 4739L, 4740L, 4741L, 4742L, 4743L, 4744L, 4745L, 4746L, 4747L, 4748L, 4749L, 4750L, 4751L, 4752L, 4753L, 4754L, 4755L, 4756L

How to Plot a Pre-Binned Histogram In R

徘徊边缘 posted on 2019-12-01 03:22:46
I have a pre-binned frequency table for a rather large dataset. That is, a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by doing further binning and summing the existing counts. For example, if in the pre-binned data I have something like [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [(0.04, 6179)]. What's the fastest and/or easiest way to do this in R? Looks like ggplot2 has the answer.
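The re-binning step described above can be sketched in a few lines. This is a Python illustration rather than R or ggplot2 code, and it assumes each bin is labelled by its upper edge, which is consistent with the expected result [(0.04, 6179)]. The function name `merge_bins` is hypothetical.

```python
import math
from collections import defaultdict


def merge_bins(pairs, width):
    """Merge (bin, count) pairs into wider bins labelled by their upper edge."""
    merged = defaultdict(int)
    for bin_label, count in pairs:
        # Round before taking the ceiling so that float noise does not push
        # a value sitting on an edge into the next bin.
        k = math.ceil(round(bin_label / width, 9))
        merged[k] += count
    return [(round(k * width, 10), merged[k]) for k in sorted(merged)]


prebinned = [(0.01, 5000), (0.02, 231), (0.03, 948)]
coarse = merge_bins(prebinned, 0.04)
print(coarse)  # [(0.04, 6179)]
```

The merged pairs can then be drawn as bars directly, since the counts are already aggregated.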

How to bin a series of float values into a histogram in Python?

我们两清 posted on 2019-11-30 13:20:03
Question: I have a set of float values (always less than 1) that I want to bin into a histogram, i.e. each bar in the histogram contains a range of values in [0, 0.150). The data I have looks like this: 0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112. With my code below I expect to get a result that looks like [0, 0.005) 5 [0.005, 0.011) 0 ...etc. I tried to do this binning with the code below, but it doesn't seem to work. What's the right way to do it?

#! /usr/bin/env python
import fileinput, math
log2 = math

Mathematica fast 2D binning algorithm

醉酒当歌 posted on 2019-11-30 12:15:48
Question: I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the z values that fall into each bin. Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a
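For comparison, the same "sum of z per 2D bin" operation can be sketched in Python with NumPy instead of Mathematica. This is not the asker's method. It relies on the `weights` argument of `np.histogram2d`, and the three-point array `T` and the 2x2 grid are toy stand-ins for the ~100k-element data set and the 100x100 grid.

```python
import numpy as np

# Toy stand-in for the {x, y, z} triples; the real data set is not shown.
T = np.array([
    [0.1, 0.1, 1.0],
    [0.2, 0.15, 2.0],
    [0.9, 0.9, 5.0],
])

# weights=z makes each bin hold the sum of z instead of a point count.
H, xedges, yedges = np.histogram2d(
    T[:, 0], T[:, 1], bins=2, range=[[0, 1], [0, 1]], weights=T[:, 2]
)
print(H)
```

Each point is assigned to its bin in one vectorised pass, which avoids scanning the bin boundaries once per element.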

resize with averaging or rebin a numpy 2d array

自闭症网瘾萝莉.ら posted on 2019-11-30 10:53:14
I am trying to reimplement in Python an IDL function, http://star.pst.qub.ac.uk/idl/REBIN.html, which downsizes a 2D array by an integer factor by averaging. For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by taking the mean of the relevant samples. The expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[ 3.5, 5.5, 7.5],
       [ 15.5, 17.5, 19.5]])

i.e. b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]) and so on. I believe I should reshape
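One possible way to get the expected output is to reshape the array into blocks and average over the block axes. The sketch below assumes each target dimension divides the corresponding source dimension exactly. It is an illustration of the reshape idea, not a port of the IDL routine.

```python
import numpy as np


def rebin(a, shape):
    """Downsize a 2D array to `shape` by averaging equal-sized blocks."""
    rows, cols = shape
    if a.shape[0] % rows or a.shape[1] % cols:
        raise ValueError("each target dimension must divide the source dimension")
    # Axes 1 and 3 index the position inside each block.
    blocks = a.reshape(rows, a.shape[0] // rows, cols, a.shape[1] // cols)
    return blocks.mean(axis=(1, 3))


a = np.arange(24).reshape((4, 6))
b = rebin(a, (2, 3))
print(b)
```

For the 4x6 example this averages 2x2 blocks, so b[0, 0] equals np.mean(a[:2, :2]).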

Pandas pd.cut() - binning datetime column / series

自作多情 posted on 2019-11-30 07:30:39
I am attempting to bin data using pd.cut(), but the setup is fairly elaborate. A colleague sends me multiple files with report dates such as: '03-16-2017 to 03-22-2017' '03-23-2017 to 03-29-2017' '03-30-2017 to 04-05-2017'. They are all combined into a single dataframe and given a column name, df['Filedate'], so that every record in the file has the correct filedate. The last day is a cutoff point, so I created a new column, df['Filedate_bin'], which converts the last day to 3/22/2017, 3/29/2017, 4/05/2017 as a string. Then I created a list: Filedate_bin_list= df.Filedate_bin.unique(). As a result I have a

How to bin a series of float values into a histogram in Python?

╄→гoц情女王★ posted on 2019-11-30 07:18:45
I have a set of float values (always less than 1) that I want to bin into a histogram, i.e. each bar in the histogram contains a range of values in [0, 0.150). The data I have looks like this: 0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112. With my code below I expect to get a result that looks like [0, 0.005) 5 [0.005, 0.011) 0 ...etc. I tried to do this binning with the code below, but it doesn't seem to work. What's the right way to do it?

#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input(
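A minimal sketch of counting values into half-open [lo, hi) bins using only the standard library. The data is taken from the excerpt. The three equal-width edges in `EDGES` are an assumption made for illustration, because the asker's intended edges are only partly shown, and the function name `histogram` is hypothetical.

```python
from bisect import bisect_right

DATA = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]

# Assumed edges covering [0, 0.150); the asker's own edges are not fully shown.
EDGES = [0.0, 0.05, 0.10, 0.15]


def histogram(values, edges):
    """Count values per [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect_right(edges, v) - 1
        if 0 <= idx < len(counts):  # ignore values outside the edges
            counts[idx] += 1
    return counts


counts = histogram(DATA, EDGES)
for lo, hi, n in zip(EDGES, EDGES[1:], counts):
    print(f"[{lo}, {hi}) {n}")
```

Using `bisect_right` keeps the left edge of each bin inclusive and the right edge exclusive, which matches the [lo, hi) notation in the question.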

assigning points to bins

蹲街弑〆低调 posted on 2019-11-30 03:42:04
What is a good way to bin numerical values into a certain range? For example, suppose I have a list of values and I want to bin them into N bins by their range. Right now, I do something like this:

from scipy import *
num_bins = 3 # number of bins to use
values = # some array of integers...
min_val = min(values) - 1
max_val = max(values) + 1
my_bins = linspace(min_val, max_val, num_bins)
# assign point to my bins
for v in values:
    best_bin = min_index(abs(my_bins - v))

where min_index returns the index of the minimum value. The idea is that you can find the bin the point falls into by seeing
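One alternative to the nearest-bin loop is to build N + 1 equally spaced edges and let `np.digitize` assign every value in a single call. This is a sketch of a different technique from the asker's, which picks the nearest grid point rather than the enclosing interval. The `values` array is made-up example data, since the original is elided.

```python
import numpy as np

num_bins = 3  # number of bins to use
values = np.array([1, 4, 7, 10, 15])  # example data (assumption)

# num_bins bins need num_bins + 1 edges.
edges = np.linspace(values.min(), values.max(), num_bins + 1)

# Passing only the interior edges yields indices 0..num_bins-1 and keeps
# the maximum value inside the last bin.
bin_index = np.digitize(values, edges[1:-1])
print(bin_index)  # [0 0 1 1 2]
```

Because the outer edges are dropped before the call, no padding of the minimum and maximum by 1 is needed.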