binning | 易学教程

How to collect additional row data on binned data in R

I want to sort the values of one data.frame column into predetermined bins, and then sum the values that are in the same rows but in a different column. In other words, I want to sort the data frame's items into bins based on one value and then, for each bin, get the sum of a second value over all the items in that bin. Can someone help me? My data looks like this:

df =
Item valueX valueY
A 169849631 0.9086560
B 27612064 0.9298379
C 196651878 1.6516654
D 33007984 1.3397873
E 23019448 -0.2954385
F
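
A minimal sketch of the bin-then-sum idea, written in Python rather than R. It assumes right-closed intervals (a, b], which is the default for R's cut(). The values in `breaks` are hypothetical, since the question does not list its predetermined bins. The rows are the ones shown above; row F is cut off in the question and is left out.

```python
from bisect import bisect_left

# Rows from the question: (Item, valueX, valueY). Row F is truncated in the source.
rows = [
    ("A", 169849631, 0.9086560),
    ("B", 27612064, 0.9298379),
    ("C", 196651878, 1.6516654),
    ("D", 33007984, 1.3397873),
    ("E", 23019448, -0.2954385),
]

# Hypothetical bin edges for valueX; bins are right-closed (a, b], like cut()'s default.
breaks = [0, 50_000_000, 100_000_000, 200_000_000]

def bin_index(x, edges):
    """Return the 0-based index of the (edges[i], edges[i+1]] bin containing x, or None."""
    i = bisect_left(edges, x)  # index of the first edge >= x
    if i == 0 or i == len(edges):
        return None  # x <= edges[0] or x > edges[-1]: outside every bin
    return i - 1

def sum_by_bin(rows, edges):
    """Bin rows by valueX and sum valueY within each bin."""
    sums = [0.0] * (len(edges) - 1)
    for _item, x, y in rows:
        i = bin_index(x, edges)
        if i is not None:
            sums[i] += y
    return sums

print(sum_by_bin(rows, breaks))
```

Here B, D and E land in the first bin and A and C in the third, so the middle bin's sum stays 0.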

How does cut with breaks work in R

I am trying to understand how cut divides data and creates intervals. I tried ?cut but still can't figure out how cut works in R. Here is my problem:

set.seed(111)
data1 <- seq(1,10, by=1)
data1
[1] 1 2 3 4 5 6 7 8 9 10
data1cut<- cut(data1, breaks = c(0,1,2,3,5,7,8,10), labels = FALSE)
data1cut
[1] 1 2 3 4 4 5 5 6 7 7

1. Why were 8, 9 and 10 not included in the data1cut result?
2. Why do summary(data1) and summary(data1cut) produce different results?

summary(data1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.50 5.50 7.75 10.00
summary(data1cut)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 4.50
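
The data1cut output contains interval numbers, not data values: with labels = FALSE, cut() returns the index of the interval each value falls in. The eight breaks define seven right-closed intervals, (0,1], (1,2], (2,3], (3,5], (5,7], (7,8], (8,10], numbered 1 to 7, so 8 maps to 6 and both 9 and 10 map to 7. This is also why the summaries differ: summary(data1cut) describes those codes. The Python sketch below emulates that behaviour for illustration only; `cut_codes` is a hypothetical helper, not part of any library.

```python
import statistics
from bisect import bisect_left

def cut_codes(values, breaks):
    """Emulate R's cut(x, breaks, labels = FALSE) with the default right = TRUE.

    Returns the 1-based number of the (breaks[i-1], breaks[i]] interval holding
    each value, or None (R would give NA) when the value lies outside every interval.
    """
    codes = []
    for v in values:
        i = bisect_left(breaks, v)  # index of the first break >= v
        codes.append(i if 0 < i < len(breaks) else None)
    return codes

data1 = list(range(1, 11))
breaks = [0, 1, 2, 3, 5, 7, 8, 10]
codes = cut_codes(data1, breaks)
print(codes)                     # [1, 2, 3, 4, 4, 5, 5, 6, 7, 7]
print(statistics.median(codes))  # 4.5, versus 5.5 for data1 itself
```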

Create binned variable from results of class interval determination

I want to create a binned variable out of a continuous variable. I want 10 bins, with break points taken from the result of a Jenks classification. How do I assign each value to one of these 10 bins?

# dataframe w/ values (AllwdAmt)
df <- structure(list(X = c(2078L, 2079L, 2080L, 2084L, 2085L, 2086L, 2087L, 2092L, 2093L, 2094L, 2095L, 4084L, 4085L, 4086L, 4087L, 4088L, 4089L, 4091L, 4092L, 4093L, 4094L, 4095L, 4096L, 4097L, 4098L, 4099L, 4727L, 4728L, 4733L, 4734L, 4739L, 4740L, 4741L, 4742L, 4743L, 4744L, 4745L, 4746L, 4747L, 4748L, 4749L, 4750L, 4751L, 4752L, 4753L, 4754L, 4755L, 4756L
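
A minimal Python sketch of the assignment step, assuming the Jenks classification has already produced a sorted list of break points that starts at the data minimum and ends at the maximum (Jenks breaks are typically reported that way). Because the minimum is itself a break, the first bin is closed on both ends, which corresponds to include.lowest = TRUE in R's cut(); without that, the minimum would fall outside every bin. The helper name and the example values are hypothetical, and three bins stand in for ten.

```python
from bisect import bisect_left

def assign_bins(values, breaks):
    """Assign each value to one of len(breaks) - 1 right-closed bins (1-based).

    The first bin is also closed on the left, mirroring include.lowest = TRUE
    in R's cut(), so a value equal to breaks[0] (e.g. the data minimum) is kept.
    Values outside [breaks[0], breaks[-1]] get None.
    """
    n_bins = len(breaks) - 1
    out = []
    for v in values:
        if v == breaks[0]:
            out.append(1)
            continue
        i = bisect_left(breaks, v)  # index of the first break >= v
        out.append(i if 0 < i <= n_bins else None)
    return out

# Hypothetical data and breaks (in practice, the breaks come from the Jenks run).
values = [10, 12, 15, 40, 41, 90, 95]
breaks = [10, 15, 41, 95]
print(assign_bins(values, breaks))  # [1, 1, 1, 2, 2, 3, 3]
```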

How to Plot a Pre-Binned Histogram In R

I have a pre-binned frequency table for a rather large dataset: a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by binning it further and summing the existing counts. For example, if the pre-binned data contains [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [(0.04, 6179)]. What's the fastest and/or easiest way to do this in R? It looks like ggplot2 has the answer.
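
A Python sketch of the re-binning arithmetic only (the question asks about R and plotting). It assumes each input bin value is the upper edge of its original bin and labels each new bin by its upper edge, which reproduces the [(0.04, 6179)] example: 5000 + 231 + 948 = 6179. The resulting (bin, total) pairs could then be drawn as a bar chart. `rebin` is a hypothetical helper name.

```python
import math
from collections import defaultdict

def rebin(prebinned, width):
    """Merge (bin, count) pairs into wider bins of the given width.

    Each input bin value b is placed in the new bin k with
    (k - 1) * width < b <= k * width, labelled by its upper edge k * width.
    """
    totals = defaultdict(int)
    for b, count in prebinned:
        # round() guards against float noise in b / width landing just above an integer
        k = math.ceil(round(b / width, 9))
        totals[k] += count
    return [(round(k * width, 10), totals[k]) for k in sorted(totals)]

print(rebin([(0.01, 5000), (0.02, 231), (0.03, 948)], 0.04))  # [(0.04, 6179)]
```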

Mathematica fast 2D binning algorithm

I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large data set (~100k elements) of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, where each bin's value is the sum of the z values that fall into it. Currently I am iterating through each element of the table, using Select to pick out which bin it belongs in based on lists of bin boundaries, and adding the z value to a
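
A sketch of a faster approach, in Python rather than Mathematica. When the bins are uniform, each point's bin can be computed directly from its coordinates, so there is no per-point search through the boundary lists and the whole pass is linear in the number of points. The function name and the uniform-grid assumption are for illustration only.

```python
def bin2d_sum(points, x_range, y_range, nx=100, ny=100):
    """Sum z over a uniform nx-by-ny grid of (x, y) bins in one pass.

    Bin indices come from arithmetic on the uniform bin width instead of a
    search over boundary lists. Points outside the closed x/y ranges are
    ignored; a coordinate equal to the upper limit goes in the last bin.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / nx, (y1 - y0) / ny
    grid = [[0.0] * ny for _ in range(nx)]
    for x, y, z in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the grid
        i = min(int((x - x0) / dx), nx - 1)  # clamp x == x1 into the last bin
        j = min(int((y - y0) / dy), ny - 1)  # clamp y == y1 into the last bin
        grid[i][j] += z
    return grid

# Hypothetical data on a 2x2 grid over [0, 1] x [0, 1].
points = [(0.1, 0.1, 1.0), (0.15, 0.9, 2.0), (0.9, 0.9, 3.0), (1.0, 1.0, 4.0), (2.0, 0.5, 5.0)]
print(bin2d_sum(points, (0, 1), (0, 1), nx=2, ny=2))  # [[1.0, 2.0], [0.0, 7.0]]
```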

resize with averaging or rebin a numpy 2d array

I am trying to reimplement in Python an IDL function, http://star.pst.qub.ac.uk/idl/REBIN.html, which downsizes a 2D array by an integer factor by averaging. For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by taking the mean of the relevant samples; the expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

i.e. b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]) and so on. I believe I should reshape
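
One common way to do the reshape the question is heading toward, sketched as a hypothetical `rebin` helper: split each axis into (number of blocks, block size) with reshape, then average over the two block-size axes. It assumes the target shape divides the source shape evenly, matching the integer-factor downsizing described above.

```python
import numpy as np

def rebin(a, shape):
    """Downsize a 2-D array by integer factors, averaging each block.

    Assumes each target dimension evenly divides the matching source dimension.
    """
    rows, cols = shape
    if a.shape[0] % rows or a.shape[1] % cols:
        raise ValueError("target shape must evenly divide the array shape")
    # (rows, block_rows, cols, block_cols): average over the block axes 1 and 3.
    return a.reshape(rows, a.shape[0] // rows, cols, a.shape[1] // cols).mean(axis=(1, 3))

a = np.arange(24).reshape((4, 6))
print(rebin(a, (2, 3)))
```

Each output cell is the mean of one 2x2 block, so the result matches the expected [[3.5, 5.5, 7.5], [15.5, 17.5, 19.5]].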

Pandas pd.cut() - binning datetime column / series

I'm attempting to bin using pd.cut(), but it is fairly elaborate. A colleague sends me multiple files with report dates such as:

'03-16-2017 to 03-22-2017'
'03-23-2017 to 03-29-2017'
'03-30-2017 to 04-05-2017'

They are all combined into a single dataframe and given a column name, df['Filedate'], so that every record in the file has the correct file date. The last day is a cutoff point, so I created a new column df['Filedate_bin'] which converts the last day to 3/22/2017, 3/29/2017, 4/05/2017 as a string. Then I created a list: Filedate_bin_list= df.Filedate_bin.unique(). As a result I have a
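
A stdlib-only Python sketch (it does not use pd.cut) of one way to treat the cutoff days as real dates instead of strings, assuming the goal is to map a given date to the report period that contains it. Month-first strings do not sort chronologically ('12-01-2016' sorts after '03-22-2017'), whereas date objects do, so the sorted cutoffs can be searched with bisect. The helper names are hypothetical, and `assign_period` only checks cutoffs, not period start dates.

```python
from bisect import bisect_left
from datetime import date, datetime

def parse_period_end(filedate):
    """Return the cutoff (last) day of a 'MM-DD-YYYY to MM-DD-YYYY' string as a date."""
    _start, end = filedate.split(" to ")
    return datetime.strptime(end, "%m-%d-%Y").date()

def assign_period(d, cutoffs):
    """Return the first cutoff >= d (the period containing d), or None if d is after all of them."""
    i = bisect_left(cutoffs, d)
    return cutoffs[i] if i < len(cutoffs) else None

filedates = ["03-16-2017 to 03-22-2017", "03-23-2017 to 03-29-2017", "03-30-2017 to 04-05-2017"]
cutoffs = sorted({parse_period_end(f) for f in filedates})
print(assign_period(date(2017, 3, 25), cutoffs))  # 2017-03-29
```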

How to bin a series of float values into a histogram in Python?

I have a set of float values (always less than 0) which I want to bin into a histogram, i.e. each bar in the histogram covers a range of values in [0, 0.150). The data I have looks like this:

0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112

With my code below I expect to get a result that looks like:

[0, 0.005) 5
[0.005, 0.011) 0
...etc.

I tried to do such binning with this code of mine, but it doesn't seem to work. What's the right way to do it?

#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input(
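
A sketch of fixed-width histogram binning with half-open bins [k*w, (k+1)*w), assuming a hypothetical width of 0.005 (the example bins in the question are not all the same width). Values are parsed with Decimal so that a value such as 0.005 lands exactly on a bin edge instead of being shifted by binary rounding; with half-open bins, 0.005 falls into the second bin. `histogram` is a hypothetical helper; it accepts any iterable of lines, such as fileinput.input().

```python
from collections import Counter
from decimal import Decimal

def histogram(lines, width="0.005"):
    """Count values into half-open bins [k*width, (k+1)*width).

    Values are parsed as Decimal so bin edges are exact.
    Assumes non-negative values (Decimal // truncates toward zero).
    """
    w = Decimal(width)
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[int(Decimal(line) // w)] += 1
    # Report every bin from 0 up to the highest occupied one, empty bins included.
    top = max(counts, default=-1)
    return [(k * w, (k + 1) * w, counts[k]) for k in range(top + 1)]

data = "0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112".split()
for lo, hi, n in histogram(data):
    if n:  # print only the non-empty bins
        print(f"[{lo}, {hi}) {n}")
# [0.000, 0.005) 4
# [0.005, 0.010) 1
# [0.110, 0.115) 2
# [0.120, 0.125) 1
```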

assigning points to bins

What is a good way to bin numerical values into a certain range? For example, suppose I have a list of values and I want to bin them into N bins by their range. Right now, I do something like this:

from scipy import *
num_bins = 3 # number of bins to use
values = # some array of integers...
min_val = min(values) - 1
max_val = max(values) + 1
my_bins = linspace(min_val, max_val, num_bins)
# assign point to my bins
for v in values:
    best_bin = min_index(abs(my_bins - v))

where min_index returns the index of the minimum value. The idea is that you can find the bin the point falls into by seeing
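
A sketch of equal-width binning by range in plain Python. Two things differ from the approach above: linspace(min_val, max_val, num_bins) produces num_bins edges, which delimit only num_bins - 1 bins, and picking the nearest edge is not the same as picking the bin that contains the value. Computing the bin index from the bin width avoids both. `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a 0-based bin among num_bins equal-width bins spanning [min, max].

    Bins are half-open [lo, hi) except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: everything in bin 0
    width = (hi - lo) / num_bins
    # Clamp so that the maximum value falls in the last bin rather than a new one.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

print(equal_width_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```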