binning | 易学教程

pd.qcut with values that are inf (infinity) ValueError: Bin edges must be unique:

Question: I have a data set that is a ratio of two float numbers. Some values are inf (infinity) because of division by zero. How do I use pd.qcut/pd.cut with inf values? My data can be accessed here.

q = pd.qcut(df['ratio'], 10)

ValueError: Bin edges must be unique: array([ 1.20089207e+03, 6.02984295e+04, 1.26445577e+05, 2.29982770e+05, 5.13176079e+05, 1.28794976e+06, 4.96001538e+06, nan, nan, nan, inf])

Answer 1: You could replace np.inf with np.nan and then call dropna: q = pd.qcut(df.ratio.replace
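The idea in the answer excerpt above (treat infinities as missing, then bin) can be sketched as follows. This is a minimal illustration, not the truncated answer's exact code; the sample values and the choice of 4 quantile bins are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical ratios; two entries are infinite (division by zero).
ratio = pd.Series([1.0, 2.0, np.inf, 3.0, 4.0, 5.0, 6.0, -np.inf, 7.0, 8.0])

# Turn +inf and -inf into NaN, then drop the missing values before binning.
finite = ratio.replace([np.inf, -np.inf], np.nan).dropna()

# Quantile-based binning now sees only finite values,
# so no bin edge can come out as NaN or inf.
q = pd.qcut(finite, 4)
print(q.value_counts().sort_index())
```

Dropping rows changes the length of the series, so the result aligns with the original frame only through its index.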

Conditionally binning

Question: Is it possible to create a new column in a dataframe where the bins for 'X' depend on the value of one or more other columns? Example below. The bins for AR1, PO1 and RU1 differ from one another. So far I can only get bins computed over all values of 'X'.

import pandas as pd
import numpy as np
import string
import random

N = 100
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h','d','a']
S = ['AR1','PO1','RU1']
np.random.seed(0)
df = pd.DataFrame( {'X': np.random.uniform(1,10,N), 'Y
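One possible approach to per-group bins, sketched under assumptions: the grouping column is called "S" here, and the bin edges per group are invented, because the excerpt is cut off before the frame is fully defined.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X": rng.uniform(1, 10, 12),
    "S": ["AR1", "PO1", "RU1"] * 4,  # hypothetical grouping column
})

# Hypothetical bin edges, one list per group.
edges = {"AR1": [1, 4, 10], "PO1": [1, 5, 10], "RU1": [1, 7, 10]}

# Cut each group's X with that group's own edges. Each piece keeps its
# original index, so the concatenated result realigns with df on assignment.
pieces = [
    pd.cut(x, bins=edges[key], labels=False, include_lowest=True)
    for key, x in df.groupby("S")["X"]
]
df["X_bin"] = pd.concat(pieces)
print(df)
```

With labels=False the new column holds integer bin codes; dropping that argument would give interval labels instead.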

Pandas - Group/bins of data per longitude/latitude

Question: I have a bunch of geographical data as below. I would like to group the data into bins of 0.2 degrees in longitude AND 0.2 degrees in latitude. While it is trivial to do for either latitude or longitude, what is the most appropriate way of doing this for both variables?

|User_ID  |Latitude  |Longitude|Datetime           |u    |v    |
|---------|----------|---------|-------------------|-----|-----|
|222583401|41.4020375|2.1478710|2014-07-06 20:49:20|0.3  | 0.2 |
|287280509|41.3671346|2.0793115|2013-01-30 09:25:47|0.2  | 0
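A minimal sketch of binning on both coordinates at once: compute an integer cell index per axis, then group on the pair. The first two coordinate pairs come from the excerpt; the third row and some of the u and v values are made up, since the table is truncated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Latitude":  [41.4020375, 41.3671346, 41.4100000],
    "Longitude": [2.1478710, 2.0793115, 2.1500000],
    "u": [0.3, 0.2, 0.5],  # partly hypothetical values
    "v": [0.2, 0.1, 0.4],  # partly hypothetical values
})

step = 0.2  # bin width in degrees

# Integer cell indices avoid comparing floating-point bin labels.
df["lat_bin"] = np.floor(df["Latitude"] / step).astype(int)
df["lon_bin"] = np.floor(df["Longitude"] / step).astype(int)

# Grouping on both indices gives one row per 0.2 x 0.2 degree cell.
cell_means = df.groupby(["lat_bin", "lon_bin"])[["u", "v"]].mean()
print(cell_means)
```

Multiplying a cell index by step recovers the lower edge of that cell if a coordinate label is needed.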

Binning variables in a dataframe with input bin data from another dataframe

Question: As a beginner-level R user, despite having read (1) numerous posts about binning and grouping here on SO and (2) the documentation for the data.table and dplyr packages, I still can't figure out how to apply the power of those packages to binning continuous and factor variables for later use in credit scoring modelling. Problem: build a code-efficient, easily customisable, more or less automated solution for binning variables with minimal hard-coding. These variables used to be binned with a

How to catch the index of immediate greater number in other matrix?

Question: Consider this example:

a=rand(5,1)
b=rand(5,1);
bs=sum(b);
B=b./bs;
cB=cumsum(B)
%OUTPUT
a = 0.7803 0.3897 0.2417 0.4039 0.0965
cB = 0.0495 0.4030 0.7617 0.9776 1.0000

Now I want the position of the number in cB that is immediately greater than each number in a. That is, I want 5 positions, one for each number in a. So my output should be P = [4;2;2;3;2]. Please help.

Answer 1: The suggestions by the others are decent, but both miss the point as they are inefficient for large problems. This
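Because cB is a cumulative sum and therefore sorted, the lookup can be done with a binary search. The sketch below uses NumPy rather than MATLAB and is not necessarily the method the truncated answer goes on to describe; the values are the ones shown in the question.

```python
import numpy as np

a = np.array([0.7803, 0.3897, 0.2417, 0.4039, 0.0965])
cB = np.array([0.0495, 0.4030, 0.7617, 0.9776, 1.0000])

# side="right" gives, for each value in a, the 0-based index of the
# first element of cB that is strictly greater than it.
idx0 = np.searchsorted(cB, a, side="right")

# Shift to 1-based positions to match the MATLAB convention in the question.
P = idx0 + 1
print(P)  # [4 2 2 3 2]
```

A value in a that is greater than or equal to the last element of cB would map to len(cB) + 1, which is out of range, so that case needs separate handling.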

Convert year-month string column into quarterly bins

Question: I am currently working with a large phenology data set in which there are multiple observations of trees for a given month. I want to assign these observations to three-month clusters or bins. I am currently using the following code:

Cluster.GN <- ifelse(Master.feed.parts.gn$yr.mo=="2007.1", 1, ifelse(Master.feed.parts.gn$yr.mo=="2007.11", 1,.... ifelse(Master.feed.parts.gn$yr.mo=="2014.05", 17, NA)

This code works, but it is very cumbersome as there are over 50 months. I have had trouble
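The nested ifelse chain can be replaced by arithmetic on a month counter. The following is a sketch in Python with pandas rather than R, assuming the values are "YYYY.MM" strings with a zero-padded month; a string such as "2007.1" would be read as January under this assumption. The function name three_month_bins and the sample values are hypothetical.

```python
import pandas as pd

def three_month_bins(yr_mo: pd.Series) -> pd.Series:
    """Map 'YYYY.MM' strings to 1-based three-month bins counted from the earliest month."""
    parts = yr_mo.str.split(".", expand=True).astype(int)
    # Absolute month counter: year * 12 + zero-based month.
    months = parts[0] * 12 + (parts[1] - 1)
    # Every block of three consecutive months shares one bin number.
    return (months - months.min()) // 3 + 1

yr_mo = pd.Series(["2007.10", "2007.11", "2007.12", "2008.01", "2008.03", "2008.04"])
bins = three_month_bins(yr_mo)
print(bins.tolist())  # [1, 1, 1, 2, 2, 3]
```

The bins here start at the earliest month in the data, which may not match the cluster boundaries used in the original ifelse code.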

How to bin x,y,z vectors into matrix (R)

Question: This is probably a basic question, but I haven't been able to Google anything helpful after trying for days. I have an R dataframe with x,y,z tuples, where z is a response to x and y and can be modeled as a surface.

> head(temp)
         x         y        z
1 36.55411  965.7779 1644.779
2 42.36912  978.9721 1643.957
3 58.34699 1183.7426 1846.123
4 53.55439 1232.2696 1990.707
5 50.76167 1115.2049 1281
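One way to turn x,y,z tuples into a matrix is to average z inside each (x, y) grid cell. The sketch below uses NumPy rather than R and assumes that a per-cell mean is what is wanted, which the truncated question does not confirm; the function name bin_xyz_mean and the sample points are hypothetical.

```python
import numpy as np

def bin_xyz_mean(x, y, z, bins):
    """Return a matrix of mean z per (x, y) cell, plus the bin edges."""
    # Sum of z per cell (z passed as weights).
    sums, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=z)
    # Number of points per cell, using the same edges.
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore"):
        means = sums / counts  # cells with no points become NaN
    return means, x_edges, y_edges

x = np.array([0.1, 0.2, 0.9, 0.8])
y = np.array([0.1, 0.2, 0.9, 0.1])
z = np.array([1.0, 3.0, 10.0, 5.0])
means, x_edges, y_edges = bin_xyz_mean(x, y, z, bins=2)
print(means)
```

Rows of the result correspond to x bins and columns to y bins.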

How to handle NaNs in binning with numpy add.reduceat?

Question: I'm using the numpy reduceat method for binning data. Background: I'm processing measurement data sampled at high frequencies and I need to down-sample them by extracting bin means from bins of a certain size. Since I have millions of samples, I need something fast. In principle, this works like a charm:

import numpy as np

def bin_by_npreduceat(v, nbins):
    # np.int was removed in NumPy 1.24; the built-in int is equivalent here
    bins = np.linspace(0, len(v), nbins+1, True).astype(int)
    return np.add.reduceat(v, bins[:-1]) / np.diff(bins)

The problem is: NaNs can
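A possible NaN-aware variant keeps the reduceat approach but sums only valid samples and divides by the number of valid samples per bin. This is a sketch, assuming that NaNs should simply be ignored and that nbins does not exceed len(v); the function name bin_means_ignoring_nan is hypothetical.

```python
import numpy as np

def bin_means_ignoring_nan(v, nbins):
    """Mean of each bin, skipping NaNs; bins holding only NaNs yield NaN."""
    v = np.asarray(v, dtype=float)
    # Start index of each bin, as in the question's function.
    starts = np.linspace(0, len(v), nbins + 1, True).astype(int)[:-1]
    valid = ~np.isnan(v)
    # Replace NaNs by 0 so they do not contribute to the sums.
    sums = np.add.reduceat(np.where(valid, v, 0.0), starts)
    # Count the valid samples per bin instead of using the bin width.
    counts = np.add.reduceat(valid.astype(int), starts)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

result = bin_means_ignoring_nan([1.0, np.nan, 3.0, 4.0, np.nan, np.nan], 3)
print(result)
```

Both reductions are vectorized, so the cost stays close to that of the original single reduceat call.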

How do I get identical results from the old hist and the new histcounts functions in Matlab

Question: I am trying to replace a use of the old hist function with the new histcounts, which is faster at binning and counting than hist. However, I am struggling to get exactly the same results from histcounts that I got from hist. I am aware that histcounts returns the bin edges rather than the bin centers. However, the counts should be identical and the bin edges should be convertible to bin centers, as far as I understand. The Matlab reference page for replacing hist with histcounts (see http
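The edge-to-center conversion mentioned above is simple arithmetic: each center is the midpoint of two adjacent edges. The sketch below shows it with NumPy's histogram, not MATLAB's hist or histcounts, so it does not address any difference in how those two functions place or close their bins; the data values are made up.

```python
import numpy as np

data = np.array([0.5, 1.5, 1.7, 2.2, 3.9])  # hypothetical sample

# np.histogram returns counts and bin edges (one more edge than bins).
counts, edges = np.histogram(data, bins=4, range=(0.0, 4.0))

# Each bin center is the midpoint of its two edges.
centers = (edges[:-1] + edges[1:]) / 2
print(counts, centers)
```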

Convert year-month string to three month bins with gaps - how to assign contiguous ascending values?

Question: I have used the code below to "bin" a year.month string into three-month bins. The problem is that I want each bin to have a number that corresponds to where the bin occurs chronologically (i.e. first bin = 1, second bin = 2, etc.). Right now, the first month bin is assigned the number 4, and I am not sure why. Any help would be highly appreciated!

> head(Master.feed.parts.gn$yr.mo, n=20)
[1] "2007.10" "2007.10" "2007.10" "2007.11" "2007.11" "2007.11" "2007.11" "2007.12" "2008.01"
[10]
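Renumbering bins so that they run 1, 2, 3, ... in chronological order, regardless of the starting value or gaps, amounts to a dense rank. A sketch in Python with pandas rather than R; the raw bin numbers are hypothetical and only mimic the situation described (first bin numbered 4, with gaps).

```python
import pandas as pd

# Hypothetical raw bin numbers: they start at 4 and skip 6 and 8.
raw_bins = pd.Series([4, 4, 4, 5, 7, 7, 9])

# A dense rank assigns 1 to the smallest value, 2 to the next distinct
# value, and so on, leaving no gaps.
contiguous = raw_bins.rank(method="dense").astype(int)
print(contiguous.tolist())  # [1, 1, 1, 2, 3, 3, 4]
```

This relies on the raw bin numbers already increasing with time, so that sorting by value equals sorting chronologically.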