binning

Python: Checking to which bin a value belongs

Submitted by 半腔热情 on 2019-11-27 16:18:06
Question: I have a list of values and a list of bin edges. Now I need to check, for every value, which bin it belongs to. Is there a more Pythonic way than iterating over the values and then over the bins, checking whether the value falls in the current bin, like:

    my_list = [3, 2, 56, 4, 32, 4, 7, 88, 4, 3, 4]
    bins = [0, 20, 40, 60, 80, 100]
    for i in my_list:
        for j in range(len(bins) - 1):
            if bins[j] < i < bins[j + 1]:
                # DO SOMETHING
                pass

This doesn't look very pretty to me. Thanks!

Answer 1: Probably too late, but for future …
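One standard-library way to find each value's bin without the nested loop is `bisect` (a sketch using the list and edges from the question; `bisect_right` does a binary search over the sorted edges):

```python
from bisect import bisect_right

my_list = [3, 2, 56, 4, 32, 4, 7, 88, 4, 3, 4]
bins = [0, 20, 40, 60, 80, 100]

# bisect_right returns the index of the first edge strictly above the value,
# so values in [0, 20) map to 1, [20, 40) to 2, and so on.
bin_indices = [bisect_right(bins, v) for v in my_list]
print(bin_indices)  # [1, 1, 3, 1, 2, 1, 1, 5, 1, 1, 1]
```

With NumPy available, `np.digitize(my_list, bins)` produces the same indices in one vectorized call.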

What is the fastest way to count elements in an array?

Submitted by 本秂侑毒 on 2019-11-27 14:29:36
Question: In my models, one of the most repeated tasks is counting the number of each element within an array. The counting is from a closed set, so I know there are X types of elements, and all or some of them populate the array, along with zeros that represent 'empty' cells. The array is not sorted in any way and can be quite long (about 1M elements), and this task is done thousands of times during one simulation (which is in turn part of hundreds of simulations). The result should be a vector r of size X, so that r(k) is the count of k in the array. Example: for X = 9, if I have the …
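The question uses MATLAB-style `r(k)` indexing, but the same closed-set count can be sketched in Python with `numpy.bincount`, which counts all element types in a single vectorized pass (the sample array here is a made-up small stand-in for the 1M-element case):

```python
import numpy as np

X = 9  # number of element types; 0 marks an 'empty' cell
arr = np.array([0, 3, 3, 1, 9, 0, 2, 3, 9, 1])

# bincount[k] is the number of occurrences of value k in arr;
# minlength guarantees a slot for every type even if it never appears.
counts = np.bincount(arr, minlength=X + 1)
r = counts[1:]  # drop index 0 (the empty cells)
print(r.tolist())  # [2, 1, 3, 0, 0, 0, 0, 0, 2]
```

In MATLAB itself, `histcounts` or `accumarray` plays the same role.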

Better binning in pandas

Submitted by 烈酒焚心 on 2019-11-27 13:55:53
Question: I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin. Currently, I'm doing this:

    x = 5
    y = 17
    z = 33
    filter_values = [x, y, z]

    filtered_a = df[df.filtercol <= x]
    a_count = filtered_a.filtercol.count()

    filtered_b = df[df.filtercol > x]
    filtered_b = filtered_b[filtered_b.filtercol <= y]
    b_count = filtered_b.filtercol.count()

    filtered_c = df[df.filtercol > y]
    c_count = filtered_c.filtercol.count()

But is there a more concise way to accomplish …
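The three manual filters can collapse into a single `pd.cut` call followed by `value_counts` (a sketch; the sample values for the `filtercol` column are made up):

```python
import pandas as pd

df = pd.DataFrame({"filtercol": [2, 5, 9, 16, 17, 20, 33]})
x, y, z = 5, 17, 33

# cut assigns each row to one of the intervals (-inf, x], (x, y], (y, z]
binned = pd.cut(df["filtercol"], bins=[float("-inf"), x, y, z])
counts = binned.value_counts(sort=False)
print(counts.tolist())  # [2, 3, 2]
```

`sort=False` keeps the counts in bin order rather than by frequency.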

Reduce number of levels for large categorical variables

Submitted by 删除回忆录丶 on 2019-11-27 07:29:52
Question: Are there any ready-to-use libraries or packages for Python or R to reduce the number of levels of large categorical factors? I want to achieve something similar to R: "Binning" categorical variables, but encode into the top-k most frequent factors plus "other".

Answer 1: Here is an example in R using data.table a bit, but it should be easy without data.table as well.

    # Load data.table
    require(data.table)

    # Some data
    set.seed(1)
    dt <- data.table(type = factor(sample(c("A", "B", "C"), 10e3, replace …
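On the Python side, the same top-k-plus-"other" recoding can be sketched with plain pandas (the sample series and the name "other" are illustrative choices):

```python
import pandas as pd

s = pd.Series(["A", "A", "B", "A", "C", "B", "D", "E", "A", "B"])
k = 2

# Keep the k most frequent levels; collapse everything else into "other"
top_k = s.value_counts().nlargest(k).index
reduced = s.where(s.isin(top_k), "other")
print(reduced.tolist())
# ['A', 'A', 'B', 'A', 'other', 'B', 'other', 'other', 'A', 'B']
```

Calling `reduced.astype("category")` afterwards gives back a factor-like dtype with only k + 1 levels.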

How to bin column of floats with pandas

Submitted by 时光毁灭记忆、已成空白 on 2019-11-27 07:21:01
Question: This code was working until I upgraded from Python 2.x to 3.x. I have a df consisting of columns ipk1, ipk2, ipk3 of floats in the range 0-4.0, which I would like to bin into strings. The data looks something like this:

       ipk1  ipk2  ipk3  ipk4  ipk5  jk
    0  3.25  3.31  3.31  3.31  3.34   P
    1  3.37  3.33  3.36  3.33  3.41   P
    2  3.41  3.47  3.59  3.55  3.60   P
    3  3.23  3.10  3.05  2.98  2.97   L
    4  3.24  3.40  3.22  3.23  3.25   L

On Python 2.x this code works, but after upgrading to Python 3 it doesn't. Is …
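A Python-3-friendly way to turn a float column into string bins is `pd.cut` with explicit labels (a sketch using the ipk1 values from the question; the bin edges and label names are assumptions, since the question is truncated before its binning rule):

```python
import pandas as pd

df = pd.DataFrame({"ipk1": [3.25, 3.37, 3.41, 3.23, 3.24]})

# Map GPA-like floats into labeled ranges; edges/labels are illustrative
labels = ["low", "medium", "high"]
df["ipk1_bin"] = pd.cut(df["ipk1"], bins=[0.0, 2.0, 3.0, 4.0],
                        labels=labels).astype(str)
print(df["ipk1_bin"].tolist())  # ['high', 'high', 'high', 'high', 'high']
```

The `.astype(str)` at the end converts the categorical result to plain strings, which behaves the same on Python 2 and 3.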

R code to categorize age into group/ bins/ breaks

Submitted by 旧城冷巷雨未停 on 2019-11-27 04:32:49
Question: I am trying to categorize age into groups so it will not be continuous. I have this code:

    data$agegrp(data$age>=40 & data$age<=49) <- 3
    data$agegrp(data$age>=30 & data$age<=39) <- 2
    data$agegrp(data$age>=20 & data$age<=29) <- 1

The above code is not working under the survival package. It's giving me: invalid function in complex assignment. Can you point me to where the error is? data is the dataframe I am using.

Answer 1 (A5C1D2H2I1M1N2O1R2T1): I would use findInterval() here. First, make up some sample data:

    set.seed(1)
    ages <- floor(runif(20, min = 20, max = 50))
    ages
    # [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 …
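For reference, the same interval lookup that `findInterval()` performs in R can be sketched in Python with `numpy.digitize`, using the question's decade groups (ages taken from the answer's sample output):

```python
import numpy as np

ages = np.array([27, 31, 37, 47, 26, 46, 48, 39, 38, 21])

# Ages 20-29 -> group 1, 30-39 -> 2, 40-49 -> 3, matching the question
agegrp = np.digitize(ages, bins=[20, 30, 40])
print(agegrp.tolist())  # [1, 2, 2, 3, 1, 3, 3, 2, 2, 1]
```

Both functions return, for each value, the index of the interval between consecutive break points that contains it.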

Define and apply custom bins on a dataframe

Submitted by 社会主义新天地 on 2019-11-26 19:56:39
Question: Using Python I have created the following data frame, which contains similarity values:

      cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture    jaccard
    1       0.770     0.489        0.388  0.57500000 0.5845137    0.3920000 0.00000000
    2       0.067     0.496        0.912  0.13865546 0.6147309    0.6984127 0.00000000
    3       0.514     0.426        0.692  0.36440678 0.4787535    0.5198413 0.05882353
    4       0.102     0.430        0.739  0.11297071 0.5288008    0.5436508 0.00000000
    5       0.560     0.735        0.554  0.48148148 0.8168083    0.4603175 0.00000000
    6       0.029     0.302        0.558  0.08547009 0.3928234    0.4603175 0.00000000

I am trying to write an R script to generate another data frame that …
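One way to apply a single set of custom bins to every similarity column at once is `DataFrame.apply` with `pd.cut` (a sketch over two of the columns above; the bin edges and labels are assumptions, since the question is truncated before its binning rule):

```python
import pandas as pd

df = pd.DataFrame({
    "cosinFcolor": [0.770, 0.067, 0.514, 0.102, 0.560, 0.029],
    "cosinEdge":   [0.489, 0.496, 0.426, 0.430, 0.735, 0.302],
})

# Bin every column into the same three labeled intervals
edges = [0.0, 0.33, 0.66, 1.0]
binned = df.apply(lambda col: pd.cut(col, bins=edges,
                                     labels=["low", "mid", "high"]))
print(binned["cosinFcolor"].tolist())  # ['high', 'low', 'mid', 'low', 'mid', 'low']
```

Because the lambda returns a Series per column, `apply` reassembles the results into a data frame of the same shape with categorical labels in place of the floats.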

Getting data for histogram plot

Submitted by 巧了我就是萌 on 2019-11-26 10:07:13
Question: Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:

    select total, count(total) from faults GROUP BY total;

The data being generated is good enough, but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL? Example:

    +-------+--------------+
    | total | count(total) |
    +-------+--------------+
    |    30 |            1 |
    |    31 |            2 |
    |    33 |            1 |
    …
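A common SQL idiom for fixed-width bins is to group on the bucket's lower edge, e.g. `GROUP BY FLOOR(total / 10) * 10`. The same bucketing logic, sketched in Python for clarity (the bin width of 10 and the sample totals are assumptions):

```python
from collections import Counter

totals = [30, 31, 31, 33, 45, 47, 52]
width = 10

# Each total maps to the lower edge of its bin, e.g. 33 -> 30
buckets = Counter((t // width) * width for t in totals)
print(sorted(buckets.items()))  # [(30, 4), (40, 2), (50, 1)]
```

The `FLOOR(...) * 10` expression in SQL plays exactly the role of `(t // width) * width` here: every row in the same 10-wide range collapses to one group key.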

Pandas: convert categories to numbers

Submitted by 霸气de小男生 on 2019-11-26 01:59:21
Question: Suppose I have a dataframe with countries that goes as:

    cc | temp
    US | 37.0
    CA | 12.0
    US | 35.0
    AU | 20.0

I know that there is a pd.get_dummies function to convert the countries to 'one-hot encodings'. However, I wish to convert them to indices instead, such that I get cc_index = [1,2,1,3]. I'm assuming there is a faster way than using get_dummies along with a numpy where clause, as shown below:

    [np.where(x) for x in df.cc.get_dummies().values]

This is somewhat easier …
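pandas has this built in: `pd.factorize` assigns one integer per distinct value in order of first appearance (a sketch on the dataframe from the question):

```python
import pandas as pd

df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

# codes holds the integer index of each row's country;
# uniques holds the distinct values in the order they were assigned
codes, uniques = pd.factorize(df["cc"])
print(codes.tolist())  # [0, 1, 0, 2]
print(list(uniques))   # ['US', 'CA', 'AU']
```

Note the codes are 0-based; add 1 to match the 1-based `cc_index = [1,2,1,3]` in the question. `df["cc"].astype("category").cat.codes` gives the same kind of mapping, sorted by category instead of by first appearance.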