binning

How to Plot a Pre-Binned Histogram In R

走远了吗. posted on 2019-12-19 05:54:05
Question: I have a pre-binned frequency table for a rather large dataset, that is, a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by binning further and summing the existing counts. For example, if the pre-binned data looks like [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [
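The re-binning step described in the question can be sketched as follows. This is a minimal illustration in Python with NumPy rather than R, and it rests on assumptions the excerpt does not confirm: each pre-binned value is treated as a point lying inside its coarse bin, coarse bins start at a multiple of the new width, and the helper name `rebin` is hypothetical.

```python
import numpy as np

def rebin(bins, counts, width):
    """Sum pre-binned counts into coarser bins of the given width.

    Assumption: each value in `bins` is a point that falls inside its
    coarse bin, and coarse bins start at a multiple of `width`.
    """
    bins = np.asarray(bins, dtype=float)
    counts = np.asarray(counts)
    # Left edge of the first coarse bin.
    start = np.floor(bins.min() / width) * width
    # Number of coarse bins needed to cover the largest value.
    n_bins = int(np.floor((bins.max() - start) / width)) + 1
    edges = start + width * np.arange(n_bins + 1)
    # weights= makes np.histogram add up the counts instead of counting rows.
    new_counts, _ = np.histogram(bins, bins=edges, weights=counts)
    return edges, new_counts

# The three pre-binned pairs from the question, with the new width 0.04.
edges, new_counts = rebin([0.01, 0.02, 0.03], [5000, 231, 948], 0.04)
```

Values that sit exactly on a coarse bin edge are sensitive to floating-point rounding, so real data may need the edges shifted or the values rounded first.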

R: creating a categorical variable from a numerical variable and custom/open-ended/single-valued intervals

為{幸葍}努か posted on 2019-12-19 00:24:32
Question: I often find myself trying to create a categorical variable from a numerical variable plus a user-provided set of ranges. For instance, say that I have a data.frame with a numeric variable df$V and would like to create a new variable df$VCAT such that:
df$VCAT = 0 if df$V is equal to 0
df$VCAT = 1 if df$V is between 0 and 10 (i.e. (0,10))
df$VCAT = 2 if df$V is equal to 10 (i.e. [10,10])
df$VCAT = 3 if df$V is between 10 and 20 (i.e. (10,20))
df$VCAT = 4 if df$V is greater than or equal to 20 (i.e

assigning points to bins

别等时光非礼了梦想. posted on 2019-12-18 11:35:53
Question: What is a good way to bin numerical values into a certain range? For example, suppose I have a list of values and I want to bin them into N bins by their range. Right now, I do something like this:
from scipy import *
num_bins = 3 # number of bins to use
values = # some array of integers...
min_val = min(values) - 1
max_val = max(values) + 1
my_bins = linspace(min_val, max_val, num_bins)
# assign point to my bins
for v in values:
    best_bin = min_index(abs(my_bins - v))
where min_index returns
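One common way to avoid the explicit loop is `numpy.digitize`, which maps every value to a bin index in a single call. The sketch below is not the answer from the original thread; it assumes equal-width bins spanning the observed minimum and maximum, and the helper name `assign_to_bins` is made up for illustration.

```python
import numpy as np

def assign_to_bins(values, num_bins):
    """Return a 0-based bin index for each value, plus the bin edges.

    Assumption: equal-width bins between min(values) and max(values).
    """
    values = np.asarray(values, dtype=float)
    # num_bins bins need num_bins + 1 edges.
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # Passing only the interior edges keeps the maximum value in the
    # last bin instead of spilling into an extra overflow bin.
    idx = np.digitize(values, edges[1:-1])
    return idx, edges

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
idx, edges = assign_to_bins(values, 3)
# Count how many values landed in each bin.
counts = np.bincount(idx, minlength=3)
```

Note that the snippet in the question builds only `num_bins` edge points and picks the nearest one, which is a nearest-point assignment rather than interval binning; the sketch above uses intervals instead.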

Group/bin/bucket data in R and get count per bucket and sum of values per bucket

▼魔方 西西 posted on 2019-12-18 05:45:19
Question: I wish to bucket/group/bin data:
C1            C2        C3
49488.01172   0.0512    54000
268221.1563   0.0128    34399
34775.96094   0.0128    54444
13046.98047   0.07241   61000
2121699.75    0.00453   78921
71155.09375   0.0181    13794
1369809.875   0.00453   12312
750           0.2048    43451
44943.82813   0.0362    49871
85585.04688   0.0362    18947
31090.10938   0.0362    13401
68550.40625   0.0181    14345
I want to bucket it by C2 values, but I wish to define the buckets myself, e.g. <=0.005, <=.010, <=.014 etc. As you can see, the buckets will have uneven intervals. I

Ternary heatmap in R

回眸只為那壹抹淺笑 posted on 2019-12-18 03:45:12
Question: I'm trying to come up with a way of plotting a ternary heatmap using R. I think ggtern should be able to do the trick, but I don't know how to do a binning function like stat_bin in vanilla ggplot2. Here's what I have so far:
require(ggplot2)
require(ggtern)
require(MASS)
require(scales)
palette <- c( "#FF9933", "#002C54", "#3375B2", "#CCDDEC", "#BFBFBF", "#000000")
sig <- matrix(c(1,2,3,4),2,2)
data <- data.frame(mvrnorm(n=10000, rep(2, 2), Sigma))
data$X1 <- data$X1/max(data$X1)
data$X2 <-

Binning time data in R

若如初见. posted on 2019-12-17 20:41:23
Question: I have time data for departures and arrivals of birds (e.g. arrival 17:23:54). I would like to bin the data into 2-hour time bins (e.g. 0:00:00-1:59:59, etc.), so 12 bins in total. The data would eventually go into a bar graph with time bins on the x axis and count on the y axis. Would the package ‘binr’ be my best bet? Thanks
Answer 1: Just use ?cut, as it has a method, ?cut.POSIXt, for date/times. E.g.:
x <- as.POSIXct("2016-01-01 00:00:00", tz="UTC") + as.difftime(30*(0:47),units="mins")
cut(x,

2D and 3D Scatter Histograms from arrays in Python

对着背影说爱祢 posted on 2019-12-17 07:40:02
Question: Do you have any idea how I can bin 3 arrays into a histogram? My arrays look like:
Temperature = [4, 3, 1, 4, 6, 7, 8, 3, 1]
Radius = [0, 2, 3, 4, 0, 1, 2, 10, 7]
Density = [1, 10, 2, 24, 7, 10, 21, 102, 203]
And the 1D plot should look like:
[ASCII sketch: Density on a log axis (10^0, 10^1, 10^2) against Radius, with bin edges at 0, 3.3, 6.6 and 10]
And the 2D plot should (qualitatively) look like:
[ASCII sketch: a grid of Density values with Temperature on the x axis (ticks 0, 3, 5, 8) and Radius on a log y axis (10^0, 10^1, 10^2)]
So I want to
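A minimal sketch of one way to bin these three arrays with NumPy follows. It assumes that each bin should hold the summed Density of the points falling into it (the excerpt is cut off before stating the aggregation) and that three equal-width bins per axis are wanted; the variable names are illustrative.

```python
import numpy as np

temperature = np.array([4, 3, 1, 4, 6, 7, 8, 3, 1])
radius = np.array([0, 2, 3, 4, 0, 1, 2, 10, 7])
density = np.array([1, 10, 2, 24, 7, 10, 21, 102, 203])

# 1D: total density in each of three equal-width radius bins.
# weights= makes the histogram sum density instead of counting points.
dens_by_radius, radius_edges = np.histogram(radius, bins=3, weights=density)

# 2D: total density in each (temperature, radius) cell of a 3x3 grid.
dens_grid, temp_edges, radius_edges_2d = np.histogram2d(
    temperature, radius, bins=3, weights=density
)
```

The resulting arrays can then be passed to a plotting library, for example a bar chart for `dens_by_radius` and a colour-mapped grid for `dens_grid`, with a logarithmic scale where the question's sketches call for one.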

R code to categorize age into group/ bins/ breaks

僤鯓⒐⒋嵵緔 posted on 2019-12-17 06:11:56
Question: I am trying to categorize age into groups so that it is no longer continuous. I have this code:
data$agegrp(data$age>=40 & data$age<=49) <- 3
data$agegrp(data$age>=30 & data$age<=39) <- 2
data$agegrp(data$age>=20 & data$age<=29) <- 1
The above code is not working under the survival package. It's giving me: invalid function in complex assignment. Can you point out where the error is? data is the data frame I am using.
Answer 1: I would use findInterval() here. First, make up some sample data:
set.seed(1)
ages <-