Counting values within levels

Submitted by 血红的双手。 on 2019-12-08 00:08:25

Question


I have a set of levels in R that I generate with cut, for example fractional values between 0 and 1 broken down into bins of width 0.1:

> frac <- cut(c(0, 1), breaks=10)
> levels(frac)
[1] "(-0.001,0.1]" "(0.1,0.2]"    "(0.2,0.3]"    "(0.3,0.4]"    "(0.4,0.5]"
[6] "(0.5,0.6]"    "(0.6,0.7]"    "(0.7,0.8]"    "(0.8,0.9]"    "(0.9,1]"

Given a vector v containing continuous values in [0.0, 1.0], how do I count the frequency of elements in v that fall within each level in levels(frac)?

I may want to customize the number of breaks and/or the interval from which I am making levels, so I'm looking for a way to do this with standard R commands. The goal is a two-column data frame: one column for the levels as factors, and a second column for the fraction or percentage of the elements in v that fall within each level.

Note: The following does not work:

> table(frac)
frac
(-0.001,0.1]    (0.1,0.2]    (0.2,0.3]    (0.3,0.4]    (0.4,0.5]    (0.5,0.6]
           1            0            0            0            0            0
   (0.6,0.7]    (0.7,0.8]    (0.8,0.9]      (0.9,1]
           0            0            0            1

If I use cut on v directly, I do not get the same levels when I run cut on different vectors. The range of values (their minimum and maximum) differs between arbitrary vectors, so even with the same number of breaks the level intervals will not be the same.

My goal is to take different vectors and bin them to the same set of levels. Hopefully this helps clarify my question. Thanks for any assistance.


Answer 1:


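# Bin edges shared by every vector you want to compare.
frac = seq(0, 1, by = 0.1)

# Labels such as "0 - 0.1"; v is the vector from the question.
ranges = paste(head(frac, -1), frac[-1], sep = " - ")
freq   = hist(v, breaks = frac, include.lowest = TRUE, plot = FALSE)

data.frame(range = ranges, frequency = freq$counts)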
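Since the question asks for fractions rather than raw counts, and for the same bins across different vectors, the hist approach above could be wrapped in a small helper. This is only a sketch: the names v1, v2 and bin_fractions are made up for illustration, and the example data is assumed.

```r
# Assumed example data; v1 and v2 are hypothetical stand-ins for the asker's vectors.
set.seed(1)
v1 <- runif(50)
v2 <- runif(20, min = 0.3, max = 0.6)

# One fixed set of bin edges, reused for every vector.
breaks <- seq(0, 1, by = 0.1)
ranges <- paste(head(breaks, -1), breaks[-1], sep = " - ")

# Hypothetical helper: fraction of the elements of v that fall in each bin.
bin_fractions <- function(v) {
  h <- hist(v, breaks = breaks, include.lowest = TRUE, plot = FALSE)
  data.frame(range = ranges, fraction = h$counts / length(v))
}

res1 <- bin_fractions(v1)
res2 <- bin_fractions(v2)
```

Both results have the same ten rows, so they can be compared or merged by range. Note that hist raises an error if any value lies outside the span of the breaks, so the edges need to cover every vector you pass in.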



Answer 2:


Amend frac to actually represent your desired intervals, and then use the table function:

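x = runif(100) # For example.
# Explicit breaks give the same levels for any input vector.
# Note: a value of exactly 0 becomes NA here unless include.lowest = TRUE is added.
frac = cut(x, breaks = seq(0, 1, 0.1))
table(frac)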

Result:

frac
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8]
       14         9         8        10         8        12         7         7
(0.8,0.9]   (0.9,1]
       16         9
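To get from this table to the two-column data frame the question asks for (levels as a factor, plus a percentage), something along these lines should work. It is a sketch with assumed example data; the names binned, counts and result are illustrative, and include.lowest = TRUE is my addition so that a value of exactly 0 lands in the first bin.

```r
set.seed(1)
v <- runif(100)  # assumed example data

breaks <- seq(0, 1, by = 0.1)

# Explicit breaks fix the levels; include.lowest = TRUE closes the first interval at 0.
binned <- cut(v, breaks = breaks, include.lowest = TRUE)
counts <- table(binned)

result <- data.frame(
  level   = factor(names(counts), levels = levels(binned)),
  percent = 100 * as.vector(counts) / length(v)
)
```

With include.lowest = TRUE the first level is labelled "[0,0.1]" instead of "(0,0.1]"; the other levels are unchanged. Because the levels come from breaks and not from the data, empty bins still appear with a percentage of 0.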



Answer 3:


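Prepend the extremes c(0, 1) to v so that cut sees the full range and produces the same levels as in the question, then drop those two padding elements from the result: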

library(dplyr)

#dummy data
set.seed(1)
v <- round(runif(7), 2)

#result
data.frame(v,
           vFrac = cut(c(0, 1, v), breaks = 10)[-c(1, 2)]) %>% 
  group_by(vFrac) %>% 
  mutate(vFreq = n())

# Source: local data frame [7 x 3]
# Groups: vFrac [6]
# 
#        v        vFrac vFreq
#    <dbl>       <fctr> <int>
# 1   0.27    (0.2,0.3]     1
# 2   0.37    (0.3,0.4]     1
# 3   0.57    (0.5,0.6]     1
# 4   0.91      (0.9,1]     2
# 5   0.20    (0.1,0.2]     1
# 6   0.90    (0.8,0.9]     1
# 7   0.94      (0.9,1]     2
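The output above repeats the count on every row of v. If one row per level is wanted instead, including levels with no observations, a variation using count() might look like the following. This is a sketch, not part of the original answer: summary_df and vShare are illustrative names, and the .drop argument assumes a reasonably recent dplyr (0.8.0 or later, to my knowledge).

```r
library(dplyr)

set.seed(1)
v <- round(runif(7), 2)

# Padding with c(0, 1) pins the range; [-c(1, 2)] removes the padding again.
vFrac <- cut(c(0, 1, v), breaks = 10)[-c(1, 2)]

# .drop = FALSE keeps factor levels that have no observations (count 0).
summary_df <- data.frame(v, vFrac) %>%
  count(vFrac, .drop = FALSE, name = "vFreq") %>%
  mutate(vShare = vFreq / sum(vFreq))
```

summary_df then has one row for each of the ten levels, with vShare giving the fraction of elements of v in that level.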



Answer 4:


Use findInterval instead of cut:

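library(plyr)

v <- data.frame(v = runif(100, 0, 1))  # example data

# findInterval() returns the bin index; multiplying by 0.1 labels each bin by its upper edge.
# rightmost.closed = TRUE keeps a value of exactly 1 in the last bin instead of an eleventh one.
v$x <- findInterval(v$v, seq(0, 1, by = 0.1), rightmost.closed = TRUE) * 0.1
ddply(v, .(x), summarize, n = length(x))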
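The same idea also works without plyr, and tabulate() has the advantage of reporting empty bins as 0 rather than omitting them. A base R sketch, with assumed example data and illustrative names (idx, counts, result):

```r
set.seed(1)
v <- runif(100)  # assumed example data

breaks <- seq(0, 1, by = 0.1)

# findInterval() uses left-closed bins [lower, upper), unlike cut()'s default (lower, upper].
# rightmost.closed = TRUE makes the last bin [0.9, 1] so that a value of 1 is counted.
idx <- findInterval(v, breaks, rightmost.closed = TRUE)

# tabulate() counts occurrences of each bin index 1..nbins, including zeros.
counts <- tabulate(idx, nbins = length(breaks) - 1)

result <- data.frame(lower = head(breaks, -1), upper = breaks[-1], n = counts)
```

Dividing n by length(v) gives the fractional column the question asks for.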



Answer 5:


frac = seq(0, 1, 0.1)
set.seed(42); v = rnorm(10, 0.5, 0.2)
# For each pair of adjacent edges, count the values in the half-open interval (lower, upper].
sapply(1:(length(frac)-1), function(i) sum(frac[i]<v & frac[i+1]>=v))
#[1] 0 0 0 1 3 2 1 1 1 1


Source: https://stackoverflow.com/questions/42541994/counting-values-within-levels
