问题
I have a set of levels in R that I generate with cut
, e.g. say fractional values between 0 and 1, broken down into 0.1 bins:
> frac <- cut(c(0, 1), breaks=10)
> levels(frac)
[1] "(-0.001,0.1]" "(0.1,0.2]" "(0.2,0.3]" "(0.3,0.4]" "(0.4,0.5]"
[6] "(0.5,0.6]" "(0.6,0.7]" "(0.7,0.8]" "(0.8,0.9]" "(0.9,1]"
Given a vector v
containing continuous values between [0.0, 1.0]
, how do I count the frequency of elements in v
that fall within each level in levels(frac)
?
I could customize the number of breaks and/or the interval from which I am making levels, so I'm looking for a way to do this with standard R commands, so that I can build a two-column data frame: one column for the levels as factors, and the second column for a fractional or percentage value of total elements in v
over the level.
Note: The following does not work:
> table(frac)
frac
(-0.001,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6]
1 0 0 0 0 0
(0.6,0.7] (0.7,0.8] (0.8,0.9] (0.9,1]
0 0 0 1
If I use cut
on v
directly, then I do not get the same levels when I run cut
on different vectors, because the range of values — their minimum and maximum — is going to be different between arbitrary vectors, and so while I may have the same number of breaks, the level intervals will not be the same.
My goal is to take different vectors and bin them to the same set of levels. Hopefully this helps clarify my question. Thanks for any assistance.
回答1:
frac = seq(0,1,by=0.1)
ranges = paste(head(frac,-1), frac[-1], sep=" - ")
freq = hist(v, breaks=frac, include.lowest=TRUE, plot=FALSE)
data.frame(range = ranges, frequency = freq$counts)
回答2:
Amend frac
to actually represent your desired intervals, and then use the table
function:
x = runif(100) # For example.
frac = cut(x, breaks = seq(0, 1, 0.1))
table(frac)
Result:
frac
(0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8]
14 9 8 10 8 12 7 7
(0.8,0.9] (0.9,1]
16 9
回答3:
Introduce extremes c(0, 1)
to v
then use the same cut
:
library(dplyr)
#dummy data
set.seed(1)
v <- round(runif(7), 2)
#result
data.frame(v,
vFrac = cut(c(0, 1, v), breaks = 10)[-c(1, 2)]) %>%
group_by(vFrac) %>%
mutate(vFreq = n())
# Source: local data frame [10 x 3]
# Groups: vFrac [8]
#
# v vFrac vFreq
# <dbl> <fctr> <int>
# 1 0.27 (0.2,0.3] 1
# 2 0.37 (0.3,0.4] 1
# 3 0.57 (0.5,0.6] 1
# 4 0.91 (0.9,1] 2
# 5 0.20 (0.1,0.2] 1
# 6 0.90 (0.8,0.9] 1
# 7 0.94 (0.9,1] 2
回答4:
Use findInterval instead of cut:
v<-data.frame(v=runif(100,0,1))
library(plyr)
v$x<-findInterval(v$v,seq(0,1,by=0.1))*0.1
ddply(v, .(x), summarize, n=length(x))
回答5:
frac = seq(0, 1, 0.1)
set.seed(42); v = rnorm(10, 0.5, 0.2)
sapply(1:(length(frac)-1), function(i) sum(frac[i]<v & frac[i+1]>=v))
#[1] 0 0 0 1 3 2 1 1 1 1
来源:https://stackoverflow.com/questions/42541994/counting-values-within-levels