Question
I know how to find the quantiles of an empirical distribution:
set.seed(1)
x = rnorm(100)
q = quantile(x, probs=seq(0,1,.01))
Is there a function that returns the quantile bin that a number in the training set belongs to? In this example,
R) x[1]
[1] -0.6264538107
R) q
0% 1% 2% 3% 4% 5% 6% 7% 8%
-2.214699887177 -1.991605177777 -1.808646490230 -1.532008555284 -1.472864960560 -1.381744198182 -1.282620249360 -1.255240516814 -1.226934277726
9% 10% 11% 12% 13% 14% 15% 16% 17%
-1.137935552774 -1.052657473293 -0.946201701058 -0.847444894718 -0.822439213796 -0.754080533415 -0.714945447616 -0.707887360796 -0.691941403160
18% 19% 20% 21% 22% 23% 24% 25% 26%
-0.637668149828 -0.622231094280 -0.613869230709 -0.594247090071 -0.576841631266 -0.569725969545 -0.548795719430 -0.494242549079 -0.474635485293
27% 28% 29% 30% 31% 32% 33% 34% 35%
-0.451421239288 -0.422917810077 -0.400294290491 -0.375342019640 -0.324556644843 -0.304569351961 -0.270133020491 -0.194728544774 -0.158850338047
36% 37% 38% 39% 40% 41% 42% 43% 44%
-0.142600696093 -0.135100488041 -0.120975401008 -0.106515536418 -0.076703128964 -0.057434448974 -0.054780994140 -0.048748324589 -0.041745189497
45% 46% 47% 48% 49% 50% 51% 52% 53%
-0.026562645934 -0.006850631144 0.015360659421 0.052098524774 0.074455390351 0.113909160789 0.168144431357 0.186114832362 0.225596350406
54% 55% 56% 57% 58% 59% 60% 61% 62%
0.278298615355 0.308573926852 0.331022515551 0.336463178904 0.350973845124 0.366811069726 0.377079930574 0.388518545252 0.392983041115
63% 64% 65% 66% 67% 68% 69% 70% 71%
0.405445081905 0.438666028932 0.479681362135 0.510968662152 0.557264863548 0.562081050166 0.571598761948 0.581217342523 0.593914332477
72% 73% 74% 75% 76% 77% 78% 79% 80%
0.598644634069 0.613183189979 0.638003287679 0.691545365689 0.697743441191 0.708979192306 0.743791934661 0.764300755430 0.771253599759
81% 82% 83% 84% 85% 86% 87% 88% 89%
0.789562430661 0.832000770742 0.887545566130 0.922954785861 0.961725754674 1.068269412135 1.103263092985 1.129187521849 1.162347897592
90% 91% 92% 93% 94% 95% 96% 97% 98%
1.181065077514 1.221440863082 1.364627083543 1.435300882891 1.468328439976 1.515533782755 1.587171348445 1.606834375029 1.984244133943
99% 100%
2.174901731264 2.401617760505
it would be bin 18 (or 19, depending on how you count).
Answer 1:
I'd use findInterval():
findInterval(x,q)
# [1] 19 52 13 97 56 14 66 78 70 32 95 62 20 1 88 44 46 85
# [19] 82 71 84 81 50 2 74 42 36 5 26 64 92 40 61 43 6 29
# [37] 30 41 87 79 35 34 76 67 18 17 59 80 39 83 63 21 58 10
# [55] 93 98 31 11 69 38 101 45 75 48 15 53 3 94 51 99 65 16
# [73] 73 12 8 55 28 47 49 22 24 37 90 4 72 57 86 33 60 54
# [91] 25 91 89 77 96 68 7 23 9 27
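In the output above, the maximum of x is assigned bin 101, because findInterval() treats the intervals as closed on the left and open on the right by default. A minimal sketch of one way to keep every value within bins 1 to 100, assuming you want the maximum counted in the last bin, is to pass rightmost.closed = TRUE (the names bins_default and bins_closed are only illustrative):

```r
set.seed(1)
x <- rnorm(100)
q <- quantile(x, probs = seq(0, 1, 0.01))

# Default: intervals are [q[i], q[i+1]), so the maximum falls past the last break.
bins_default <- findInterval(x, q)

# rightmost.closed = TRUE closes the last interval, [q[100], q[101]],
# so the maximum is assigned to bin 100 instead of 101.
bins_closed <- findInterval(x, q, rightmost.closed = TRUE)

range(bins_default)  # 1 101
range(bins_closed)   # 1 100
bins_closed[1]       # 19, the bin for x[1]
```

Only the bin of the maximum changes between the two calls; all other assignments are identical.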
Answer 2:
How about:
as.numeric(cut(x,q))
## [1] 19 52 13 97 56 14 66 78 70 32 95 62 20 NA 88 44 46 85
## [19] 82 71 84 81 50 2 74 42 36 5 26 64 92 40 61 43 6 29
## [37] 30 41 87 79 35 34 76 67 18 17 59 80 39 83 63 21 58 10
## [55] 93 98 31 11 69 38 100 45 75 48 15 53 3 94 51 99 65 16
## [73] 73 12 8 55 28 47 49 22 24 37 90 4 72 57 86 33 60 54
## [91] 25 91 89 77 96 68 7 23 9 27
The minimum value here is recorded as NA because cut() leaves the lowest break out of the first interval by default. To include it, set include.lowest = TRUE (the default is FALSE).
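As a sketch of the include.lowest fix, the call below repeats the example with that argument set. Passing labels = FALSE is an optional extra that is not part of the answer above; it makes cut() return integer bin codes directly, so as.numeric() is not needed. The variable name bins is only illustrative.

```r
set.seed(1)
x <- rnorm(100)
q <- quantile(x, probs = seq(0, 1, 0.01))

# include.lowest = TRUE makes the first interval [q[1], q[2]] instead of
# (q[1], q[2]], so the minimum of x is no longer dropped as NA.
bins <- cut(x, breaks = q, include.lowest = TRUE, labels = FALSE)

anyNA(bins)          # FALSE
bins[which.min(x)]   # 1
bins[which.max(x)]   # 100
```

Note that cut() uses right-closed intervals by default while findInterval() uses left-closed ones, so the two can disagree for a value that coincides exactly with a break.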
Source: https://stackoverflow.com/questions/24001788/how-to-find-in-which-quantile-bin-does-a-number-fall