I'm looking at the examples for the cut() function (run example(cut)), specifically this part:
cut> aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut> cut(aaa, 3)
[1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3]
[7] (3,5] (3,5] (3,5] (5,7.01] (5,7.01]
Levels: (0.994,3] (3,5] (5,7.01]
cut> cut(aaa, 3, dig.lab = 4, ordered = TRUE)
[1] (0.994,2.998] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[5] (2.998,5.002] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[9] (2.998,5.002] (5.002,7.006] (5.002,7.006]
Levels: (0.994,2.998] < (2.998,5.002] < (5.002,7.006]
cut> ## one way to extract the breakpoints
cut> labs <- levels(cut(aaa, 3))
cut> cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
cut+ upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
lower upper
[1,] 0.994 3.00
[2,] 3.000 5.00
[3,] 5.000 7.01
Here the intervals are closed on the right (as shown above), and the example shows a way to extract the breakpoints using cbind().
Now suppose my data is cut with the intervals closed on the left instead:
cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
How can I extract the breakpoints now, using the same cbind() approach? (Other ways are welcome too.)
Just use something like the following as your pattern, and use gsub instead of sub: "\\[|\\]|\\(|\\)".
An example:
out <- levels(cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE))
gsub("\\[|\\]|\\(|\\)", "", out)
# [1] "0.994,2.998" "2.998,5.002" "5.002,7.006"
And, here's a quick way to read that data in:
read.csv(text = gsub("\\[|\\]|\\(|\\)", "", out), header = FALSE)
# V1 V2
# 1 0.994 2.998
# 2 2.998 5.002
# 3 5.002 7.006
FYI: The same pattern would work whether the intervals are closed on the left or on the right. Using your original example:
labs <- levels(cut(aaa, 3))
labs
# [1] "(0.994,3]" "(3,5]" "(5,7.01]"
read.csv(text = gsub("\\[|\\]|\\(|\\)", "", labs), header = FALSE)
# V1 V2
# 1 0.994 3.00
# 2 3.000 5.00
# 3 5.000 7.01
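To get back to the cbind() form asked for in the question, the same pattern can be wrapped in a small helper. This is only a sketch: the name extract_breaks is made up for illustration, and it assumes the labels are the default ones produced by cut() (two numbers separated by a comma, wrapped in brackets).

```r
# Hypothetical helper (not part of base R): turn default cut() labels
# into a numeric matrix of lower/upper breakpoints.
extract_breaks <- function(labs) {
  # Drop the opening and closing bracket, whichever kind it is
  bare <- gsub("\\[|\\]|\\(|\\)", "", labs)
  # Split each "lower,upper" string into its two pieces
  parts <- strsplit(bare, ",", fixed = TRUE)
  cbind(lower = as.numeric(sapply(parts, `[`, 1)),
        upper = as.numeric(sapply(parts, `[`, 2)))
}

aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
extract_breaks(levels(cut(aaa, 3, dig.lab = 4, right = FALSE)))
extract_breaks(levels(cut(aaa, 3)))
```

Because the brackets are removed regardless of type, the helper does not need to know which side of the interval is closed.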
As for alternatives, since you just need to strip the first and last characters before you can use read.csv, you can also use substr, with no need to fuss with regular expressions (if that's not your thing):
substr(labs, 2, nchar(labs)-1)
# [1] "0.994,3" "3,5" "5,7.01"
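Putting the substr step together with read.csv might look like the sketch below. The labels are typed in by hand here so the snippet stands alone, and the column names lower and upper are my own choice, not something read.csv infers.

```r
# Labels as returned by levels(cut(aaa, 3)) in the example above
labs <- c("(0.994,3]", "(3,5]", "(5,7.01]")

# Strip the first and last character (the brackets) from each label
inner <- substr(labs, 2, nchar(labs) - 1)

# Parse the remaining "lower,upper" strings as two numeric columns
ranges <- read.csv(text = inner, header = FALSE,
                   col.names = c("lower", "upper"))
ranges
```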
Update: A totally different alternative
Since it is obvious that R has to calculate these values and store them as part of the function in order to generate the output you see, it is not too difficult to manipulate the function to get it to output different things.
Looking at the code for cut.default, you'll find the following as the last few lines:
if (codes.only)
code
else factor(code, seq_along(labels), labels, ordered = ordered_result)
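If you want to look at that source yourself, one way is sketched below. The exact lines printed may differ between R versions, so treat the excerpt above as indicative.

```r
# cut.default is the method that cut() dispatches to for numeric vectors
f <- getS3method("cut", "default")

# Show the last few lines of its deparsed source
tail(deparse(f), 5)
```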
It's easy to change these last few lines to output a list that contains the output of cut as the first item and the calculated ranges as the second (taken from the cut function directly, not extracted from the pasted-together factor labels).
For instance, in the Gist I've posted at this link, I've changed those lines as follows:
if (codes.only)
FIN <- code
else FIN <- factor(code, seq_along(labels), labels, ordered = ordered_result)
list(output = FIN, ranges = data.frame(lower = ch.br[-nb], upper = ch.br[-1L]))
Now, compare cut with the modified version (called CUT here):
cut(aaa, 3)
# [1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3] (3,5] (3,5]
# [9] (3,5] (5,7.01] (5,7.01]
# Levels: (0.994,3] (3,5] (5,7.01]
CUT(aaa, 3)
# $output
# [1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3] (3,5] (3,5]
# [9] (3,5] (5,7.01] (5,7.01]
# Levels: (0.994,3] (3,5] (5,7.01]
#
# $ranges
# lower upper
# 1 0.994 3
# 2 3 5
# 3 5 7.01
And with right = FALSE:
cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
# [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
# [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
# [11] [5.002,7.006)
# Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
CUT(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
# $output
# [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
# [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
# [11] [5.002,7.006)
# Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
# $ranges
# lower upper
# 1 0.994 2.998
# 2 2.998 5.002
# 3 5.002 7.006
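Another option, which is not part of the approaches above, is to avoid extracting anything: compute the breakpoints yourself and hand them to cut via its breaks argument. The sketch below assumes equal-width bins over the exact data range. Note that this gives slightly different labels than cut(aaa, 3), which pads the range a little, and that include.lowest = TRUE is needed so the maximum value is not dropped when right = FALSE.

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Four equally spaced breakpoints give three intervals
brk <- seq(min(aaa), max(aaa), length.out = 4)

# Left-closed intervals; include.lowest = TRUE closes the last one on
# the right as well, so max(aaa) still falls into a bin
res <- cut(aaa, breaks = brk, right = FALSE, include.lowest = TRUE)

# The breakpoints are already known, so no parsing of labels is needed
ranges <- cbind(lower = brk[-length(brk)], upper = brk[-1])
ranges
```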
Source: https://stackoverflow.com/questions/19689397/extracting-breakpoints-with-intervals-closed-on-the-left