Extracting breakpoints with intervals closed on the left

失恋的感觉 2020-12-21 17:02

I'm looking at the examples for the cut() function (run with example(cut)), specifically this part:

cut> aaa <- c(1,2,3,4,5,2,3,4,5         


        
1 Answer
  • 2020-12-21 17:13

    Just use something like the following for your pattern, and use gsub instead: "\\[|\\]|\\(|\\)".

    An example.

    out <- levels(cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE))
    gsub("\\[|\\]|\\(|\\)", "", out)
    # [1] "0.994,2.998" "2.998,5.002" "5.002,7.006"
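
    As a side note, the same pattern can be written as a single bracket expression rather than an alternation. A minimal sketch, with the labels typed in by hand (copied from the output above) so that it does not depend on cut:

    ```r
    # Labels copied from the output above (typed in, not recomputed)
    out <- c("[0.994,2.998)", "[2.998,5.002)", "[5.002,7.006)")

    # Inside a bracket expression, a "]" placed first is taken literally,
    # so "[][()]" matches any one of the characters ] [ ( )
    stripped <- gsub("[][()]", "", out)
    stripped
    ```

    Either form works; the bracket expression just avoids the run of backslashes.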
    

    And, here's a quick way to read that data in:

    read.csv(text = gsub("\\[|\\]|\\(|\\)", "", out), header = FALSE)
    #      V1    V2
    # 1 0.994 2.998
    # 2 2.998 5.002
    # 3 5.002 7.006
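
    If you would like meaningful column names instead of V1 and V2, read.csv accepts col.names (it is passed through to read.table). A small sketch; the names lower and upper are my own choice, and the labels are typed in from the output above:

    ```r
    # Labels copied from the output above (typed in, not recomputed)
    out <- c("[0.994,2.998)", "[2.998,5.002)", "[5.002,7.006)")

    # Strip the brackets, then parse each "lower,upper" string as one CSV row.
    # "lower" and "upper" are illustrative column names, not anything cut() provides.
    ranges <- read.csv(text = gsub("\\[|\\]|\\(|\\)", "", out),
                       header = FALSE,
                       col.names = c("lower", "upper"))
    str(ranges)
    ```

    Keep in mind that the values recovered this way are the ones printed in the labels, so they are rounded to dig.lab significant digits.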
    

    FYI: The same pattern would work whether the intervals are closed on the left or on the right. Using your original example:

    labs <- levels(cut(aaa, 3))
    labs
    # [1] "(0.994,3]" "(3,5]"     "(5,7.01]" 
    read.csv(text = gsub("\\[|\\]|\\(|\\)", "", labs), header = FALSE)
    #      V1   V2
    # 1 0.994 3.00
    # 2 3.000 5.00
    # 3 5.000 7.01
    

    As for alternatives, since you just need to strip out the first and last character before you can use read.csv, you can also easily use substr without having to fuss with regular expressions (if that's not your thing):

    substr(labs, 2, nchar(labs)-1)
    # [1] "0.994,3" "3,5"     "5,7.01" 
    

    Update: A totally different alternative

    R has to compute these break values inside the function in order to build the labels you see, so it is not too difficult to modify the function to return them as well.

    Looking at the code for cut.default, you'll find the following as the last few lines:

    if (codes.only) 
        code
    else factor(code, seq_along(labels), labels, ordered = ordered_result)
    

    It's easy to change those last few lines so that the function returns a list containing the usual output of cut as the first item and the calculated ranges as the second (taken directly from inside the cut function, not extracted from the pasted-together factor labels).

    For instance, in the Gist I've posted at this link, I've changed those lines as follows:

    if (codes.only) 
      FIN <- code
    else FIN <- factor(code, seq_along(labels), labels, ordered = ordered_result)
    list(output = FIN, ranges = data.frame(lower = ch.br[-nb], upper = ch.br[-1L]))
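
    If you would rather not maintain a modified copy of cut.default, another option (not what the Gist does) is to choose the break points yourself and pass them as a vector, since you then already hold them as numbers. A minimal sketch, assuming aaa is the 11-value vector that matches the output shown in this answer, and using hypothetical breaks of my own choosing:

    ```r
    # Assumption: the 11 values consistent with the cut() output in this answer
    aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

    # Hypothetical break points; with right = FALSE the last break must
    # lie above max(aaa), otherwise the largest value would become NA
    brks <- c(1, 3, 5, 8)

    f <- cut(aaa, breaks = brks, right = FALSE)

    # The ranges come straight from the numeric breaks, not from the labels
    ranges <- data.frame(lower = brks[-length(brks)], upper = brks[-1])

    levels(f)
    ranges
    ```

    The trade-off is that you pick the breaks rather than letting cut derive them from the range of the data.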
    

    Now compare the original cut with the modified version (called CUT here):

    cut(aaa, 3)
    #  [1] (0.994,3] (0.994,3] (3,5]     (3,5]     (3,5]     (0.994,3] (3,5]     (3,5]    
    #  [9] (3,5]     (5,7.01]  (5,7.01] 
    # Levels: (0.994,3] (3,5] (5,7.01]
    CUT(aaa, 3)
    # $output
    # [1] (0.994,3] (0.994,3] (3,5]     (3,5]     (3,5]     (0.994,3] (3,5]     (3,5]    
    # [9] (3,5]     (5,7.01]  (5,7.01] 
    # Levels: (0.994,3] (3,5] (5,7.01]
    # 
    # $ranges
    #   lower upper
    # 1 0.994     3
    # 2     3     5
    # 3     5  7.01
    

    And the same comparison with right = FALSE:

    cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
    #  [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
    #  [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
    # [11] [5.002,7.006)
    # Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
    CUT(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
    # $output
    #  [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
    #  [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
    # [11] [5.002,7.006)
    # Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
    # 
    # $ranges
    #   lower upper
    # 1 0.994 2.998
    # 2 2.998 5.002
    # 3 5.002 7.006
    