Question
I have a numeric vector that I want to convert to five numeric levels. I can get the five levels using cut:
dx <- data.frame(x=1:100)
dx$cut <- cut(dx$x,5)
But I am now having trouble extracting the lower and upper boundaries of the levels.
So, for example, (0.901,20.8] would be 0.901 in dx$min and 20.8 in dx$max.
I tried:
dx$min <- pmin(dx$cut)
dx$max <- pmax(dx$cut)
dx
But this does not work.
Answer 1:
You can try splitting the labels on the comma (after converting them to character and stripping all punctuation except , and .) and then create two columns:
# The regex replaces every punctuation mark except "." and "," with an empty string
min_max <- unlist(strsplit(gsub("(?![,.])[[:punct:]]", "", as.character(dx$cut), perl=TRUE), ","))
dx$min <- min_max[seq(1, length(min_max), by=2)]
dx$max <- min_max[seq(2, length(min_max), by=2)]
head(dx)
# x cut min max
#1 1 (0.901,20.8] 0.901 20.8
#2 2 (0.901,20.8] 0.901 20.8
#3 3 (0.901,20.8] 0.901 20.8
#4 4 (0.901,20.8] 0.901 20.8
#5 5 (0.901,20.8] 0.901 20.8
#6 6 (0.901,20.8] 0.901 20.8
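Two things are worth noting about the approach above: the min and max columns it creates are character, not numeric, and because "-" is a punctuation mark, negative bounds would lose their sign. A minimal sketch of a variant that avoids both, assuming the default cut() label format "(a,b]", is shown below. The names labs, bounds, lower and upper are illustrative only, and the parsed values are the rounded numbers printed in the labels, not the exact break points.

```r
dx <- data.frame(x = 1:100)
dx$cut <- cut(dx$x, 5)

# Parse each distinct label once instead of once per row
labs <- levels(dx$cut)

# Drop only the opening "(" or "[" and the closing "]" or ")",
# so minus signs and exponents such as "-1e+03" are kept
bounds <- strsplit(gsub("^[\\[(]|[\\])]$", "", labs, perl = TRUE), ",")
lower <- as.numeric(sapply(bounds, `[`, 1))
upper <- as.numeric(sapply(bounds, `[`, 2))

# The integer codes of a factor index into its levels
dx$min <- lower[as.integer(dx$cut)]
dx$max <- upper[as.integer(dx$cut)]

head(dx)
```

Working on levels(dx$cut) rather than on every row keeps the string handling to five labels regardless of how long the vector is.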
Answer 2:
Below is a tidyverse-style solution.
library(tidyverse)
tibble(x = seq(-1000, 1000, length.out = 10),
       x_cut = cut(x, 5)) %>%
  mutate(x_tmp = str_sub(x_cut, 2, -2)) %>%
  separate(x_tmp, c("min", "max"), sep = ",") %>%
  mutate_at(c("min", "max"), as.double)
#> # A tibble: 10 x 4
#> x x_cut min max
#> <dbl> <fct> <dbl> <dbl>
#> 1 -1000 (-1e+03,-600] -1000 -600
#> 2 -778. (-1e+03,-600] -1000 -600
#> 3 -556. (-600,-200] -600 -200
#> 4 -333. (-600,-200] -600 -200
#> 5 -111. (-200,200] -200 200
#> 6 111. (-200,200] -200 200
#> 7 333. (200,600] 200 600
#> 8 556. (200,600] 200 600
#> 9 778. (600,1e+03] 600 1000
#> 10 1000 (600,1e+03] 600 1000
Created on 2019-01-10 by the reprex package (v0.2.1)
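If parsing the labels is not a requirement, a different technique is to compute the break points yourself and look the bounds up by interval index. This is only a sketch under that assumption, in base R. The names breaks and idx are illustrative, and the intervals differ slightly from cut(x, 5), which widens the range a little at both ends.

```r
x <- seq(-1000, 1000, length.out = 10)

# Six equally spaced break points define five intervals
breaks <- seq(min(x), max(x), length.out = 6)

# labels = FALSE returns the interval index instead of a factor;
# include.lowest = TRUE keeps min(x) inside the first interval
idx <- cut(x, breaks, include.lowest = TRUE, labels = FALSE)

res <- data.frame(x = x, min = breaks[idx], max = breaks[idx + 1])
res
```

Because the bounds come straight from breaks, they are exact numeric values rather than the rounded numbers shown in the labels.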
Source: https://stackoverflow.com/questions/39387745/cut-extracting-the-minimum-and-maximum-of-cut-levels-as-columns-in-data-frame