Create categorical variable in R based on range

难免孤独 2020-11-27 07:13

I have a dataframe with a column of integers that I would like to use as a reference to make a new categorical variable. I want to divide the variable into three groups and

3 answers
  • 2020-11-27 07:50
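    # simulate 100 values from a normal distribution (mean 10, sd 10)
    x <- rnorm(100, 10, 10)
    # bin the values; intervals are right-closed by default, e.g. (0,5]
    cut(x, c(-Inf, 0, 5, 6, 10, Inf))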
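
    For the data frame case described in the question, a minimal sketch might look like the following. The question's actual ranges are cut off, so the data frame df, the column score, the breakpoints 10 and 20, and the labels are all assumptions chosen purely for illustration.

    ```r
    # Hypothetical data: the column name and values are illustrative only
    df <- data.frame(score = c(2L, 10L, 11L, 20L, 21L, 35L))

    # Assumed breakpoints giving three groups: (-Inf,10], (10,20], (20,Inf]
    df$group <- cut(df$score,
                    breaks = c(-Inf, 10, 20, Inf),
                    labels = c("low", "medium", "high"))

    # Count how many rows fall into each group
    table(df$group)
    ```

    By default cut closes each interval on the right, so a score of exactly 10 lands in the first group; passing right = FALSE would close intervals on the left instead.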
    
  • 2020-11-27 08:03

    Ian's answer (cut) is the most common way to do this, as far as I know.

    I prefer to use shingle, from the lattice package.

    The argument that specifies the binning intervals seems a little more intuitive to me.

    You use shingle like so:

    library(lattice)  # provides shingle()

    # mock some data
    data <- sample(0:40, 200, replace = TRUE)

    # each row of the matrix is one binning interval: (lower, upper)
    my_bins <- rbind(c(0, 5), c(5, 9), c(9, 19), c(19, 33), c(33, 41))

    # my_bins prints as (the binning intervals I've set):
    #      [,1] [,2]
    # [1,]    0    5
    # [2,]    5    9
    # [3,]    9   19
    # [4,]   19   33
    # [5,]   33   41

    shx <- shingle(data, intervals = my_bins)

    # 'shx' at the interactive prompt will give you a nice frequency table:
    # Intervals:
    #   min max count
    # 1   0   5    23
    # 2   5   9    17
    # 3   9  19    56
    # 4  19  33    76
    # 5  33  41    46
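
    Note that the counts in the table add up to 218 even though only 200 values were sampled. This appears to be because shingle intervals include both of their end points, so a value sitting on a shared boundary (5, 9, 19 or 33 here) is counted in two intervals. If non-overlapping groups are needed, a sketch with base R's cut and the same breakpoints could look like this; the seed is an arbitrary choice, used only to make the example reproducible.

    ```r
    set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
    data <- sample(0:40, 200, replace = TRUE)

    # Same breakpoints as the shingle intervals; right = FALSE makes each bin
    # left-closed: [0,5), [5,9), [9,19), [19,33), [33,41)
    binned <- cut(data, breaks = c(0, 5, 9, 19, 33, 41), right = FALSE)

    # Each value now belongs to exactly one bin, so the counts sum to 200
    table(binned)
    ```

    Which of the two behaviours is preferable depends on whether overlapping groups are acceptable for the analysis at hand.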
    
  • 2020-11-27 08:10

    We can use smart_cut from package cutr:

    devtools::install_github("moodymudskipper/cutr")
    library(cutr)
    
    x <- c(3,4,6,12)
    

    To cut with intervals of length 5 starting at 1:

    smart_cut(x,list(5,1),"width" , simplify=FALSE)
    # [1] [1,6)   [1,6)   [6,11)  [11,16]
    # Levels: [1,6) < [6,11) < [11,16]
    

    To get exactly your requested output:

    smart_cut(x,c(0,6,11,16), labels = ~paste0(.y[1],'-',.y[2]-1), simplify=FALSE, open_end = TRUE)
    # [1]   0-5   0-5  6-10 11-15
    # Levels:   0-5 <  6-10 < 11-15
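
    If installing a package from GitHub is not an option, roughly the same result can be obtained with base R's cut. This is only a sketch and not how smart_cut works internally; the labels are written out by hand and assume integer-valued data.

    ```r
    x <- c(3, 4, 6, 12)

    # Left-closed bins [0,6), [6,11), [11,16) with hand-written labels
    res <- cut(x,
               breaks = c(0, 6, 11, 16),
               right = FALSE,
               labels = c("0-5", "6-10", "11-15"),
               ordered_result = TRUE)
    res
    ```

    Here ordered_result = TRUE returns an ordered factor, matching the ordered levels shown in the smart_cut output.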
    

    more on cutr and smart_cut
