R - cut by breaks and count number of occurrences by group

走了就别回头了 2021-01-26 23:47

I have a data frame that looks like this:

dat <- structure(list(Geocode = c("1100015", "1100023", "1100031", "1100049", 
"1100056", "1100064", "1


        
2 answers
  • 2021-01-27 00:14

    Here is a quick and dirty method. I might update it later to make it cleaner and to avoid the bind_rows() step.

    Try the following:

    library(tidyverse)
    
    # Bin Population once; conditions are checked in order, so each value lands in exactly one bin
    dat_binned <- dat %>% 
      mutate(population_breaks = case_when(Population <= 50000 ~ "0-50000",
                                           Population <= 100000 ~ "50000-100000",
                                           Population > 100000 ~ ">100000"))
    
    # Counts per bin and Region
    dat_1 <- dat_binned %>% 
      group_by(population_breaks) %>% 
      count(Region)
    
    # Counts per bin over all regions, labelled Region = "All"
    dat_2 <- dat_binned %>% 
      group_by(population_breaks) %>% 
      count() %>% 
      mutate(Region = "All")
    
    bind_rows(dat_1, dat_2)
    

    Which returns:

    # A tibble: 11 x 3
    # Groups:   population_breaks [3]
       population_breaks   Region     n
                   <chr>    <chr> <int>
     1           0-50000 Nordeste     9
     2      50000-100000 Nordeste     1
     3           >100000    Norte     1
     4           0-50000    Norte     8
     5      50000-100000    Norte     1
     6           >100000      Sul     2
     7           0-50000      Sul     6
     8      50000-100000      Sul     2
     9           >100000      All     3
    10           0-50000      All    23
    11      50000-100000      All     4
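
    The answer above mentions wanting to avoid bind_rows(). One possible way to do that, shown here only as a sketch, is base R's table() followed by addmargins(), which appends the per-bin total as an extra column. The region and pop vectors below are made-up toy data, not the asker's dat, and the total is labelled "Sum" rather than "All".

```r
# Hypothetical toy data standing in for dat$Region and dat$Population
region <- c("Norte", "Norte", "Sul", "Sul", "Nordeste")
pop    <- c(20000, 120000, 60000, 30000, 10000)

# Same three bins as in the answer
cls <- cut(pop,
           breaks = c(0, 50000, 100000, Inf),
           labels = c("0-50000", "50000-100000", ">100000"))

# Cross-tabulate bin by region
tab <- table(Class = cls, Region = region)

# margin = 2 sums over the Region dimension and appends a "Sum" column
tab_all <- addmargins(tab, margin = 2)

# Long format: one row per (Class, Region) pair, counts in column Freq
as.data.frame(tab_all)
```

    Unlike the dplyr version, this keeps combinations with a count of zero, so they may need to be filtered out afterwards.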
    
  • 2021-01-27 00:34

    Using cut() and dplyr:

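    library(dplyr)
    
    # Bin Population into three labelled classes
    dat$Class <- cut(dat$Population, breaks = c(0, 50000, 100000, Inf),
                     labels = c("0-50000", "50000-100000", ">100000"))
    
    d1 <- dat %>% group_by(Class, Region) %>% summarise(count = n())          # per class and region
    d2 <- dat %>% group_by(Class) %>% summarise(count = n(), Region = "All")  # per class, all regions
    bind_rows(d1, d2)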
    
              Class   Region count
             <fctr>    <chr> <int>
     1      0-50000 Nordeste     9
     2      0-50000    Norte     8
     3      0-50000      Sul     6
     4 50000-100000 Nordeste     1
     5 50000-100000    Norte     1
     6 50000-100000      Sul     2
     7      >100000    Norte     1
     8      >100000      Sul     2
     9      0-50000      All    23
    10 50000-100000      All     4
    11      >100000      All     3
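
    It may help to see how cut() treats values that fall exactly on a break. By default (right = TRUE) the intervals are closed on the right, so the bins here are (0, 50000], (50000, 100000] and (100000, Inf]. The pop vector below is made up for illustration and is not taken from the question's data.

```r
# Hypothetical populations sitting on and just past each break point
pop <- c(1, 50000, 50001, 100000, 100001)

cls <- cut(pop,
           breaks = c(0, 50000, 100000, Inf),
           labels = c("0-50000", "50000-100000", ">100000"))

# 50000 lands in "0-50000" and 100000 in "50000-100000"
data.frame(pop, cls)

# A value of exactly 0 lies outside (0, 50000] and becomes NA
# unless include.lowest = TRUE is passed
cut(0, breaks = c(0, 50000, 100000, Inf))
cut(0, breaks = c(0, 50000, 100000, Inf), include.lowest = TRUE)
```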
    