Count specific characters from column associated with dual categories of other column. Do it iteratively based on frequency bins

再見小時候 2021-01-15 00:50

I have a huge data frame df1, whose oversimplified version consists of 3 columns, "Words", "Frequency" and "Letters":

Words           Frequency   Letters


        
1 answer
  • 2021-01-15 01:52

    You can use cut to bin Frequency, substr to keep only the first character of Letters, and tidyr::separate_rows to split Words into one row per word. Aggregate with dplyr::count, and you're set:

    library(tidyverse)
    
    df1 %>% 
        separate_rows(Words) %>%    # one row per individual word
        count(Frequency = cut(Frequency, breaks = seq(0, 1, .25)),    # four equal-width bins
              Words,
              Letters = substr(Letters, 1, 1))    # use regex if more than one letter
    
    ## Source: local data frame [11 x 4]
    ## Groups: Frequency, Words [?]
    ## 
    ##     Frequency  Words Letters     n
    ##        <fctr>  <chr>   <chr> <int>
    ## 1    (0,0.25] flower       a     1
    ## 2    (0,0.25] flower       b     1
    ## 3    (0,0.25] planet       a     1
    ## 4    (0,0.25]   tree       a     1
    ## 5  (0.25,0.5] planet       c     1
    ## 6  (0.25,0.5]   tree       c     1
    ## 7  (0.5,0.75] flower       b     1
    ## 8  (0.5,0.75] planet       b     1
    ## 9  (0.5,0.75]   tree       a     1
    ## 10   (0.75,1] planet       b     1
    ## 11   (0.75,1]   tree       a     1
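
One detail of the binning step is worth checking before running this on real data: by default, cut builds left-open, right-closed intervals, so a Frequency of exactly 0 matches no bin and becomes NA. The sketch below uses made-up values (freq and letters_raw are hypothetical, not taken from the question's data) to show this, along with what substr(x, 1, 1) keeps.

```r
# Hypothetical Frequency values, chosen to sit on and between the break points.
freq <- c(0, 0.10, 0.25, 0.26, 1)

# Default cut(): intervals are left-open and right-closed, so 0 matches no bin.
bins_default <- cut(freq, breaks = seq(0, 1, 0.25))

# include.lowest = TRUE closes the first interval on the left, giving [0,0.25].
bins_closed <- cut(freq, breaks = seq(0, 1, 0.25), include.lowest = TRUE)

print(data.frame(freq, bins_default, bins_closed))

# Hypothetical Letters entries; substr(x, 1, 1) keeps only the first character.
letters_raw <- c("a1", "b", "c-x")
letters_clean <- substr(letters_raw, 1, 1)
print(letters_clean)
```

If zero frequencies can occur in df1, adding include.lowest = TRUE to the cut call is one way to keep those rows in the first bin instead of an NA group.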
    