I have a huge dataframe df1, whose oversimplified version consists of 3 columns, "Words", "Frequency" and "Letters":
Words Frequency Letters
You can use cut() to bin Frequency, substr() to clean Letters, and tidyr::separate_rows() to unnest Words. Aggregate with dplyr::count(), and you're set:
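library(tidyverse)

df1 %>%
    separate_rows(Words) %>%                                 # one row per word, other columns repeated
    count(Words,                                             # count rows per Words/Letters/Frequency combination
          Letters = substr(Letters, 1, 1),                   # use regex if more than one letter
          Frequency = cut(Frequency, breaks = seq(0, 1, .25)))  # bin into quarters of [0, 1]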
## Source: local data frame [11 x 4]
## Groups: Frequency, Words [?]
##
## Frequency Words Letters n
## <fctr> <chr> <chr> <int>
## 1 (0,0.25] flower a 1
## 2 (0,0.25] flower b 1
## 3 (0,0.25] planet a 1
## 4 (0,0.25] tree a 1
## 5 (0.25,0.5] planet c 1
## 6 (0.25,0.5] tree c 1
## 7 (0.5,0.75] flower b 1
## 8 (0.5,0.75] planet b 1
## 9 (0.5,0.75] tree a 1
## 10 (0.75,1] planet b 1
## 11 (0.75,1] tree a 1
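If the tidyverse isn't available, the same three steps (unnest, clean/bin, count) can be done in base R. This is a minimal sketch on a small hypothetical data frame, `toy`, invented for illustration only (it is not the question's df1, whose rows aren't shown). It assumes Words holds comma-separated words and that the letter is the first character of Letters.

```r
# Hypothetical toy input, NOT the question's data
toy <- data.frame(
  Words     = c("tree, planet", "flower", "tree"),
  Frequency = c(0.2, 0.6, 0.1),
  Letters   = c("a.", "b.", "a."),
  stringsAsFactors = FALSE
)

# 1. Unnest: split Words using the same default separator as tidyr::separate_rows(),
#    then repeat the other columns once per resulting word
pieces <- strsplit(toy$Words, "[^[:alnum:].]+")
long <- data.frame(
  Words     = unlist(pieces),
  Frequency = rep(toy$Frequency, lengths(pieces)),
  Letters   = rep(toy$Letters, lengths(pieces)),
  stringsAsFactors = FALSE
)

# 2. Clean Letters (keep the first character) and bin Frequency into quarters
long$Letters   <- substr(long$Letters, 1, 1)
long$Frequency <- cut(long$Frequency, breaks = seq(0, 1, 0.25))

# 3. Count each Frequency/Words/Letters combination that actually occurs
counts <- aggregate(n ~ Frequency + Words + Letters,
                    data = transform(long, n = 1L), FUN = sum)
print(counts)
```

Note that cut() intervals are open on the left, so a Frequency of exactly 0 would become NA; pass include.lowest = TRUE to cut() if that value can occur.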