Speeding up the processing of large data frames in R
Question

Context

I have been trying to implement the algorithm recently proposed in this paper. Given a large body of text (a corpus), the algorithm is supposed to return the characteristic n-grams (i.e., sequences of n words) of the corpus. The user chooses n; at the moment I am trying n = 2 to 6, as in the original paper. In other words, I want to use the algorithm to extract the 2- to 6-grams that characterize the corpus. I was able to implement the part that calculates the score