Split a huge dataframe in many smaller dataframes to create a corpus in r

Question

I need to create a corpus from a huge data frame (about 170,000 rows, but only two columns) to mine some text and group by username according to the search terms. For example, I start from a data frame like this:

username    search_term
name_1      "some_text_1"
name_1      "some_text_2"
name_2      "some_text_3"
name_2      "some_text_4"
name_3      "some_text_5"
name_3      "some_text_6"
name_3      "some_text_1"

[...]

name_n      "some_text_n-1"

And I want to obtain:

data frame 1
username    search_term
name_1      "some_text_1"
name_1      "some_text_2"

data frame 2
username    search_term
name_2      "some_text_3"
name_2      "some_text_4"

And so on.

Any ideas? I thought of using a for loop, but it is too slow, since I need to create about 11,000 data frames.

For how to transform a list into a corpus, see: "How transform a list into a corpus in r?"

Answer 1:

We can split the dataset ('df1') into a list of data frames, one per username:

lst <- split(df1, df1$username)
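As a minimal sketch of what split returns here, assume a toy df1 with the two columns from the question (the values below are made up for illustration, not the asker's data):

```r
# Toy stand-in for the real data (illustrative values only)
df1 <- data.frame(
  username    = c("name_1", "name_1", "name_2", "name_2", "name_3"),
  search_term = c("some_text_1", "some_text_2", "some_text_3",
                  "some_text_4", "some_text_5"),
  stringsAsFactors = FALSE
)

# One data frame per distinct username; list elements are named by username
lst <- split(df1, df1$username)

names(lst)       # "name_1" "name_2" "name_3"
lst[["name_2"]]  # the two rows belonging to name_2
```

Each element keeps both columns, so any per-user step can be applied to the elements of lst without a hand-written for loop.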

Usually, it is better to stop here and do all the calculations/analysis within the list itself. But if we want to create thousands of objects in the global environment, one way is to use list2env after naming the list elements with the object names we desire.

list2env(setNames(lst, paste0('DataFrame', 
                 seq_along(lst))), envir=.GlobalEnv)

DataFrame1
DataFrame2
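
To illustrate staying inside the list instead of creating global objects, here is a hedged sketch that collapses each user's search terms into one text per user. The toy df1 and the name docs are assumptions made for this example only:

```r
# Toy stand-in for the real data (illustrative values only)
df1 <- data.frame(
  username    = c("name_1", "name_1", "name_2", "name_2", "name_3"),
  search_term = c("some_text_1", "some_text_2", "some_text_3",
                  "some_text_4", "some_text_5"),
  stringsAsFactors = FALSE
)
lst <- split(df1, df1$username)

# Paste each user's search terms into a single string;
# the result is a character vector named by username
docs <- vapply(lst,
               function(d) paste(d$search_term, collapse = " "),
               character(1))

docs[["name_1"]]  # "some_text_1 some_text_2"
```

A named character vector like docs has one document per user, which is the kind of input a corpus constructor typically accepts (for example via VectorSource in the tm package, if that is the corpus library in use).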

Another way of keeping the data together is to nest it, giving one row per username with a list-column of its rows:

library(dplyr)
library(tidyr)
# tidyr >= 1.0 syntax; older versions used nest(-username)
df1 %>% 
     nest(data = -username)

Source: https://stackoverflow.com/questions/33920330/split-a-huge-dataframe-in-many-smaller-dataframes-to-create-a-corpus-in-r

Tags

dataframe

corpus
