Question
I need to create a corpus from a huge data frame (about 170,000 rows, but only two columns) to mine some text and group the usernames according to their search terms. For example, I start from a data frame like this:
username search_term
name_1 "some_text_1"
name_1 "some_text_2"
name_2 "some_text_3"
name_2 "some_text_4"
name_3 "some_text_5"
name_3 "some_text_6"
name_3 "some_text_1"
[...]
name_n "some_text_n-1"
And I want to obtain:
data frame 1
username search_term
name_1 "some_text_1"
name_1 "some_text_2"
data frame 2
username search_term
name_2 "some_text_3"
name_2 "some_text_4"
And so on.
Any ideas? I thought of a for loop, but it is too slow, since I need to create about 11,000 data frames.
For how to turn the resulting list into a corpus, see: How transform a list into a corpus in r?
Answer 1:
We can split the dataset ('df1') into a list:
lst <- split(df1, df1$username)
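A minimal, self-contained sketch of the split step, assuming `df1` holds the two character columns from the question. The sample rows are hypothetical and only mirror the layout shown above:

```r
# Hypothetical sample data mirroring the question's two-column layout
df1 <- data.frame(
  username    = c("name_1", "name_1", "name_2", "name_2",
                  "name_3", "name_3", "name_3"),
  search_term = c("some_text_1", "some_text_2", "some_text_3", "some_text_4",
                  "some_text_5", "some_text_6", "some_text_1"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per distinct username,
# built in a single call instead of an explicit for loop
lst <- split(df1, df1$username)

names(lst)         # "name_1" "name_2" "name_3"
sapply(lst, nrow)  # name_1 = 2, name_2 = 2, name_3 = 3
lst$name_2         # the two rows belonging to name_2
```

The list names come from the distinct usernames (sorted, because `split()` converts the grouping vector to a factor), so each group can be looked up by name rather than by a generated object name.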
Usually, it is better to stop here and do all the calculations/analysis within the list itself. But if we want to create thousands of objects in the global environment, one way is to use list2env after naming the list elements with the desired object names.
list2env(setNames(lst, paste0('DataFrame', seq_along(lst))), envir = .GlobalEnv)
DataFrame1
DataFrame2
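If the end goal is a corpus with one document per user, it may be enough to stay inside the list, as recommended above. The sketch below assumes `lst` comes from `split(df1, df1$username)` and joins each user's search terms with a single space (an arbitrary choice). It also shows `list2env()` writing into a dedicated environment (`user_env`, a hypothetical name) instead of `.GlobalEnv`, so thousands of objects do not clutter the workspace:

```r
# Small hypothetical example in the question's layout
df1 <- data.frame(
  username    = c("name_1", "name_1", "name_2", "name_2"),
  search_term = c("some_text_1", "some_text_2", "some_text_3", "some_text_4"),
  stringsAsFactors = FALSE
)
lst <- split(df1, df1$username)

# Option A: stay in the list and build one string ("document") per user.
# The " " separator is an assumption; pick whatever suits the corpus.
docs <- vapply(lst, function(d) paste(d$search_term, collapse = " "),
               character(1))
docs  # name_1: "some_text_1 some_text_2", name_2: "some_text_3 some_text_4"

# Option B: create DataFrame1, DataFrame2, ... inside a separate
# environment rather than the global one
user_env <- new.env()
list2env(setNames(lst, paste0("DataFrame", seq_along(lst))), envir = user_env)
ls(user_env)         # "DataFrame1" "DataFrame2"
user_env$DataFrame2  # the rows for name_2
```

A roughly equivalent base-R one-liner for Option A is `tapply(df1$search_term, df1$username, paste, collapse = " ")`, which returns a one-dimensional array named by username.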
Another way of keeping the data together would be to nest it:
library(dplyr)
library(tidyr)
# one row per username; that user's remaining columns go into a list-column named 'data'
df1 %>%
  nest(data = -username)
Source: https://stackoverflow.com/questions/33920330/split-a-huge-dataframe-in-many-smaller-dataframes-to-create-a-corpus-in-r