I have two databases. The first one has about 70k rows with 3 columns. the second one has 790k rows with 2 columns. Both databases have a common variable grantee_name
I haven't used foreach before but maybe the variable x is already the individual rows of zz1?
Have you tried:
stringdist_inner_join(x, zz2, by="grantee_name", method="jw", p=0.1, max_dist=0.1, distance_col="distance")
?
If you split (with base::split
or dplyr::group_split
) your uniquegrantees data frame into a list of data frames, then you can call purrr::map
on the list. (map
is pretty much lapply
)
purrr::map(list_of_dfs, ~stringdist_inner_join(., filings, by="grantee_name", method="jw", p=0.1, max_dist=0.1, distance_col="distance"))
Your result will be a list of data frames each fuzzyjoined with filings. You can then call bind_rows
(or you could do map_dfr
) to get all the results in the same data frame again.
See R - Splitting a large dataframe into several smaller dateframes, performing fuzzyjoin on each one and outputting to a single dataframe