The Problem: A large static list of strings is provided as A, a long string is provided as B, and the strings in A are all very short (keywords).
Pack up all the individual words of B into a new list by splitting the original string on ' '. Then, for each word in that new list, test for membership in A. If you find one (or more) matching entries, delete them from A, and quit as soon as A is empty.
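
The steps above could be sketched roughly as follows in Python. This is only an illustration under a few assumptions: the function name `unmatched_keywords` and its parameters are hypothetical, and A is copied into a `set` rather than kept as a list, which is a swap on my part so that each membership test and deletion is constant time and duplicate keywords disappear together.

```python
def unmatched_keywords(a, b):
    """Return the keywords from `a` that never appear as a word in `b`.

    `a` is the list of short keywords (A), `b` is the long string (B).
    Both names are placeholders for illustration.
    """
    # Assumption: work on a set copy so the caller's list is left untouched.
    remaining = set(a)

    # Split B on single spaces, as described in the text.
    for word in b.split(' '):
        if not remaining:
            # Every keyword has been found, so stop scanning early.
            break
        # discard() removes the word if present and does nothing otherwise.
        remaining.discard(word)

    return remaining
```

An empty result means every keyword was seen. For instance, `unmatched_keywords(["cat", "dog"], "the cat sat")` leaves `{"dog"}` behind. Note that this only matches whole space-separated words, so punctuation attached to a word in B would prevent a match.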
It seems like your approach will blaze through 500,000 candidates without an opt-out set in place.