I am attempting to find tags(keywords) for a recipe by parsing a long string of text. The text contains the recipe ingredients, directions and a short blurb.
Wha
I'd just do something like this:
from nltk.corpus import stopwords
s=set(stopwords.words('english'))
txt="a long string of text about him and her"
print filter(lambda w: not w in s,txt.split())
which prints
['long', 'string', 'text']
and in terms of complexity should be O(n) in number of words in the string, if you believe the hashed set lookup is O(1).
FWIW, my version of NLTK defines 127 stopwords:
'all', 'just', 'being', 'over', 'both', 'through', 'yourselves', 'its', 'before', 'herself', 'had', 'should', 'to', 'only', 'under', 'ours', 'has', 'do', 'them', 'his', 'very', 'they', 'not', 'during', 'now', 'him', 'nor', 'did', 'this', 'she', 'each', 'further', 'where', 'few', 'because', 'doing', 'some', 'are', 'our', 'ourselves', 'out', 'what', 'for', 'while', 'does', 'above', 'between', 't', 'be', 'we', 'who', 'were', 'here', 'hers', 'by', 'on', 'about', 'of', 'against', 's', 'or', 'own', 'into', 'yourself', 'down', 'your', 'from', 'her', 'their', 'there', 'been', 'whom', 'too', 'themselves', 'was', 'until', 'more', 'himself', 'that', 'but', 'don', 'with', 'than', 'those', 'he', 'me', 'myself', 'these', 'up', 'will', 'below', 'can', 'theirs', 'my', 'and', 'then', 'is', 'am', 'it', 'an', 'as', 'itself', 'at', 'have', 'in', 'any', 'if', 'again', 'no', 'when', 'same', 'how', 'other', 'which', 'you', 'after', 'most', 'such', 'why', 'a', 'off', 'i', 'yours', 'so', 'the', 'having', 'once'
obviously you can provide your own set; I'm in agreement with the comment on your question that it's probably easiest (and fastest) to just provide all the variations you want to eliminate up front, unless you want to eliminate a lot more words than this but then it becomes more a question of spotting interesting ones than eliminating spurious ones.